MobileSheets Forums

Issue: Albums Metadata Field as List on PDF Import
While working on my 'procedure' for mass migration from forScore to MobileSheets, I noticed an anomaly in how metadata is populated when a score is imported into MobileSheets. Let me see if I can describe it clearly.

It concerns the metadata field "Albums."

  * In MobileSheets it is (correctly) a list field (i.e., it can hold multiple entries).

  * I noticed that when I did a test of importing about 100 scores as a batch (that is, I imported an entire subdirectory from my Dropbox account) MobileSheets did two things, both good in my opinion: (1) it did auto-populate the Albums field from the "Books" field in the PDF docs ('Books' being the corresponding forScore metadata field). (2) MobileSheets built for itself an internal reference list of Albums using all the 'Books' entries from each of my imported PDF docs. That internal reference list is now easily available in the dropdown for the Albums field in MobileSheets. Very good.

  * Ok, here's the problem/anomaly: When importing a PDF doc, MobileSheets isn't parsing out multiple entries for the 'Books' field in cases where there are multiple entries (separated by a comma). Instead, it is running them all together as a single entry. For example, instead of: [Book Title1] [Book Title2] (two separate list entries) I get: [Book Title1,Book Title2 ] (a single list entry).

  * Put simply: The Albums field should work the way that the Genres field works.

I have attached a reference example document that illustrates this anomaly. See: Issue_AlbumsAsList_MetadataReferenceDoc.pdf

I have also attached a screenshot of the metadata edit page in MobileSheets after importing the above reference doc. See: Issue_AlbumsAsList_iPad-screenshot.jpg
I didn't separate values by commas when processing the other fields. I can certainly do that. Hopefully it won't cause any issues with composers, where a comma could separate the first and last names. Let me know if you think I shouldn't separate by commas when processing any of the fields. I really don't like separating by commas in general because there are so many cases where a comma could be present in the name of something. That's why with CSV files I went with ; as the column delimiter and | as the value delimiter. You are much less likely to encounter conflicts with those.
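
To illustrate the point about delimiters, here is a minimal Python sketch of reading a file that uses ; between columns and | between values. The column names and the sample row are made up for illustration and are not the actual MobileSheets CSV layout.

```python
import csv
import io

# Hypothetical sample: ';' separates columns, '|' separates values in a column.
# The header names are assumptions for this sketch only.
SAMPLE = (
    "title;composers;genres\n"
    "Sonata No. 1;Seifert, Johnston and Zanetti;Classical|Chamber\n"
)

def read_rows(text):
    """Parse ';'-delimited text and split every cell on '|' into a list."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for row in reader:
        rows.append(
            {key: value.split("|") if value else [] for key, value in row.items()}
        )
    return rows

rows = read_rows(SAMPLE)
print(rows[0]["composers"])  # ['Seifert, Johnston and Zanetti']
print(rows[0]["genres"])     # ['Classical', 'Chamber']
```

Because neither ; nor | appears in the composer text, the comma inside "Seifert, Johnston and Zanetti" survives as part of a single value.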

Thanks,
Mike
(08-07-2023, 03:56 AM)Zubersoft Wrote: I didn't separate values by commas when processing the other fields.
"I didn't separate values by commas when processing the other fields." Interesting. In looking closely at the 'Genres' field, my PDF docs have these separated by a comma and MobileSheets did (correctly in my opinion) bring each in as a separate item. Further, items that have spaces in them were *not* separated at the spaces (also correctly in my opinion). Only the commas caused a break into a separate item.

In doing a bit of research it seems that Adobe's spec is to consider either a comma or a semicolon as a break indicator for 'list' fields. forScore will NOT recognize a semicolon as a break character tho, which is why I have been using commas.

Also, in forScore only certain metadata fields are considered to be 'list' fields. The only fields in forScore that accept multiple entries are: Genres, Tags (MS=Keywords), and Books (MS=Albums).

For my part, having the composers field be a list field would be handy; tho at present all my scores have only a single composer entry due to forScore's limitation on this field, but some of my entries do have commas in them due to multiple composers. E.g., "Seifert, Johnston and Zanetti"

Having Genres, Keywords and Albums be list fields is sort of necessary for me to bring my forScore docs over without metadata loss.

Otherwise I'm fine with all other fields being single entry only.

But that's me - as to the larger community - this may merit broader input?

Finally, as to the list separator - yeah, that's a challenging one. Comma keeps it compatible with forScore *and* with my ~100 existing scores for which I have already put in the metadata using commas with exiftool. Tho I could probably update my own scores using a script. And as I haven't yet done my big forScore batch export/import, I could transform the forScore exported commas into semicolons.

(I should note that any commas in titles and composer/arranger work fine in forScore because any non-list field just accepts all characters as no parsing is processed on those - I think per Adobe PDF specs?)

I'm going to take a shot at summarizing the above:

  1. For me I'm fine with either comma or semicolon as a list item separator.

  2. Semicolon likely better fits Adobe's specs. Tho may cause angst for folks making PDF scores that will be used by both forScore and MobileSheets users.

  3. Once MobileSheets has settled on separator, and which fields will be list fields, I will create a written process for mass conversion from forScore to MobileSheets. The process will involve making manipulations to the CSV metadata doc forScore exports, which I will document as clearly as I am able. (For example, in the columns that represent MobileSheets list fields, detecting separator characters and changing them to whatever MobileSheets will use.)

  4. Perhaps you, I and the community at-large will collaborate to decide on #2 above and achieve a good result for #3?
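
As a rough idea of what the CSV manipulation in item 3 could look like, here is a minimal Python sketch that rewrites the separator inside list columns only. The header names (Genres, Tags, Books), the comma-delimited layout, and the sample row are assumptions for illustration; the real forScore export may differ.

```python
import csv
import io

# Hypothetical header names for the columns that hold lists.
LIST_COLUMNS = {"Genres", "Tags", "Books"}

def convert_separators(csv_text, old=",", new=";"):
    """Replace the in-cell list separator in list columns; leave other columns alone."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in LIST_COLUMNS & set(row):
            cell = row[column] or ""
            items = [item.strip() for item in cell.split(old) if item.strip()]
            row[column] = new.join(items)
        writer.writerow(row)
    return out.getvalue()

# Made-up sample row: the composer cell keeps its comma, the Books cell is converted.
sample = (
    "Title,Composers,Books\n"
    '"Song","Seifert, Johnston and Zanetti","Book Title1,Book Title2"\n'
)
result = convert_separators(sample)
print(result)
```

Note that this treats every comma in a list column as a separator, so a single book title that legitimately contains a comma would need to be fixed by hand.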
Subject is mapped to Genre, and that's the only field in which I currently separate values by commas.

I think what I will do is separate based on either comma or semi-colon for all fields except Artist and Composer. On those fields, I will only separate based on semi-colon. Does that make sense? The other thing I can do is try to search for a semi-colon, and if one is found, I only separate based on that, otherwise I look for a comma. I would still only look for a semi-colon for Artist and Composer though.
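
The second option described above (prefer a semicolon when one is present, otherwise fall back to a comma, and never split Artist or Composer on commas) could be sketched in Python roughly as follows. This is only an illustration of the rule as stated, not the MobileSheets implementation; the function name and field-name strings are hypothetical.

```python
# Fields that are only ever split on semicolons (per the proposal above).
SEMICOLON_ONLY_FIELDS = {"Artist", "Composer"}

def split_values(field, raw):
    """Split a raw metadata string into a list of trimmed, non-empty values."""
    if ";" in raw or field in SEMICOLON_ONLY_FIELDS:
        parts = raw.split(";")
    else:
        parts = raw.split(",")
    return [part.strip() for part in parts if part.strip()]

print(split_values("Albums", "Book Title1,Book Title2"))
# ['Book Title1', 'Book Title2']
print(split_values("Albums", "Album 1; Album 2"))
# ['Album 1', 'Album 2']
print(split_values("Composer", "Seifert, Johnston and Zanetti"))
# ['Seifert, Johnston and Zanetti']
```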

Mike
The idea of using only semicolon for composer and artist makes sense to me.

Using either comma or semicolon for other list fields should work. I can't think of any reason that would cause me any headaches.

Your proposal to look for a semicolon in a list field, and if you see one then to proceed on parsing using only semicolons actually sounds pretty ideal to me. If the coding isn't too complex. One edge case to consider for this proposal: let's say that in some list field I really do only want one item, but that item's text has a comma in it. Seeing no semicolon you would parse it on that comma into two items. What you could allow for this edge case is a semicolon at the end of the single item's text which you would detect and not parse on the comma. This at the risk of making the user manual more complex of course...
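
For what it's worth, the trailing-semicolon edge case may not need special handling if empty items are simply dropped after splitting. A minimal Python sketch, with a hypothetical function name and a made-up album title:

```python
def split_list_field(raw):
    """Split on ';' if any is present, otherwise on ','; drop empty items."""
    separator = ";" if ";" in raw else ","
    return [item.strip() for item in raw.split(separator) if item.strip()]

# Without a semicolon, the comma splits the text into two items.
print(split_list_field("Hymns, Ancient and Modern"))
# ['Hymns', 'Ancient and Modern']

# A trailing semicolon switches parsing to ';', and the empty tail is discarded.
print(split_list_field("Hymns, Ancient and Modern;"))
# ['Hymns, Ancient and Modern']
```

So the only thing the user manual would need to say is: end a single item with a semicolon if its text contains a comma.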
(08-07-2023, 08:10 AM)Zubersoft Wrote: I think what I will do is separate based on either comma or semi-colon for all fields except Artist and Composer.
Mike: In performing tests to make sure I can form PDF metadata in my docs that will import well into MobileSheets I just now noticed an oddity. Not in MobileSheets, but the PDFs. 

In my test I replaced the comma separator with a semi-colon for the "Album" field I created in a PDF doc. What I noticed is that when I then query the metadata in my PDF doc using exiftool the output is showing both a semi-colon and a comma. It looks like this: Album: Album 1;, Album 2 

When I import this into MobileSheets I get a single item in the Albums property that looks like: Album 1; Album 2

I can't explain why the comma got added. And maybe it's only really in exiftool's terminal output display from the command I use to show me all the PDF's properties. However, MobileSheets didn't separate the two entries using either the comma or the semi-colon. I'll try to investigate a bit further.

I've attached three docs - the PDF with the offending field in it. A screenshot of the exiftool listing of this PDF's properties. And a screenshot of the MobileSheets properties page after importing that PDF.
(08-09-2023, 05:07 AM)Jeffrocchio@gmail.com Wrote:
(08-07-2023, 08:10 AM)Zubersoft Wrote: I think what I will do is separate based on either comma or semi-colon for all fields except Artist and Composer.
...And maybe it's only really in exiftool's terminal output display from the command I use to show me all the PDF's properties.
I am now convinced it is an exiftool display issue only. When I use a different command to display the metadata I don't get the comma. Screenshot of this attached for completeness.  : )

Of course, MobileSheets still doesn't separate the values on import; I believe I'm waiting on an update per your earlier post in this thread where you said: >I think what I will do is separate based on either comma or semi-colon for all fields except Artist and Composer.<
When there is more than one, do you need double quotes around the entries?

Geoff
(08-09-2023, 07:22 AM)Geoff Bacon Wrote: When there is more than one, do you need double quotes around the entries?

Geoff

Sadly, that doesn't work for PDF metadata fields. The quote characters would just become part of the field's text. The whole thing about Lists in metadata fields is really a mess. In reality it depends on an 'agreement' between the source and target on how they will treat them. There is no hard standard.
I'm finishing up some changes and will be submitting the update today if I can get everything done. So you shouldn't have to wait long for the changes.

Mike
(08-09-2023, 09:10 AM)Zubersoft Wrote: I'm finishing up some changes and will be submitting the update today if I can get everything done. So you shouldn't have to wait long for the changes.

Mike

Very good, thanks Mike.