Fakebook Indexes for CSV import
#11
@itsme
I've fixed the forum to allow csv files as attachments.

@BRX
If you import from a csv, MS Pro won't know that you already have a song that is the same as the one in the csv. The titles may match, but that by itself is not enough to uniquely identify a song. The existing song will not be updated - a new song will be created. The import feature cannot be used to update existing entries in the library at the moment. If you import from a csv, select some songs, and later import the same csv and select different songs, they will all reuse the same file and recognize that nothing needs to be done as far as copying the file to the storage location.

If people want me to add a feature where csv files can be used to update metadata of existing songs, I can certainly look into doing that.

As far as "double entries", if you mean multiple values in one column, just delimit them with a | character. MobileSheetsPro accepts almost any delimiter now for columns (other than | ). It's automatically detected. Use tabs if you want and it should work. The only requirement is that the | must be used to delimit multiple values in one column.
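As an illustration of the two delimiter roles described above, here is a minimal Python sketch that guesses the column delimiter and then splits multi-value cells on |. It is not MobileSheetsPro's own import code; the function name read_index and the column names (title, composers, pages) are assumptions made for the example.

```python
import csv
import io

def read_index(text):
    """Parse CSV text, guessing the column delimiter and splitting '|' values."""
    # Let csv.Sniffer guess the column delimiter. '|' is left out of the
    # candidates because it is reserved for multiple values in one column.
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t")
    rows = []
    for record in csv.DictReader(io.StringIO(text), dialect=dialect):
        # Cells containing '|' become lists; all other cells stay strings.
        rows.append({key: value.split("|") if "|" in value else value
                     for key, value in record.items()})
    return rows

# Hypothetical tab-separated sample; the column names are assumptions.
sample = (
    "title\tcomposers\tpages\n"
    "Airegin\tSonny Rollins\t17\n"
    "Some Duet\tFirst Writer|Second Writer\t18-19\n"
)
rows = read_index(sample)
```

With this sample, the composers cell of the second song comes back as a two-element list, while single-value cells remain plain strings.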

@Johan
Something definitely sounds wrong there. I'll look into it ASAP.

Thanks,
Mike
#12
(01-07-2016, 04:46 PM)Zuberman Wrote: If you import from a csv, MS Pro won't know that you already have a song that is the same as the one in the csv. The titles may match, but that by itself is not enough to uniquely identify a song.  The existing song will not be updated - a new song will be created. The import feature cannot be used to update existing entries in the library at the moment. If you import from a csv, select some songs, and later import the same csv and select different songs, they will all reuse the same file and recognize that nothing needs to be done as far as copying the file to the storage location.

Valuable information. Thanks. I assumed so and expect it will be in the updated manual, but better to verify now.
I didn't expect the title alone to be the identifier, but at least the title and the PDF source. As I understand it, if I import a csv again with the same title and the same PDF source, a new song will be created for now, correct?

Quote:If people want me to add a feature where csv files can be used to update metadata of existing songs, I can certainly look into doing that.

You have my vote for an update feature in this case (same title/same source). I think it's too much to have an "intelligent" update feature that compares fields. It should be sufficient if the old metadata is overwritten with the data of the update csv.
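To make the request concrete, a rough Python sketch of such an overwrite-style update keyed on title and source could look like this. It only illustrates the proposal, not anything MSP does today; the function name apply_update and the dictionary fields are hypothetical.

```python
def apply_update(library, csv_rows):
    """Overwrite metadata of songs matched by (title, source); add the rest.

    Both arguments are lists of dicts. The field names 'title' and 'source'
    are assumptions for this sketch, not MSP's database schema.
    """
    by_key = {(song["title"], song["source"]): song for song in library}
    for row in csv_rows:
        key = (row["title"], row["source"])
        if key in by_key:
            # Same title and same source: replace the old metadata wholesale,
            # without comparing individual fields.
            by_key[key].update(row)
        else:
            new_song = dict(row)
            library.append(new_song)
            by_key[key] = new_song
    return library

library = [{"title": "Airegin", "source": "book.pdf", "composer": "?"}]
update = [
    {"title": "Airegin", "source": "book.pdf", "composer": "Sonny Rollins"},
    {"title": "Always There", "source": "book.pdf", "composer": ""},
]
apply_update(library, update)
```

A second import of the same csv would then change nothing, because every row already matches an existing (title, source) pair.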

Quote:As far as "double entries", if you mean multiple values in one column, just delimit them with a | character. MobileSheetsPro accepts almost any delimiter now for columns (other than | ). It's automatically detected. Use tabs if you want and it should work. The only requirement is that the | must be used to delimit multiple values in one column.
No, I was asking how MSP handles duplicate entries. Case in point: I distinguish between lyricists and composers, while MSP only has one metadata field for composer. So for a csv I have to join those fields, and sometimes the lyricist is also the composer or co-composer. I wanted to know whether it's a problem for MSP if I don't edit the duplicates manually and a name appears twice, and whether it only creates one entry.
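Whatever MSP does with a repeated name, the duplicates can be removed before the csv is written. A small Python sketch, assuming the lyricist and composer fields are already |-delimited strings (join_names is a made-up helper name):

```python
def join_names(*fields):
    """Merge several '|'-delimited name fields into one, dropping repeats."""
    seen = set()
    names = []
    for field in fields:
        for name in field.split("|"):
            name = name.strip()
            key = name.casefold()  # compare names case-insensitively
            if name and key not in seen:
                seen.add(key)
                names.append(name)
    return "|".join(names)

combined = join_names("Ira Gershwin", "George Gershwin|Ira Gershwin")
```

The first spelling of each name is kept, and the order of first appearance is preserved.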
#13
(01-07-2016, 05:02 AM)sciurius Wrote: There are a few errors in the page ranges.
Song I'm Gettin' Sentimental (F) has page 386A, this should be 558.
Song I'm Gettin' Sentimental (Bb) has page 386B, this should be 559.

I haven't tried an import yet but had a look.

I wouldn't list the TOCs in the CSV but that's my personal preference.

You only got as far as composer and year entries for the first two titles. In any case, the multiple composers haven't been delimited with | and the year is in the same column as the composer.

I can and will contribute some CSVs too, but will have to edit them a bit first.

What about a CVS for the CSVs?
#14
(01-07-2016, 08:14 PM)BRX Wrote: You have my vote for an update feature in this case (same title/same source). I think it's too much to have an "intelligent" update feature with comparing fields. It should be sufficient if the old metadata is overwritten with the data of the update csv.

FWIW, my MSPro-Tools on GitHub include get_meta and put_meta tools. Get_meta produces a JSON file that contains most of the metadata. You can modify it, and use put_meta to update the database.
You still have to (manually) transfer the database from the tablet to the PC and vice versa. And of course you risk losing everything :-)

https://github.com/sciurius/MSPro-Tools
Johan
www.johanvromans.nl - www.howsagoin.nl - www.hetgeluidvanseptember.nl
Samsung Galaxy Note 2 (N8010) 10.1", Android 7.1.2 (LineageOS), AirTurn Duo & Digit.
Asus Zenpad (Z300M) 10.1", Android 7.0 (backup).
Samsung A3 (A320FL), Android 8.0 (emergency).
#15
I was puzzled by the statement that it is the "FirehouseJazzbandFakebook" that loads so slowly, as I can use it without problems. So I double-checked my post and this is what I found. At
https://archive.org/download/fakebook_th...-fake-book
there is a huge number of files. The link in my post pointed to
thefirehousejazzbandfakebook_text.pdf                          16-Jan-2013 22:53     64.0M
whereas what I am using is a copy of
thefirehousejazzbandfakebook.pdf                               16-Jan-2013 13:04     102.4M
I don't know how archive.org creates the files they make available, or exactly how the _text.pdf differs. Maybe they run some kind of OCR over the scans, which gets confused trying to recognise music scores. Something is certainly wrong with thefirehousejazzbandfakebook_text.pdf: it takes a long time to open even on the PC.
Sorry, my fault. Please try again with the link
https://archive.org/download/fakebook_th...kebook.pdf
first language: German
Acer A1-830, Android 4.4.2 - HP x2 210 G2 Detachable, Win 10 1803
www.moonlightcrisis.de - www.basdjo.de - www.frankenbaend.de


#16
Hi all, just catching up. It's very cool that MSP now supports CSV-based import of multiple songs from a single PDF. It would be especially cool if we could align the format with the files in my project https://github.com/aspiers/book-indices, so that we can all collaborate on building CSV index files which can be used seamlessly both inside MSP and outside it (e.g. with my https://github.com/aspiers/PDFexploder project which explodes large PDFs into one correctly named file per song).

I'd appreciate suggestions on how to move forward with that.
#17
I think that a collective approach towards fakebook indices is very good, and I appreciate you taking initiatives!

However, looking at your CSVs I think they're too limited. They contain just a starting and ending page, not page ranges. If two pages were swapped in your copy of the book --and this happens-- this cannot be dealt with.
Also, there's no provision for information like key, composer, artist. Columns are fixed.

Another potential trap of these indices is "The final page number is optional, because it can often be automatically inferred by the starting page of the next tune, ...".

In NewReal1.csv I see:

Code:
Airegin,17,
# ...
All Or Nothing At All,444,
Always There,18,

This would mean that song "Airegin" runs from page 17 thru 433, and "All or nothing at all" will probably crash the program :-)

It would be great to employ a more generic data standard for these indices. With a headings row, it would already be more flexible. For example, an index entry could use either "startpage" and "endpage", or a more powerful "pagerange". And it allows for additional, optional information without affecting tools that do not understand it.
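To show what a headings row buys, here is a minimal Python sketch of a reader that accepts either column style and ignores columns it does not know. The column names "startpage", "endpage" and "pagerange" are the ones suggested above; "title" and "key" are assumptions, and nothing here is an agreed format.

```python
import csv
import io

def page_spec(record):
    """Return a page range string from whichever page columns are present."""
    if record.get("pagerange"):
        return record["pagerange"]
    start = record.get("startpage") or ""
    end = record.get("endpage") or ""
    return f"{start}-{end}" if end else start

def read_entries(text):
    """Yield (title, page spec) pairs; unknown columns are simply ignored."""
    reader = csv.DictReader(io.StringIO(text))
    return [(record["title"], page_spec(record)) for record in reader]

# Two hypothetical index files using different page columns.
with_start_end = "title,startpage,endpage,key\nAiregin,17,17,Fm\n"
with_range = "title,pagerange\nSome Tune,5-8\n"

entries_a = read_entries(with_start_end)
entries_b = read_entries(with_range)
```

Because columns are looked up by name, the extra "key" column in the first file does not disturb a tool that only cares about titles and pages.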

A final remark: the github page reads "... it's the very well-known CSV (or Comma-Separated Values) format". I'm sorry to disappoint you, but although well-known, there is no such thing as a standard for CSV formatted data files. The de facto standard defined in RFC 4180 is a good starting point.
Johan
#18
(01-19-2016, 10:57 PM)sciurius Wrote: I think that a collective approach towards fakebook indices is very good, and I appreciate you taking initiatives!

Thanks!  And I very much appreciate your useful feedback!

(01-19-2016, 10:57 PM)sciurius Wrote: However, looking at your CSVs I think they're too limited.

Absolutely - that's why I asked for feedback in the first place :-)  The existing repo is just a prototype.

(01-19-2016, 10:57 PM)sciurius Wrote: They contain just a starting and ending page, not page ranges.
If two pages were swapped in your copy of the book --and this happens-- this cannot be dealt with.

Very good point, but this is extremely easy to fix!  For example we could collapse the page selection into a single field which supports different types of values:
  • "5" would mean just page 5
  • "5-" would mean page 5 and all subsequent pages until the next page not claimed by any other song
  • "5-8,10,9,11-" would mean the same as above but with pages 9 and 10 swapped round and assuming that the song doesn't finish before page 11
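A rough Python sketch of a parser for this proposed syntax, just to show it is easy to implement. The function name parse_pages and the return shape are my own choices for the example; resolving where an open-ended "11-" actually stops is left to a later step that knows about the other songs.

```python
def parse_pages(spec):
    """Parse a spec such as '5', '5-' or '5-8,10,9,11-'.

    Returns (pages, open_start): pages lists the explicit pages in playing
    order; open_start is the page from which the song continues open-endedly,
    or None if the spec is fully explicit.
    """
    pages = []
    open_start = None
    parts = [part.strip() for part in spec.split(",")]
    for index, part in enumerate(parts):
        if part.endswith("-"):
            # An open-ended range only makes sense as the final item.
            if index != len(parts) - 1:
                raise ValueError("open-ended range must come last")
            open_start = int(part[:-1])
            pages.append(open_start)
        elif "-" in part:
            first, last = (int(number) for number in part.split("-"))
            pages.extend(range(first, last + 1))
        else:
            pages.append(int(part))
    return pages, open_start

swapped = parse_pages("5-8,10,9,11-")
```

The swapped pages 10 and 9 come out in the order given, so an exploder could copy them in that order.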
(01-19-2016, 10:57 PM)sciurius Wrote: Also, there's no provision for information like key, composer, artist. Columns are fixed.

Yes - also easily corrected, and I agree that it obviously needs to be.

(01-19-2016, 10:57 PM)sciurius Wrote: Another potential trap of these indices is "The final page number is optional, because it can often be automatically inferred by the starting page of the next tune, ...".

In NewReal1.csv I see:

Code:
Airegin,17,
# ...
All Or Nothing At All,444,
Always There,18,

This would mean that song "Airegin" runs from page 17 thru 433, and "All or nothing at all" will probably crash the program :-)

Oh dear, you seem to have a very dim view of my programming skills ;-)  However you can set your fears to rest; my code easily handles this, e.g. https://github.com/aspiers/PDFexploder/b...ion.rb#L25 - and any other implementation would find it easy to do the same.

(01-19-2016, 10:57 PM)sciurius Wrote: It would be great to employ a more generic data standard for these indices. With a headings row, it would already be more flexible. For example, an index entry could use either "startpage" and "endpage", or a more powerful "pagerange". And it allows for additional, optional information without affecting tools that do not understand it.

Absolutely - great idea.

(01-19-2016, 10:57 PM)sciurius Wrote: A final remark: the github page reads "... it's the very well-known CSV (or Comma-Separated Values) format". I'm sorry to disappoint you, but although well-known, there is no such thing as a standard for CSV formatted data files. The de facto standard defined in RFC 4180 is a good starting point.

You are not disappointing me, because I entirely disagree ;-)  That is a separate discussion which probably does not belong on this forum, but the starting point would be to consider what is a reasonable definition of "standard", then to consider some other popular standards in the computing world (e.g. the myriad of standards relating to email), and finally compare how strictly those are ratified and adhered to by implementations, relative to CSV.  We live in a messy world, where even supposedly clearly defined standards often contain crucial ambiguities, flaws, and rival definitions.  However that does not preclude them from being standards; it just means that they could be improved.  But I'm not sure it's worth having that discussion on this forum.

Standard or not, it does not diminish the point I made elsewhere, which is that the potential conflict in CSV files between characters used both as delimiters and within data fields is trivially solved by a solution which has been successfully used in CSV implementations all over the world for multiple decades.  And that solution is simply to quote either just the data fields which contain delimiter characters, or quote all data fields.  (My preference is the former, since the latter leads to CSV files which are less readable by humans.)  So this is a solved problem and I would strongly recommend that this community reuse that solution rather than aim for an alternative which relies on the delimiter character never being needed within any data field, since that is pretty much doomed to fail in some corner cases.
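For anyone who wants to see the "quote only where needed" approach in action, here is a short sketch using Python's standard csv module. The column names and the page numbers for the first song are made up for the example.

```python
import csv
import io

# Illustrative rows; the title in the first data row contains a comma.
rows = [
    ["title", "composer", "pages"],
    ["Hello, Dolly!", "Jerry Herman", "5-6"],
    ["Airegin", "Sonny Rollins", "17"],
]

# QUOTE_MINIMAL quotes only the fields that contain the delimiter,
# the quote character or a line break.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_MINIMAL).writerows(rows)
encoded = buffer.getvalue()

# Reading the text back restores the original fields, comma included.
decoded = list(csv.reader(io.StringIO(encoded)))
```

Only the title with the comma ends up in quotes, so the file stays readable by humans.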

Anyway, thanks again for the great feedback!  Cheers, Adam
#19
(01-20-2016, 02:33 AM)aspiers Wrote: Oh dear, you seem to have a very dim view of my programming skills ;-)  However you can set your fears to rest; my code easily handles this, e.g. https://github.com/aspiers/PDFexploder/b...ion.rb#L25 - and any other implementation would find it easy to do the same.

I trust your code doesn't crash. Neither does mine ;-)

But my objection is that using only a starting page may not be sufficiently deterministic under all circumstances.
Given the example in my posting, you would need to collect all song data and sort on page number to be sure.
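A sketch in Python of what "collect and sort" means in practice, under the assumption that only starting pages are given. The function name infer_end_pages and the last_page value are hypothetical, and the three entries are just the rows quoted from NewReal1.csv, with the elided rows left out.

```python
def infer_end_pages(entries, last_page):
    """Infer (start, end) per title from starting pages only.

    entries is a list of (title, start_page) in file order; last_page is the
    number of pages in the book and must be supplied by the caller.
    """
    # File order cannot be trusted, so sort on the starting page first.
    ordered = sorted(entries, key=lambda entry: entry[1])
    result = {}
    for index, (title, start) in enumerate(ordered):
        if index + 1 < len(ordered):
            next_start = ordered[index + 1][1]
            # max() keeps the range valid when two songs share a page.
            end = max(start, next_start - 1)
        else:
            end = last_page
        result[title] = (start, end)
    return result

entries = [
    ("Airegin", 17),
    ("All Or Nothing At All", 444),
    ("Always There", 18),
]
ranges = infer_end_pages(entries, last_page=450)  # 450 is a made-up page count
```

Even with sorting, the result is only as good as the index: with the elided rows missing, "Always There" would be stretched all the way up to the page before 444.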

Also:
Quote:"5-" would mean page 5 and all subsequent pages until the next page not claimed by any other song

The usual interpretation (e.g., LaTeX): If you specify a range consisting of a hyphen (or any tie) but with one or two empty page numbers, the following will happen:

1. a range of the form -34 is taken to mean pages 1 to 34;
2. a range of the form 12- is taken to mean page 12 to last page;
3. a range of the form - (only hyphen) is taken to mean page 1 to last page.
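In Python, that usual interpretation could be sketched as follows. This illustrates the three rules above and is not taken from any LaTeX package; resolve_range is a made-up name, and last_page has to be supplied by the caller.

```python
def resolve_range(spec, last_page):
    """Resolve '-34', '12-', '-', '12-15' or '7' to a (first, last) pair."""
    first_text, separator, last_text = spec.partition("-")
    if not separator:
        # No hyphen at all: a single page.
        page = int(spec)
        return page, page
    # An empty left side means page 1, an empty right side means last page.
    first = int(first_text) if first_text.strip() else 1
    last = int(last_text) if last_text.strip() else last_page
    return first, last

whole_book = resolve_range("-", last_page=100)
```

Under these rules "5-" always runs to the end of the document, which is the difference from the "until the next song" meaning proposed earlier in the thread.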
Johan
#20
I'm going to carry the use of CSV metadata a step further.

First, I'm currently finishing a tool that reads iRealPro data and formats this into a nice PDF. iRealPro songs contain a limited amount of metadata like title, composer, style, key, and tempo.
If the iRealPro data contains an iRealPro playlist, I produce a multi-page PDF document and the corresponding metadata CSV. In other words: a Fakebook plus index in one go.

For reasons not relevant here, my tool can also produce PNGs instead of a PDF. This brings me to the feature request to extend the use of metadata CSV to other imports (in particular, batch import) as well.

For example, I have a folder with ChordPros or PNGs, each containing one song. I can batch import this folder, but it would be very nice if I could place a metadata CSV in the folder (or specify one in the import dialog) so that all imported songs can have some metadata filled in.
I even think that, given how far Mike has already implemented support for metadata CSVs, this won't be hard to add.
Implementation hint: Add a "filename" or "pathname" element to the CSV to match a file with its metadata.
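To illustrate the hint, a small Python sketch of the matching step. It assumes a "filename" column as suggested above; the function name match_metadata, the other column names and the sample file names are made up, and this says nothing about how MSP's batch import works internally.

```python
import csv
import io
from pathlib import PurePath

def match_metadata(filenames, csv_text):
    """Pair each file with the CSV row whose 'filename' column names it.

    Returns (matched, unmatched): a dict of file name to metadata row, and a
    list of files for which the CSV has no row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Compare on the bare file name, ignoring directories and letter case.
    by_name = {PurePath(row["filename"]).name.casefold(): row for row in reader}
    matched, unmatched = {}, []
    for name in filenames:
        row = by_name.get(PurePath(name).name.casefold())
        if row is None:
            unmatched.append(name)
        else:
            matched[name] = row
    return matched, unmatched

csv_text = "filename,title,composer\nairegin.png,Airegin,Sonny Rollins\n"
matched, unmatched = match_metadata(["songs/Airegin.png", "other.png"], csv_text)
```

Files without a row in the CSV are reported separately, so they could still be imported with empty metadata.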
Johan

