How to create CSV files for PDF fakebooks
#1
I have been asked again how to get CSV files for importing PDF fakebooks. As so often, the answer is "it depends", so I will document some examples and write down step by step what I do and which tools I use.

Tools that I use (free of charge, for Win10):

LibreOffice Calc
Spreadsheet tool. Its CSV features are imho better and easier to use than those in Microsoft Office, and it's free.
https://www.libreoffice.org/download/dow...breoffice/

Notepad++
A great plain text editor with excellent search & replace features.
https://notepad-plus-plus.org/downloads/

MSPro-Tools
Sciurius created a suite of Perl scripts that are very useful for advanced MobileSheets users.
They require a Perl runtime environment and are run from a command window.
https://github.com/sciurius/MSPro-Tools

jPDFBookmarks
A nice tool to work with PDF bookmarks. Very old but still working fine.
https://sourceforge.net/projects/jpdfbookmarks/


Now let's start creating CSV files

1.) The easiest way first: search the MobileSheets forum for "CSVFile" (without a space) and scroll through the search results.
https://zubersoft.com/mobilesheets/forum...-6667.html
If you're successful, check whether the CSV you found matches your PDF.
If you have to correct the page numbers, this might help: https://zubersoft.com/mobilesheets/forum...-8374.html

2.) If your PDF contains valid bookmarks: pdf2csv.pl from MSPro-Tools converts the PDF bookmarks to a CSV file.
I recommend using it with -x startpage=%%{startpage} to create an extra column with the startpage of each song.
This makes it easy to switch the sort order between alphabetical and 'by page'.
As 'startpage' is not a column name known to MobileSheets, that column is ignored during CSV import.
You can use the result as it is or add rows and columns as you like.

3.) If your PDF contains valid bookmarks (alternative): you can export the bookmarks with jPDFBookmarks and turn the export file into a valid CSV file.
Load the PDF into jPDFBookmarks. "Tools - Dump" in the jPDFBookmarks menu exports the bookmarks to a .txt file.
The content of the file will look like this:
Code:
Angelina/7,Black,notBold,notItalic,open,TopLeftZoom,-1,-1,0.0
Bella Soave/18,Black,notBold,notItalic,open,TopLeftZoom,-1,-1,0.0
...
Use a text editor to delete everything after the page number (replace it with nothing) and to replace the / with a semicolon or a tab.
Change the file extension to .csv.
Add the column names as the first line.
Now the file content looks like this:
Code:
title;pages
Angelina;7
Bella Soave;18
...
This is already a valid CSVFile, but alas the PDF bookmarks give us only the startpage of each song.
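The manual search & replace steps above could also be scripted. Here is a minimal sketch in Python, assuming the dump lines look exactly like the two sample lines shown (title, a /, the page number, then comma-separated attributes). The function names dump_to_rows and rows_to_csv are made up for this example and are not part of jPDFBookmarks or MSPro-Tools.

```python
import csv
import io


def dump_to_rows(dump_text):
    """Turn jPDFBookmarks dump lines into (title, startpage) tuples."""
    rows = []
    for line in dump_text.splitlines():
        line = line.strip()
        if "/" not in line:
            continue  # skip empty lines and anything that is not a bookmark
        # Split at the LAST "/" so that titles containing "/" stay intact.
        title, rest = line.rsplit("/", 1)
        # The page number is the first comma-separated field after the "/".
        page = rest.split(",", 1)[0]
        if page.isdigit():
            rows.append((title.strip(), int(page)))
    return rows


def rows_to_csv(rows):
    """Write the rows as semicolon-separated CSV with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["title", "pages"])
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "Angelina/7,Black,notBold,notItalic,open,TopLeftZoom,-1,-1,0.0\n"
        "Bella Soave/18,Black,notBold,notItalic,open,TopLeftZoom,-1,-1,0.0\n"
    )
    print(rows_to_csv(dump_to_rows(sample)))
```

Lines that do not fit the assumed pattern are silently skipped, so compare the number of rows with the number of bookmarks before you import.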
To set the endpage, you can import the PDF into MobileSheets with that CSVFile and then add pages to each song with the '+' button of the new "Rearrange pages" UI until you reach the end of the song.

Or you can open the CSV file in LibreOffice Calc and use a formula that sets the endpage of each song to (startpage of the next song) - 1.
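The "(startpage of the next song) - 1" rule can be sketched outside of Calc as well. The following Python snippet is only an illustration under two assumptions: that a range written as start-end (for example 7-17) is accepted in the pages column, and that you know the last page of the PDF yourself (last_page), because the last song has no following song. The function name add_page_ranges is hypothetical.

```python
def add_page_ranges(rows, last_page=None):
    """rows: list of (title, startpage). Returns a list of (title, pages).

    The endpage of each song is (startpage of the next song) - 1.
    The last song ends at last_page if given, otherwise it keeps one page.
    """
    ordered = sorted(rows, key=lambda row: row[1])  # sort by startpage
    result = []
    for index, (title, start) in enumerate(ordered):
        if index + 1 < len(ordered):
            next_start = ordered[index + 1][1]
            # max() keeps a valid range if two songs start on the same page
            end = max(start, next_start - 1)
        elif last_page is not None:
            end = max(start, last_page)
        else:
            end = start
        pages = str(start) if end == start else f"{start}-{end}"
        result.append((title, pages))
    return result


if __name__ == "__main__":
    songs = [("Angelina", 7), ("Bella Soave", 18)]
    for title, pages in add_page_ranges(songs, last_page=20):
        print(f"{title};{pages}")
```

The rule assumes that every song ends right before the next one starts, so blank pages or pictures between songs end up attached to the preceding song.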

Enough for today, it's a long story. More to come, stay tuned.
first language: German
Acer A1-830, Android 4.4.2 - HP x2 210 G2 Detachable, Win 10 22H2 - Huawei Media Pad T5, Android 8.0 - Boox Tab Ultra C, Android 11
www.moonlightcrisis.de - www.basdjo.de - www.frankenbaend.de


#2
When searching the web with a browser, you can use the site: operator to restrict the results to a specific site

e.g. "CSVfile site:https://www.zubersoft.com/mobilesheets/forum"

Geoff
Samsung Galaxy Tab A6
#3
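The formulas for LibreOffice Calc to calculate pages from startpage:
[Screenshot of the formulas not preserved]
The last line must be edited manually, because the last song has no following song to take the endpage from.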


#4
You can also use tesseract to read an image of the table of contents and create a text file that you can then edit into a CSV file.

This method works really well or really poorly depending on the quality and resolution of the source image file. Sometimes it's as easy as typing the tesseract command and then adding the semicolons.

Sometimes it's easier to just retype the whole thing from scratch.

Usually it's something in the middle, where you need to do a bit of fix-up on a few titles or misread page numbers. "My 01d Kentucky Home" for example.

But it's usually more efficient than typing the whole thing in again from scratch.
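"Adding the semicolons" can be scripted too. This is a rough sketch in Python, assuming the OCR text has one song per line with the title first and the page number last, optionally separated by dots or spaces. The pattern TOC_LINE and the function parse_toc are made up for this example. It does not correct misread characters, it only splits title and page and collects the lines it cannot parse.

```python
import re

# Title, then dots and/or spaces as filler, then the page number at the end.
TOC_LINE = re.compile(r"^(?P<title>.+?)[\s.]+(?P<page>\d+)\s*$")


def parse_toc(ocr_text):
    """Split OCR'd table-of-contents text into rows and unparsed lines."""
    rows = []
    rejects = []
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TOC_LINE.match(line)
        if match:
            rows.append((match.group("title"), int(match.group("page"))))
        else:
            rejects.append(line)  # needs manual attention
    return rows, rejects


if __name__ == "__main__":
    text = "Angelina ..... 7\nMy 01d Kentucky Home 42\nContents\n"
    rows, rejects = parse_toc(text)
    print("title;pages")
    for title, page in rows:
        print(f"{title};{page}")
    print("Check by hand:", rejects)
```

Misreads such as "01d" pass through unchanged, so the titles still need a proofread.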

After creating the CSV file you need to check that the page numbers from the table of contents match the pages in the PDF file. If they don't, you can easily change them (add 2 to each page number, for example) using a spreadsheet.
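If you prefer a script to a spreadsheet for the offset, here is a minimal Python sketch. It assumes a semicolon-separated file with a column named pages that holds either a single number or a start-end range. The function names shift_pages and shift_csv are hypothetical.

```python
import csv
import io
import re


def shift_pages(pages, offset):
    """Add offset to every number in a pages value such as '7' or '7-17'."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) + offset), pages)


def shift_csv(csv_text, offset, delimiter=";"):
    """Return the CSV text with the 'pages' column shifted by offset."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames,
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        row["pages"] = shift_pages(row["pages"], offset)
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    original = "title;pages\nAngelina;7\nBella Soave;18-20\n"
    print(shift_csv(original, 2))
```

A single offset only works if the difference between printed page numbers and PDF pages is the same throughout the book, so spot-check a song near the start and one near the end.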
If you're a zombie and you know it, bite your friend!
We got both kinds of music: Country AND Western