CSV file format
#1
Now that CSV imports are becoming popular, it is important to know exactly what the format requirements are.

CSV is a rather informal file format that has evolved over a long time. There are many conventions that seem to work.

In 2005 an attempt was made to formally define the format in RFC 4180.

What are the requirements for MSPro? The current documentation is usable but lacks information on:
  • File encoding. Given that MSPro is an Android/Java application, I assume the default is UTF-8. Does MSPro accept a BOM for UTF-16 and UTF-32 encodings?
  • Separator character. According to the docs, MSPro detects the separator from the first line. Anything goes?
  • Field quoting. Does MSPro accept double quotes (") to surround fields? And a doubled double quote ("") to embed a single double quote in a quoted field?
  • What if a multi-valued field contains a '|'?
  • Maybe even full RFC 4180 compliance?

Just a couple of details that may be interesting to add to the documentation.
Johan
johanvromans.nl — hetgeluidvanseptember.nl — mojore.nl -- howsagoin.nl
Samsung Galaxy Note S7FE (T733) 12.4", Android 13.0, AirTurn Duo & Digit (Gigs).
Samsung Galaxy Note S4 (T830) 10.5", Android 10.0 (maintenance and backup).
Samsung A3 (A320FL), Android 8.0.0 (emergency).
#2
MobileSheetsPro will handle UTF-8, UTF-16BE and UTF-16LE. It strips the BOM at the beginning of the file. I don't support UTF-32: I've never used a file with that encoding, I'm not sure whether the open source libraries I'm using properly support it, and the tool I use to edit files (Notepad++) has no way to convert files to that encoding.
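For anyone writing their own importer or export script, here is a minimal sketch of the BOM handling described above, in Python rather than the app's own code. It assumes (this is not confirmed for MSPro) that a file without a BOM is UTF-8. The helper name decode_csv_bytes is hypothetical.

```python
import codecs

def decode_csv_bytes(data: bytes) -> str:
    """Decode raw CSV bytes, detecting and stripping a leading BOM.

    Hypothetical helper. Assumption: no BOM means UTF-8.
    """
    # The three encodings listed in the post, identified by their BOM.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # No BOM: fall back to UTF-8 (assumed default).
    return data.decode("utf-8")
```

One detail if UTF-32 is ever added: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so a UTF-32 check would have to come before the UTF-16LE check.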

MobileSheetsPro will accept any character after "title" as the delimiter. Anything goes.
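A minimal sketch of that separator rule, assuming the header line starts with the column name "title" (compared case-insensitively here, which is an assumption) and that the delimiter follows it directly. detect_delimiter is a hypothetical name, not the app's API.

```python
def detect_delimiter(header_line: str) -> str:
    """Return the character immediately after "title" in the header.

    Hypothetical helper. Assumes the header starts with "title"
    (case-insensitive) followed directly by the delimiter.
    """
    prefix = "title"
    if not header_line.lower().startswith(prefix) or len(header_line) <= len(prefix):
        raise ValueError("header must start with 'title' followed by a delimiter")
    # Whatever comes next is taken as the delimiter: comma, semicolon, tab, ...
    return header_line[len(prefix)]
```

For example, detect_delimiter("title;artists;genres") returns ";".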

MobileSheetsPro does accept double quotes to surround fields. I do not support a doubled double-quote though. That is something I would need to add support for.
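Supporting the doubled double quote mostly comes down to one extra check inside a quoted field. Below is a minimal single-line sketch of an RFC 4180-style splitter in Python; split_record is a hypothetical name. It is deliberately lenient (a quote in the middle of an unquoted field simply opens a quoted section) and does not handle quoted fields that span line breaks, which full RFC 4180 allows.

```python
from typing import List

def split_record(line: str, delimiter: str) -> List[str]:
    """Split one CSV line into fields, honouring quotes and "" escapes."""
    fields: List[str] = []
    field: List[str] = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    field.append('"')  # "" inside quotes -> one literal quote
                    i += 1             # skip the second quote
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)       # delimiters are literal inside quotes
        elif ch == '"':
            in_quotes = True           # opening quote
        elif ch == delimiter:
            fields.append("".join(field))
            field = []
        else:
            field.append(ch)
        i += 1
    fields.append("".join(field))      # last field has no trailing delimiter
    return fields
```

With this, split_record('a;"He said ""hi""";c', ";") yields the three fields a, He said "hi" and c.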

I split any field using '|', so if a multi-valued field contains a '|', whatever is listed is going to be split into multiple parts. That is how you can specify multiple genres, artists, albums, etc.
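A minimal sketch of that splitting step, applied to a field value after CSV parsing. Trimming each part follows the trimming described below; dropping empty parts (for example from a trailing '|') is my own assumption. There is no escape for a literal '|', matching the behaviour described above. split_multi_value is a hypothetical name.

```python
from typing import List

def split_multi_value(field: str) -> List[str]:
    """Split a multi-valued field (genres, artists, ...) on '|'.

    Each part is trimmed; empty parts are dropped (assumption).
    """
    return [part.strip() for part in field.split("|") if part.strip()]
```

So a genres field of "Rock | Pop|Jazz" becomes three genres: Rock, Pop and Jazz.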

I'm pretty close to being fully compliant with RFC 4180. I don't preserve white space at the start or end of values though (I trim them as I didn't see the value in leaving white space there). Perhaps this is wrong and I should always honor whatever white space is specified. 

Thanks,
Mike
#3
Sounds great.

Quote: I do not support a doubled double-quote though.

I don't think this is urgent.

Quote: I don't preserve white space at the start or end of values though (I trim them as I didn't see the value in leaving white space there).

Personally, I would say that it is important to retain spaces in quoted strings. So (using underscores to represent spaces) ;_foo_; may yield "foo" or "_foo_", but ;"_foo_"; should yield "_foo_". But this is also not very urgent.
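A minimal sketch of the rule suggested above, applied to a single raw field that has already been cut out at the delimiters with its quotes still intact (an assumption about where this step would sit). Unquoted values are trimmed; quoted values keep their inner spaces. Spaces outside the quotes are also dropped here, which is more lenient than strict RFC 4180. field_value is a hypothetical name.

```python
def field_value(raw: str) -> str:
    """Apply the suggested whitespace rule to one raw field.

    Unquoted:  ' foo '   -> 'foo'    (trimmed)
    Quoted:    '" foo "' -> ' foo '  (inner spaces kept)
    """
    stripped = raw.strip()
    if len(stripped) >= 2 and stripped[0] == '"' and stripped[-1] == '"':
        # Quoted: remove the surrounding quotes, keep inner white space,
        # and turn each doubled quote into a single literal quote.
        return stripped[1:-1].replace('""', '"')
    return stripped
```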

Just wait until users start filing issues ;-)
Johan