Encoding problem
#1
I cannot get the attached file to import correctly. The file is properly encoded (UTF-8), but MSPro interprets it as ISO-8859-1 (so the ’ is displayed as garbled characters such as â€™).

I have many UTF-8 files that are interpreted correctly but I cannot find out why this one is not.
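The garbling follows directly from the byte values. Here is a minimal Python sketch that encodes ’ (U+2019) as UTF-8 and then decodes those bytes with a single-byte Western encoding. It uses only the standard library and is not how MobileSheetsPro itself reads files:

```python
# Show how a UTF-8 apostrophe turns into mojibake when its bytes are
# decoded with a single-byte Western encoding instead of UTF-8.
apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK, ’

raw = apostrophe.encode("utf-8")
print(raw.hex(" "))  # e2 80 99 (three bytes for one character)

# Windows-1252: 0xE2 -> â, 0x80 -> €, 0x99 -> ™
as_cp1252 = raw.decode("cp1252")   # 'â€™'

# ISO-8859-1: 0x80 and 0x99 are invisible C1 control characters
as_latin1 = raw.decode("latin-1")  # 'â\x80\x99'

print(len(apostrophe), len(as_cp1252), len(as_latin1))  # 1 3 3

# The damage is reversible as long as no bytes were lost:
repaired = as_cp1252.encode("cp1252").decode("utf-8")  # '’'
```

Strict ISO-8859-1 makes the last two bytes invisible. Windows-1252 is often used in its place (browsers, for example, treat the ISO-8859-1 label as Windows-1252), and it produces the familiar â€™. Which form a given app shows depends on its decoder.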

   


Attached Files
26_Land_Van_Maas_En_Waal2.cho (Size: 1.8 KB)
Johan
johanvromans.nl — hetgeluidvanseptember.nl — mojore.nl -- howsagoin.nl
Samsung Galaxy Note S7FE (T733) 12.4", Android 13.0, AirTurn Duo & Digit (Gigs).
Samsung Galaxy Note S4 (T830) 10.5", Android 10.0 (maintenance and backup).
Samsung A3 (A320FL), Android 8.0.0 (emergency).
#2
I'll try to look into it, but I'm using the CharsetDetector from the ICU library, so I don't really have a lot of insight into how it determines what encoding to use. It's possible that this is just one file where you will have to manually change the encoding in MobileSheetsPro.
#3
I don't know if it matters, but the file doesn't have a BOM (Byte Order Mark) identifying it as UTF-8. If I load the file into Notepad++, change the encoding to UTF-8-BOM, save it, and load the new file into MSPro, I don't get the odd characters you show.

I just checked with your original file and I do get the odd characters. So adding the BOM improves/changes things.

HTH

Andy
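For a file that is already UTF-8, re-saving it as "UTF-8-BOM" should only add three bytes (EF BB BF) at the start and leave the rest of the content untouched. Here is a minimal Python sketch of that step. The helper name `add_utf8_bom` is hypothetical, and it assumes the input is already valid UTF-8:

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 byte order mark (EF BB BF) unless it is already there.

    Assumes `data` is already valid UTF-8; no other bytes are changed.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


original = "It\u2019s".encode("utf-8")
with_bom = add_utf8_bom(original)
print(with_bom.hex(" "))  # ef bb bf 49 74 e2 80 99 73
```

When writing a file from Python, opening it with `encoding="utf-8-sig"` has the same effect, because that codec emits the BOM before the first character.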
#4
Yes, adding a BOM helps, although a BOM should not be necessary for UTF-8 encoded data. And many apps do not support UTF-8 BOMs.
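The compatibility problem is easy to see. A reader that does not expect a BOM either keeps it as a stray, invisible U+FEFF character or, if it decodes with a single-byte encoding, shows it as ï»¿. This Python sketch compares three decoders on the same bytes. Python's `utf-8-sig` codec is just one example of a BOM-tolerant reader:

```python
# The same bytes, read by three differently behaved decoders.
data = b"\xef\xbb\xbf" + "It\u2019s".encode("utf-8")

plain = data.decode("utf-8")      # keeps the BOM as an invisible U+FEFF
sig = data.decode("utf-8-sig")    # strips a leading BOM if present
legacy = data.decode("cp1252")    # BOM-unaware single-byte reader

print(repr(plain[0]))             # '\ufeff'
print(len(plain), len(sig))       # 5 4
# legacy == 'ï»¿Itâ€™s': the BOM itself becomes visible junk
```

A BOM-tolerant codec such as `utf-8-sig` decodes correctly whether or not the BOM is present, which is one way for an app to accept both kinds of file.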
Johan
#5
I agree the BOM shouldn't be necessary, but it might help Mike track down the problem. Really you should be able to treat everything as UTF-8 unless you find a marker to the contrary. It's the unicode files without a BOM that are "wrong". (Although apps that don't accept a BOM are suspect :-) Text handling is a minefield best left to the experts.

Andy
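Andy's rule of thumb (a BOM decides, otherwise assume UTF-8) fits in a few lines. Here is a minimal Python sketch. It is not how ICU's CharsetDetector or MobileSheetsPro actually work. The function name `guess_encoding` and the Windows-1252 fallback for bytes that are not valid UTF-8 are assumptions for illustration:

```python
import codecs

# (BOM, codec) pairs. The UTF-32 LE mark (FF FE 00 00) begins with the
# UTF-16 LE mark (FF FE), so the UTF-32 entries must be checked first.
# The "utf-8-sig", "utf-16" and "utf-32" codecs strip the BOM when decoding.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Pick a codec name for `data` (hypothetical helper).

    1. A byte order mark, if present, decides.
    2. Otherwise, bytes that decode cleanly as UTF-8 are treated as UTF-8.
    3. Anything else is assumed to be in the legacy `fallback` encoding.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")  # strict: raises on any invalid sequence
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


no_bom = "It\u2019s".encode("utf-8")
print(guess_encoding(no_bom))                        # utf-8
print(guess_encoding(codecs.BOM_UTF8 + no_bom))      # utf-8-sig
print(guess_encoding("It\u2019s".encode("cp1252")))  # cp1252
```

Strict UTF-8 validation works well as a first test because legacy-encoded text containing accented or curly characters is rarely valid UTF-8 by accident. Pure ASCII passes either way, which is harmless because ASCII is a subset of both UTF-8 and Windows-1252.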
#6
(08-26-2017, 05:32 AM) AndyL wrote: Really you should be able to treat everything as UTF-8 unless you find a marker to the contrary. It's the unicode files without a BOM that are "wrong".

These two sentences confirm your conclusion that "Text handling is a minefield best left to the experts." :-)

The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use. (emphasis mine)
Johan