18 Oct 2012
A UTF-8 encoded file can start with a Byte Order Mark, but a UTF-8 encoded string doesn't contain it.
Today I loaded a file containing a UTF-8-encoded string, and tried to parse it with CSV.
It kept failing, even though I could see nothing wrong with the string when it was printed on the screen.
It turns out that a file containing a UTF-8 string can begin with a Byte Order Mark: 3 extra bytes (EF BB BF) that don't show up when the string is printed.
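To make the invisible bytes visible, here is a minimal sketch in Python (my choice of language for illustration; the sample data is made up). It builds the kind of content such a file might hold and shows that the BOM survives a plain UTF-8 decode as the character U+FEFF, which is why the string can look fine and still trip up a parser.

```python
import codecs

# Hypothetical file contents: a BOM followed by a small CSV header and one row.
sample = "name,city\nZoë,Köln\n"
raw = codecs.BOM_UTF8 + sample.encode("utf-8")

# The first three bytes are the BOM: EF BB BF.
print(raw[:3].hex())

# Decoding as plain "utf-8" keeps the BOM as the invisible character U+FEFF,
# so the decoded text is one character longer than it looks.
text = raw.decode("utf-8")
print(text.startswith("\ufeff"), len(text) - len(sample))
```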
Thanks to Cesar for pointing it out. He also explained why the Byte Order Mark (BOM) matters: without it, it can be hard to tell a file containing a UTF-8 string from one containing a Latin (e.g. ISO-8859-1) string, since the two can look the same.
So, before processing the string with the CSV library, just check for the BOM and cut off those first 3 bytes…
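Here is roughly what that looks like, sketched in Python with its standard csv module (an assumption on my part, not necessarily the library used above). The helper name strip_utf8_bom and the sample data are hypothetical; the helper only removes the 3 bytes when they really are the BOM, so files without one pass through untouched.

```python
import codecs
import csv
import io


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; unchanged if there is none."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file contents: a BOM followed by a small CSV header and one row.
raw = codecs.BOM_UTF8 + "name,city\nZoë,Köln\n".encode("utf-8")

# Cut off the BOM first, then decode and parse as usual.
text = strip_utf8_bom(raw).decode("utf-8")
rows = list(csv.reader(io.StringIO(text)))
print(rows)
```

In Python specifically, decoding with the "utf-8-sig" codec does the same check-and-strip in one step, so raw.decode("utf-8-sig") would work here as well.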