It’s just UTF-8
Android defaults to UTF-16
All because of Java
TIL Java uses UTF-16 for its internal representation. Looks like it’s a bit more complicated than that after Java 9 too (compact strings store Latin-1-only text as one byte per character).
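For anyone wondering why the UTF-16 detail matters in practice: a rough Python sketch (not Java, and the helper name `utf16_units` is made up for illustration) that counts how many UTF-16 code units a string needs compared with its code points and UTF-8 bytes. Anything outside the Basic Multilingual Plane takes two UTF-16 units, which is where length and indexing surprises tend to come from.

```python
def utf16_units(s: str) -> int:
    # Hypothetical helper: each UTF-16 code unit is 2 bytes.
    # "utf-16-le" is used so no byte order mark is prepended.
    return len(s.encode("utf-16-le")) // 2

for s in ["abc", "£", "😀"]:
    print(
        repr(s),
        "code points:", len(s),
        "UTF-16 units:", utf16_units(s),
        "UTF-8 bytes:", len(s.encode("utf-8")),
    )
```

The emoji is one code point but two UTF-16 units, so any API that counts in UTF-16 units will report its length as 2.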
You’d think things would be simple, given the existence of UTF-8.
And yet for the last 17 years, every company I’ve been in has had some sort of horrible mess involving Unicode and non-Unicode text, with nobody either recognising the problem or knowing how to solve it when they did recognise it (“well, the £ turns into a ? so we just replace any ? in the filename with a £”).
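A minimal sketch of how that kind of "£ becomes ?" mess can arise, and why the replace-it-back workaround is lossy. This is only one plausible cause (a lossy conversion to a charset without "£"); the filenames and the `naive_fix` helper are invented for the example, not taken from any real system.

```python
text = "£100.txt"

# Encoding to a charset that has no "£", with errors="replace",
# silently substitutes "?" for the character.
lossy = text.encode("ascii", errors="replace").decode("ascii")
print(lossy)

def naive_fix(name: str) -> str:
    # Hypothetical workaround from the anecdote: turn every "?" into "£".
    return name.replace("?", "£")

# Once "€" and "£" have both collapsed to "?", the original can't be
# recovered, and genuine question marks get rewritten too.
damaged = "€5 or £5?".encode("ascii", errors="replace").decode("ascii")
print(damaged, "->", naive_fix(damaged))

# A different failure mode: UTF-8 bytes misread as Latin-1 produce
# mojibake rather than "?".
mojibake = text.encode("utf-8").decode("latin-1")
print(mojibake)
```

The information is gone at the point of the lossy conversion, so the only real fix is upstream: keep the bytes and the name of their encoding together.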
On the second day, he gave them CSS.
Text encoding ‘standards’ were clearly the devil’s work, handed down to humanity to sow chaos and suffering.
In my experience things are fine while you work in a single environment, or you have control over the entire pipeline of data. Things quickly turn into a story from the Bible when different systems start trying to communicate.
Even with a single standard in a single project, things tend to start breaking down as soon as there’s more than one developer and disagreement arises about what the text of the specification actually means.
That’s true yeah. The seed of all the problems is assuming.
My teammates assumed System.DefaultEncoding must be some default value (UTF-8, they assumed, again) that would carry across all servers so no worries. Except no, it’s “whatever encoding is configured on this machine as the default code page”.
Which was the same across our networks, lucky them.
Except for this one machine, set up by an external contractor, which had UTF-8 as its default.
That one took me a while to track down…
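A rough illustration of that trap, in Python rather than the platform from the story: `locale.getpreferredencoding` stands in for "whatever this machine's default is", and cp1252 is just an assumed example of a legacy code page. The `roundtrip` helper is hypothetical.

```python
import locale

# What this particular machine treats as its default text encoding.
# The answer depends on OS and configuration, which is the whole problem.
print(locale.getpreferredencoding(False))

# Bytes as written by a machine whose default is (by assumption) cp1252.
data = "£".encode("cp1252")

# Same bytes, read back with the writer's encoding vs. a UTF-8 default.
print(data.decode("cp1252"))
print(data.decode("utf-8", errors="replace"))

def roundtrip(text: str, encoding: str = "utf-8") -> str:
    # Hypothetical helper: naming the encoding on both sides means the
    # result no longer depends on any machine's default.
    return text.encode(encoding).decode(encoding)

print(roundtrip("£"))
```

The single byte that means "£" in cp1252 is not valid UTF-8 on its own, so the UTF-8 reader gets a replacement character instead. Passing the encoding explicitly everywhere is tedious but makes the odd-one-out server irrelevant.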
Get thee behind me, anything beyond extended ASCII.