Saturday, September 01, 2018

UTF8 in-memory strings in C#

Go stores strings as UTF-8 by default: a Go string is a sequence of bytes, and string literals in Go source are UTF-8 encoded, so a typical ASCII string takes 1 byte per character. .NET (like Windows, Java and JavaScript) stores strings in memory as UTF-16, which takes 2 bytes per ASCII character, roughly twice as much space for English text.
(Rob Pike, a co-creator of Go, also co-created UTF-8 :)
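To make the size difference concrete, here is a small Go sketch that measures the same text in both encodings. `utf8Size` and `utf16Size` are hypothetical helper names used only for illustration, and the numbers cover the character payload only. They ignore per-object overhead such as the .NET string header and length field. Non-Latin text (the CJK example) shows the "twice as much" rule applies to English-like text, not to every string.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf8Size returns the number of bytes s occupies as UTF-8.
// A Go string is a byte sequence, so len(s) is already its UTF-8 byte count.
func utf8Size(s string) int {
	return len(s)
}

// utf16Size returns the number of bytes the same text needs in UTF-16,
// the in-memory encoding used by .NET strings: 2 bytes per code unit,
// with characters outside the BMP (e.g. emoji) taking a surrogate pair.
func utf16Size(s string) int {
	return 2 * len(utf16.Encode([]rune(s)))
}

func main() {
	samples := []string{"Hello, world", "Grüße", "日本語", "😀"}
	for _, s := range samples {
		fmt.Printf("%q: UTF-8 %d bytes, UTF-16 %d bytes\n", s, utf8Size(s), utf16Size(s))
	}
}
```

For pure ASCII text like "Hello, world" the UTF-16 payload is exactly twice the UTF-8 one (24 vs 12 bytes), while for Japanese text UTF-16 is actually smaller (6 vs 9 bytes).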

Does storing strings as UTF-8 have a practical advantage?
Here is an interesting, detailed analysis by a real C# expert:

Of memory and strings | Jon Skeet's coding blog


Compact strings in the CLR · Performance is a Feature!

Go code is UTF-8 encoded – golangspec – Medium

Golang Strings - golangbot.com


Java May Use UTF-8 as its Default Charset - DZone Java