in reply to Re^4: Best Way to Get Length of UTF-8 String in Bytes?
in thread Best Way to Get Length of UTF-8 String in Bytes?
Certainly is is less problematic and more maintainable to not count on any subtle details that might shift the meaning.
Hmm, just what is the 8-bit form? If it's "whatever was read in", it might include characters encoded in multiple bytes, using some other code page. So, I would be inclined to feel safe treating the internal length in bytes as the UTF-8 length if I read in the string from a file using UTF-8 encoding, or it was a string literal in a program whose source file used utf8. I think there is also a utility function somewhere to tell you which mode a string is in.
In fact, wouldn't the UTF-8 encoder just check that flag first and realize it's a no-op? So using it would be efficient, if you don't mind copying the string.
|
|---|