in reply to Re^3: Best Way to Get Length of UTF-8 String in Bytes?
in thread Best Way to Get Length of UTF-8 String in Bytes?
And we are also aware of how unlikely it is to be a problem for Jim, given the data samples he displayed.
As you can plainly see, it’s only your own isolated little byte constants that can switch internal representation. Have a code point greater than 255 anywhere in the string even once, and it stops being a byte string. You also won’t have a problem if you’ve read in the UTF-8 from something whose encoding layer is set to utf8. So if he has either of those in his program (which it looks like he does), he can ignore Chicken Little.

% perl -CS -E 'say chr(0xe9)' | perl -CS -nE 'require bytes; say bytes::length($_); chomp; say bytes::length($_)'
3
2
% perl -E '$x = "\x{e9}\x{3b1}"; require bytes; say bytes::length($x); chop $x; say bytes::length($x)'
4
2
% perl -E '$x = "\N{U+E9}"; require bytes; say bytes::length($x)'
2
It won’t bother him. I’ll bet.
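For anyone who wants to see the byte-constant case next to the others, here is a minimal sketch, not something from Jim's program. The helper name `utf8_byte_length` is made up for illustration. It assumes the core `Encode` module and compares `bytes::length` with the length of an explicitly encoded copy, which does not depend on the internal representation.

```perl
use strict;
use warnings;
use Encode qw(encode);
require bytes;

# Hypothetical helper (not from the thread): length in bytes of the UTF-8
# encoding of a character string, whatever its internal representation.
sub utf8_byte_length {
    my ($str) = @_;                      # work on a copy
    return length(encode('UTF-8', $str));
}

my $wide  = "\x{e9}\x{3b1}";   # has a code point above 255
my $small = "\x{e9}";          # isolated constant, all code points <= 255

printf "wide:     bytes::length=%d encoded=%d\n",
    bytes::length($wide), utf8_byte_length($wide);

# For the isolated constant, bytes::length reports whatever the internal
# storage happens to be, so the two numbers may differ here.
printf "small:    bytes::length=%d encoded=%d\n",
    bytes::length($small), utf8_byte_length($small);

# Forcing the upgraded (UTF-8) internal form makes the two agree again.
utf8::upgrade($small);
printf "upgraded: bytes::length=%d encoded=%d\n",
    bytes::length($small), utf8_byte_length($small);
```

The encoded length is the same in all three cases for the same characters. Only the `bytes::length` figure for the lone constant depends on how Perl happens to store it, which is the narrow case described above.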
Re^5: Best Way to Get Length of UTF-8 String in Bytes?
by ikegami (Patriarch) on Apr 24, 2011 at 06:00 UTC
Re^5: Best Way to Get Length of UTF-8 String in Bytes?
by John M. Dlugosz (Monsignor) on Apr 24, 2011 at 11:29 UTC