PerlMonks  

Re^4: length() miscounting UTF8 characters?

by ikegami (Patriarch)
on Apr 30, 2014 at 18:38 UTC ( [id://1084539]=note )


in reply to Re^3: length() miscounting UTF8 characters?
in thread length() miscounting UTF8 characters?

The problems with length are not around bytes vs. characters, but that length counts code points. Many logical characters are composed from multiple code points.

1. What you call "logical character" is an "extended grapheme cluster", which I abbreviate to "grapheme".

2. length doesn't count code points. length always counts characters (string elements). It has no idea what those characters represent, as that information is neither available nor needed. They are just 32-bit or 64-bit numbers to length. They could be bytes. They could be Unicode code points. But they aren't going to be graphemes (visual characters), as there is no existing system to encode a grapheme in a single number.
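To illustrate, here is a minimal sketch (the variable names and the sample string are mine, not from the thread). The same text gives three different counts depending on what the string elements happen to be, and counting graphemes takes a regex match with \X rather than length:

```perl
use strict;
use warnings;
use Encode qw(encode);

# "e" followed by COMBINING ACUTE ACCENT (U+0301):
# one grapheme, made of two code points.
my $decoded = "e\x{301}";

# The same text encoded as UTF-8: three bytes (65 CC 81).
my $encoded = encode('UTF-8', $decoded);

my $len_decoded = length($decoded);        # elements are code points here
my $len_encoded = length($encoded);        # elements are bytes here
my $graphemes   = () = $decoded =~ /\X/g;  # \X matches one extended grapheme cluster

print "decoded: $len_decoded, encoded: $len_encoded, graphemes: $graphemes\n";
# prints "decoded: 2, encoded: 3, graphemes: 1"
```

length returns the number of string elements in both calls. Only the meaning of those elements differs, and length never gets to know it.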
