Unicode assigns a number and a name to each abstract character, and calls these things codepoints. They are things like 'a', '1' etc, but also non-ASCII stuff like 'ä' or 'ش' (if you have no Arabic fonts installed, you won't see anything).
In most encodings you need several bytes to store such characters. Thus the number of bytes and the number of codepoints can be different.
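To make the bytes-versus-codepoints difference concrete, here is a minimal sketch in Python (not Perl 6). The helper name byte_len is made up for this illustration, and UTF-8 and UTF-16 are just two example encodings; the post itself does not name any.

```python
# Sketch: the same codepoint can need a different number of bytes
# depending on the encoding. byte_len is a hypothetical helper name.

def byte_len(text, encoding):
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))

# 'a', 'ä' (U+00E4) and 'ش' (U+0634), written as escapes so the
# source file's own encoding cannot change what is being measured.
samples = ["a", "\u00e4", "\u0634"]

for ch in samples:
    # len() on a Python str counts codepoints, not bytes.
    print(ch, len(ch), byte_len(ch, "utf-8"), byte_len(ch, "utf-16-le"))
```

Each sample is one codepoint, but 'ä' and 'ش' take two bytes in UTF-8 while 'a' takes one.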
But that's not the whole story: there are sequences of codepoints that, when displayed, look like a single character.
perl -Mcharnames=:full -CS -wle 'print "A\N{COMBINING ACUTE ACCENT}"'
Á
So it's two codepoints, but only one "grapheme".
So how many "characters" a string has depends on what you mean by "character". You can ask for each count separately:
$x.bytes    # number of bytes
$x.codes    # number of codepoints
$x.graphs   # number of graphemes
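The three counts can be imitated outside Perl 6. The following is a rough sketch in Python under stated assumptions: the function names are invented for this example, UTF-8 is an assumed encoding for the byte count, and the grapheme count is only an approximation that treats every codepoint with a non-zero combining class as part of the preceding grapheme. Real grapheme segmentation follows more rules than that (Hangul syllables and emoji sequences, for instance), so this is not how any Perl 6 implementation necessarily does it.

```python
import unicodedata

def count_bytes(text, encoding="utf-8"):
    """Bytes needed to store text; UTF-8 is an assumption of this sketch."""
    return len(text.encode(encoding))

def count_codes(text):
    """Codepoints: a Python str is already a sequence of codepoints."""
    return len(text)

def count_graphs_approx(text):
    """Approximate grapheme count.

    Counts only codepoints whose canonical combining class is 0, so a
    combining mark is folded into the character before it. This ignores
    the other grapheme cluster rules and is a simplification.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

# The string from the one-liner above: 'A' followed by a combining accent.
x = "A\N{COMBINING ACUTE ACCENT}"
print(count_bytes(x), count_codes(x), count_graphs_approx(x))  # 3 2 1
```

For this string the sketch reports three bytes in UTF-8, two codepoints and one grapheme, matching the "two codepoints, one grapheme" observation.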
There's also a current "Unicode level", which defaults to graphemes. So you can say $x.chars, and that will internally call whichever of the methods above corresponds to the current Unicode level.
In reply to Re: Length of a string in Perl6
by moritz
in thread Length of a string in Perl6
by websterling