G'day vrk,
"Besides, Unicode codepoints often aren't ordered alphabetically in any script, so you wouldn't get a sorted (collated) sequence even if it did."
[Note: There's no intended pedantry here; however, as I understand your statement, I believe you mean "characters", not "codepoints". On that basis, I don't disagree with your statement, at all. The distinction is important for the remainder of my response.]
The core module Unicode::Collate can be used for sorting Unicode characters.
    $ perl -E 'say for sort qw{z é a}'
    a
    z
    é
    $ perl -MUnicode::Collate -E 'say for Unicode::Collate::->new->sort(qw{z é a})'
    a
    é
    z
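As a script, the same comparison might look like the sketch below. It assumes the source file is saved as UTF-8 (hence `use utf8`) and that the default collation table shipped with Unicode::Collate is acceptable; the variable names are only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                               # literals below are decoded as characters
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';     # encode characters on output

my @chars = qw{z é a};

# The default sort compares strings by code point: U+0061, U+007A, U+00E9.
my @by_codepoint = sort @chars;

# Unicode::Collate sorts with the Unicode Collation Algorithm (default table).
my @collated = Unicode::Collate->new->sort(@chars);

print "code point order: @by_codepoint\n";   # a z é
print "collated order:   @collated\n";       # a é z
```

For language-specific ordering, Unicode::Collate::Locale (shipped with Unicode::Collate) is probably the place to look; the default table is usually sufficient for a simple case like this one.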
The code points are numerical values: a numerical sort is required for these.
    $ perl -E 'say for sort map { ord } qw{z é a}'
    122
    195
    97
    $ perl -E 'say for sort { $a <=> $b } map { ord } qw{z é a}'
    97
    122
    195
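A caveat on the 195 in that output: the one-liners run without `-Mutf8`, so, assuming a UTF-8 terminal, `ord` sees only the first byte of the two-byte encoding of é (0xC3) rather than its code point (233, U+00E9). The sketch below assumes a UTF-8 source file and adds `use utf8` so that `ord` returns real code points; the string-versus-numeric sorting point is unchanged.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;   # decode the literals below as characters, not bytes

my @code_points = map { ord } qw{z é a};    # (122, 233, 97)

# Default sort compares as strings: "122" lt "233" lt "97".
my @string_sorted = sort @code_points;

# The <=> operator compares as numbers.
my @numeric_sorted = sort { $a <=> $b } @code_points;

print "string sort:  @string_sorted\n";     # 122 233 97
print "numeric sort: @numeric_sorted\n";    # 97 122 233
```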
Code points are often presented as hexadecimal strings (which may have a leading "U+"). When dealing with these, it can be useful to first convert them to some canonical format. As the code point range is 0 .. 0x10ffff, an sprintf format including "%06x" or "%06X" handles all cases.
    $ perl -E 'say sprintf "U+%06X", $_ for map { ord } qw{z é a}'
    U+00007A
    U+0000C3
    U+000061
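Going the other way, from loosely formatted hexadecimal strings to one canonical form, might be sketched as follows. The helper name `canonical_cp` and the set of accepted prefixes ("U+", "0x", or none) are my own assumptions for illustration, not something from the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: accept "U+e9", "u+00E9", "0x7a" or a bare "61"
# and return the canonical "U+%06X" form.
sub canonical_cp {
    my ($str) = @_;
    (my $hex = $str) =~ s/\A(?:U\+|0x)//i;      # strip an optional prefix
    die "Not a hexadecimal code point: '$str'\n"
        unless $hex =~ /\A[0-9A-Fa-f]{1,6}\z/;
    my $cp = hex $hex;
    die "Out of range (max 0x10FFFF): '$str'\n" if $cp > 0x10FFFF;
    return sprintf 'U+%06X', $cp;
}

my @mixed     = qw{U+7a 0xE9 61 u+01F600};
my @canonical = sort map { canonical_cp($_) } @mixed;

print "$_\n" for @canonical;
# U+000061
# U+00007A
# U+0000E9
# U+01F600
```

A side effect of the fixed width and consistent case is that a plain string sort of the canonical forms agrees with the numeric order of the code points.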
— Ken
In reply to Re^2: Using "negative" characters with the range operator. [Unicode::Collate]
by kcott
in thread Using "negative" characters with the range operator.
by Bowlslaw