in reply to Using "negative" characters with the range operator.

To answer the alphabetical-order part of the question: it is alphabetical, but only when you use ASCII letters. In list context, the range operator .. can only magically increment strings matching /^[a-zA-Z]*[0-9]*\z/; if the start value isn't numeric and doesn't match that pattern, only the start value is returned. If the start and end strings consist of plain ASCII letters, you do get alphabetical order, because the letters are arranged alphabetically in the ASCII code table. For example, B follows A because ord('B') is ord('A') + 1.
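A minimal sketch of the magic increment in action (the variable names are illustrative only). Each range starts from a string matching /^[a-zA-Z]*[0-9]*\z/, so .. steps through it with Perl's string increment, which carries like an odometer:

```perl
use strict;
use warnings;

# Each start value matches /^[a-zA-Z]*[0-9]*\z/, so .. uses Perl's
# magical string increment: the last character is bumped, and a "z"
# or "9" wraps round and carries into the character to its left.
my @letters = ('a' .. 'e');
my @carry   = ('x' .. 'ab');
my @mixed   = ('a8' .. 'b1');

# Adjacent ASCII letters differ by exactly 1 in code point.
my $step = ord('B') - ord('A');

print "@letters\n";    # a b c d e
print "@carry\n";      # x y z aa ab
print "@mixed\n";      # a8 a9 b0 b1
print "$step\n";       # 1
```

The carry from "z" to "aa" (and from "a9" to "b0") is what lets these ranges cross those boundaries.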

But the range operator doesn't work with non-ASCII characters: a start value such as "é" doesn't match that pattern, so the range returns only "é" itself. Besides, Unicode codepoints often aren't ordered alphabetically in any script, so you wouldn't get a sorted (collated) sequence even if it did.
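To illustrate both points, here is a small sketch, written with \x{...} escapes so the source stays ASCII ("é" is U+00E9, "ë" is U+00EB). The range operator returns only its start value for "é", while ranging over the ord() values and mapping back with chr() does work, but yields code point order, not a collated order. Recent perls may warn when asked to increment such a string, so the sketch silences warnings for that one expression:

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

my ($from, $to) = ("\x{E9}", "\x{EB}");    # "é" and "ë"

# "é" doesn't match /^[a-zA-Z]*[0-9]*\z/, so .. has no magic increment
# to apply and returns only the start value. Perl may warn while trying
# to increment it, so warnings are silenced for this one expression.
my @magic = do { no warnings; ($from .. $to) };

# Ranging over the code points instead returns every character in
# between (U+00E9, U+00EA, U+00EB), in code point order.
my @by_code_point = map { chr } ord($from) .. ord($to);

printf "%d element(s): %s\n", scalar @magic, "@magic";
printf "%d element(s): %s\n", scalar @by_code_point, "@by_code_point";
```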

Re^2: Using "negative" characters with the range operator. [Unicode::Collate]
by kcott (Archbishop) on Mar 13, 2017 at 07:44 UTC

    G'day vrk,

    "Besides, Unicode codepoints often aren't ordered alphabetically in any script, so you wouldn't get a sorted (collated) sequence even if it did."

    [Note: There's no intended pedantry here; however, as I understand your statement, I believe you mean "characters", not "codepoints". On that basis, I don't disagree with your statement, at all. The distinction is important for the remainder of my response.]

    The core module Unicode::Collate can be used to sort Unicode characters; it implements the Unicode Collation Algorithm.

        $ perl -CS -Mutf8 -E 'say for sort qw{z é a}'
        a
        z
        é
        $ perl -CS -Mutf8 -MUnicode::Collate -E 'say for Unicode::Collate::->new->sort(qw{z é a})'
        a
        é
        z
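The same comparison as a standalone script: a sketch that assumes only the core Unicode::Collate module. It adds "E" and "e" to show case handling, and uses the constructor's level option (level => 1 compares base letters only, ignoring accents and case) for an accent-insensitive equality test:

```perl
use strict;
use warnings;
use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';

# "\x{E9}" is "é" (U+00E9); escapes keep the source ASCII-only.
my @words = ('z', "\x{E9}", 'a', 'E', 'e');

# Default sort compares code points: uppercase before lowercase, é last.
my @by_code_point = sort @words;

# Unicode::Collate applies the Unicode Collation Algorithm
# (with its default collation element table).
my $collator = Unicode::Collate->new;
my @collated = $collator->sort(@words);

# level => 1 compares base letters only, ignoring accents and case.
my $primary = Unicode::Collate->new(level => 1);
my $same    = $primary->eq('e', "\x{E9}") ? 'yes' : 'no';

print "sort:    @by_code_point\n";
print "collate: @collated\n";
print "e eq \x{E9} at level 1: $same\n";
```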

    The code points are numerical values: a numerical sort is required for these.

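        $ perl -Mutf8 -E 'say for sort map { ord } qw{z é a}'
        122
        233
        97
        $ perl -Mutf8 -E 'say for sort { $a <=> $b } map { ord } qw{z é a}'
        97
        122
        233

    The -Mutf8 option matters here: without it, "é" reaches perl as the two UTF-8 bytes 0xC3 0xA9, and ord() returns 195 (the first byte) instead of 233, the code point of "é".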
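A slightly larger sketch of the same point, using code points with different digit counts ("Ω" is U+03A9 and "€" is U+20AC, written as escapes). The default sort compares the numbers as strings, digit by digit, so 937 lands after 8364; <=> restores numeric order:

```perl
use strict;
use warnings;

# "a", "z", "é" (U+00E9), "Ω" (U+03A9) and "€" (U+20AC), as escapes.
my @chars       = ('z', "\x{E9}", 'a', "\x{20AC}", "\x{3A9}");
my @code_points = map { ord } @chars;

# Without a comparison block, sort compares the numbers as strings,
# digit by digit, so "8364" sorts before "937" and "97".
my @as_strings = sort @code_points;

# <=> compares them as numbers, giving true code point order.
my @as_numbers = sort { $a <=> $b } @code_points;

print "@as_strings\n";    # 122 233 8364 937 97
print "@as_numbers\n";    # 97 122 233 937 8364
```

Sorting the characters themselves with a plain sort already compares them by code point (outside use locale), so the ord() step matters only when you work with the numbers directly.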

    Code points are often presented as hexadecimal strings (which may have a leading "U+"). When dealing with these, it can be useful to first convert them to a canonical format. As the code point range is 0 .. 0x10ffff, an sprintf format including "%06x" or "%06X" handles all cases.

        $ perl -Mutf8 -E 'say sprintf "U+%06X", $_ for map { ord } qw{z é a}'
        U+00007A
        U+0000E9
        U+000061
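A sketch of that canonicalization step, assuming inputs may or may not carry a "U+" prefix and may use either letter case (the sub name canonical_code_point and the sample inputs are hypothetical). Because every result has the same width and letter case, a plain string sort then agrees with numeric order:

```perl
use strict;
use warnings;

# Hypothetical input: code points written in assorted styles.
my @raw = ('U+E9', '7a', 'u+0061', '10FFFF', 'U+03A9');

# Strip an optional "U+" prefix, parse the hex digits, and re-emit them
# in one canonical form: "U+" plus six upper-case hex digits.
sub canonical_code_point {
    my ($text) = @_;
    (my $hex = $text) =~ s/^U\+//i;
    die "not a hex code point: $text\n"
        unless $hex =~ /\A[0-9A-Fa-f]{1,6}\z/;
    my $n = hex $hex;
    die "out of range: $text\n" if $n > 0x10FFFF;
    return sprintf 'U+%06X', $n;
}

my @canonical = map { canonical_code_point($_) } @raw;

# Equal width and one letter case mean a plain string sort
# now matches numeric order.
my @sorted = sort @canonical;

print "@canonical\n";
print "@sorted\n";
```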

    — Ken

      Yes indeed! Thanks for the clarification.