It's not as easy as it used to be... and actually it never worked all that well. We've always had to deal with the problem of "Rabbit" being sorted above "apple" because of case issues (that could be dealt with by normalizing the case inside of the sort routine, and then falling back to case-sensitive sorting in case of equality). And we've always had the issue of string comparison ordering numbers incorrectly (as strings, "10" sorts before "9"). But now we also have to worry about Unicode issues.
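For the case problem alone, the normalize-then-fall-back idea could look roughly like the sketch below. The word list is made up for illustration, and lc is only a reasonable stand-in for plain ASCII data; it does nothing about the Unicode concerns.

```perl
use strict;
use warnings;

# Illustrative data only (not from the original question).
my @words = qw( Rabbit apple Apple rabbit banana );

# Compare lower-cased copies first; only when those are equal
# fall back to an ordinary case-sensitive comparison.
my @sorted = sort { lc($a) cmp lc($b) or $a cmp $b } @words;

print "$_\n" for @sorted;
```

The case-sensitive fallback keeps the order deterministic for pairs such as "Apple" and "apple", which would otherwise compare as equal.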

Additionally, I believe you do want numbers to be sorted numerically, and alpha characters to be sorted alphabetically. I think that most dictionary-style sorts put numbers ahead of alpha characters as well (you seemed to be looking for the opposite, but I'll ignore that for a moment).

Here's one way to do it that should be Unicode safe:

    use strict;
    use warnings;
    use utf8;
    use Unicode::Collate::Locale;
    use Scalar::Util 'looks_like_number';
    use feature qw/say unicode_strings/;

    binmode STDOUT, ':utf8';

    my @unsorted = qw(
        041351920234
        Rabbit
        0343120
        041271024500
        000000343119
        0430870
        Apple
        041460301399
    );

    my $collator = Unicode::Collate::Locale->new(locale => 'en');

    my @sorted = sort {
        # Both look numeric: compare as numbers. A numeric tie (0) falls
        # through to the collation comparison below.
        ( looks_like_number($a) && looks_like_number($b) && $a <=> $b )
        ||
        # Otherwise compare by Unicode collation sort keys.
        $collator->getSortKey($a) cmp $collator->getSortKey($b)
    } @unsorted;

    say for @sorted;

Pretty ugly, and still falls back to alphabetical sorting when a given string has non-numeric characters embedded within it.
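One possible direction for the embedded-digits case, as a rough and untuned sketch: split each string into runs of digits and non-digits and compare run by run. The helpers chunks and natural_cmp are names I made up for this example, the input list is illustrative, and the choice of a level-1 collator for the text runs (so case and accents only matter as a tie-break) is an assumption, not something the original question asked for.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $loose  = Unicode::Collate->new(level => 1);  # primary strength: ignores case and accents
my $strict = Unicode::Collate->new;              # full strength, used only as a tie-break

# Hypothetical helper: split into alternating runs of ASCII digits
# and non-digits, e.g. "file10b" -> ("file", "10", "b").
sub chunks { return $_[0] =~ /([0-9]+|[^0-9]+)/g }

# Hypothetical comparator: walk both run lists in parallel.
sub natural_cmp {
    my @x = chunks($_[0]);
    my @y = chunks($_[1]);
    while (@x && @y) {
        my ($p, $q) = (shift @x, shift @y);
        my $r = ($p =~ /^[0-9]/ && $q =~ /^[0-9]/)
              ? ($p <=> $q)            # two digit runs: numeric compare
              : $loose->cmp($p, $q);   # otherwise: collate as text
        return $r if $r;
    }
    # Whichever string still has runs left sorts later; a complete tie
    # is broken by a full-strength comparison of the original strings.
    return (@x <=> @y) || $strict->cmp($_[0], $_[1]);
}

# Illustrative input only.
my @sorted = sort { natural_cmp($a, $b) }
             qw( file10 file9 File2 0343120 000000343119 );

print "$_\n" for @sorted;
```

This re-splits both strings on every comparison, so for large lists it would probably be worth precomputing the runs once per string.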

I'd love for someone to come along and show a better way to do it, as this just seems messy.


Dave


In reply to Re: Sorting Numbers & Text by davido
in thread Sorting Numbers & Text by PriNet
