http://qs1969.pair.com?node_id=112388

   1: #!/usr/bin/perl -w
   2: 
   3: =head1 by_number
   4: 
   5: This demonstrates the use of a custom sorting routine "by_number"
   6: which is designed to sort strings with embedded numbers in the same
   7: way that a human might expect, ie taking account of the magnitude of
   8: the number rather than the lexicographical ordering of it.
   9: 
  10: This is especially good for sorting IP addresses - ie
  11: 
  12:          Input         Alpha     by_number
  13:       10.0.0.2      10.0.0.1      10.0.0.1
  14:       10.0.0.1     10.0.0.10      10.0.0.2
  15:      10.10.1.1      10.0.0.2     10.0.0.10
  16:      10.1.10.1     10.0.1.10     10.0.1.10
  17:       10.2.1.2      10.0.2.2      10.0.2.2
  18:       10.0.2.2     10.1.10.1     10.1.10.1
  19:      10.0.1.10     10.10.1.1      10.2.1.2
  20:      10.0.0.10      10.2.1.2     10.10.1.1
  21: 
  22: Try the program for a longer example.
  23: 
  24: =cut
  25: 
  26: use strict;
  27: 
  28: my @list = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, rand] } qw {
  29:    1 2 3 4 10 11 12 20 21 22 100 1000 a00 a10 a29 abc1 abc1a abc11b
  30:    abc111a2 abc1b12 10.0.0.1 10.0.0.2 10.0.0.10 10.0.1.10 10.0.2.2
  31:    10.1.10.1 10.2.1.2 10.10.1.1
  32: };
  33: 
  34: my $result = [
  35:   [ "Input", @list ],
  36:   [ "Alpha", sort @list ],
  37:   [ "by_number", sort by_number @list ],
  38: ];
  39: 
  40: for my $i (0..$#{$result->[0]})
  41: {
  42:     for my $j (0..$#$result)
  43:     {
  44:         printf "%14s", $result->[$j][$i];
  45:     }
  46:     print "\n";
  47: }
  48: 
  49: # Embedded numeric sorter sorts IP addresses & xyz123
  50: 
  51: sub by_number
  52: {
  53:     my @a = split /(\d+)/, $a;
  54:     my @b = split /(\d+)/, $b;
  55:     while (@a && @b)
  56:     {
  57:         my ($aa, $bb) = (shift(@a), shift(@b));
  58:         my $res = ($aa =~ /^\d/ && $bb =~ /^\d/) ?
  59:             $aa <=> $bb :
  60:             $aa cmp $bb ;
  61:         return $res if $res;
  62:     }
  63:     return @a <=> @b;
  64: }
  65: 
  66: =head2 PS
  67: 
  68: I originally wrote the above split()s as
  69: 
  70:    my @a = split /(?<!\d)(?=\d)|(?<=\d)(?!\d)/, $a;
  71: 
  72: Which is is much more exciting being a replacement for \b but for
  73: non-number/number boundaries.  It is the only time I've ever used
  74: all 4 assertions in 1 regexp.  However putting brackets in the split
  75: is much easier!
  76: 
  77: =head2 PPS
  78: 
  79: There are of course lots of modules on CPAN which do this sort of
  80: thing - very similar is Sort::Naturally or Sort::Versions or for the
  81: mother of all sorts Sort::ArbBiLex.  In production you might like to
  82: use one of those - this is merely provided for your entertainment ;-)
  83: 
  84: =cut

Replies are listed 'Best First'.
(tye)Re: Sorting by_number
by tye (Sage) on Sep 15, 2001 at 00:00 UTC

    See also How do I do a natural sort on an array?. I might do the above something like:

    my @sorted= grep { s#0(.{4})#unpack"N",$1#ges; 1 } sort grep { s#(\d+)#"0".pack"N",$1#ge; 1 } @{[ @list ]};
    Even though it is abusing grep a little. (:

            - tye (but my friends call me "Tye")
      Ha, a one liner - very clever! Well that is almost good apart from the fact that it modifies the array. It strips the leading 0s from embedded numeric strings, so in the exmple above a00 is modified to a0 which is bad :-(

      You can solve this by recasting it as an Shwartzian transform thus :-

      my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { (my $a = $_) =~ s#(\d+)#"0".pack"N",$1#ge; [$_, $a] } @list;
      This also gets rid of the offensive greps!

      This is likely to go wrong if embedded numerics are > 232 a problem which isn't easily solvable. In my original method these numbers should be automatically upconverted to doubles by perl avoiding the problem.

      People who are puzzling out the above might like to consider replacing "0".pack"N",$1 with sprintf"%016d",$1

        Yes, for more robust versions of this, see the previously mentioned link.

        Actually, dealing with non-negative integers beyond 2**32 or even that won't fit in a double (either accurately or at all) is rather easy. For example, ignoring the case of extra leading zeros again:

        my @sorted= grep { s#0.{4}(\d+)#$1#gs; 1 } sort grep { s#(\d+)#"0".pack("N",length$1).$1#ge; 1 } @{[ @list ]};
        This deals with any sequence of nearly 2**31 digits (I won't commit beyond that because I'm not certain that all ports of Perl will have a length function that works beyond that -- not that even 2**31 is much of a practical limitation).

        Note that it preserves leading zeros unlike the previous one-liner, but it doesn't sort them intuitively. To deal with extra leading zeros properly requires a bit more work, for example:

        my @sorted= grep { s#0.{4}(\d+),(0*)!#$2$1#gs; 1 } sort grep { s#(0*)(\d+)#"0".pack("N",length$2)."$2,$1!"#ge; 1 } @{[ @list ]};

        Also note that I intentionally did not use a Schwartzian Transform as it is rather slower and uses more memory.

                - tye (but my friends call me "Tye")