http://qs1969.pair.com?node_id=223687

superpete has asked for the wisdom of the Perl Monks concerning the following question:

I know this topic has been beaten to death, but I'm not satisfied with the code I came up with, and am looking for a neat/elegant way.

Purpose: sort a list of strings in a way that is useful to humans - so that "foo2bar" comes before "foo10bar", and so forth.

Here is my ugly code (which I think works)

sub sort_n { return map { $_->[0] } sort { my $l = @{$a->[1]} < @{$b->[1]} ? scalar @{$a->[1]} : scalar @{$b->[1]} ; for (my $n=0; $n<$l; $n++) { if ( $a->[1][$n] =~ /^(\D+)$/ ) { if ( $b->[1][$n] =~ /^(\D+)$/ ) { my $tmp = $a->[1][$n] cmp $b ->[1][$n]; return $tmp if $tmp; } else { return 1; } } else { if ( $b->[1][$n] =~ /^(\D+)$/ ) { return -1; } else { my $tmp = $a->[1][$n] <=> $b ->[1][$n]; return $tmp if $tmp; } } } return @{$a->[1]} <=> @{$b->[1]}; } map { [ $_ , [ # each element of @_ is split on boundaries # between digit and nondigit split ( /(?:(?<=\d)(?=\D))|(?:(?<=\D)(?=\d))/, $_ ) ] ] } @_; }

Replies are listed 'Best First'.
Re: sort, semi-numeric
by tachyon (Chancellor) on Jan 02, 2003 at 06:30 UTC

    You are using a Schwartzian Transform which is good. In this example my expensive function splits the name into three parts (word chars, digits or ., and then everything else). split is a good way to do it as 1) if it fails it will return the whole string (so we can sort on that). If it succeeds then we need to remember that it will split at all the number . sequences so we put the string back together after the first split. If we were to use a straight regex here and it failed to match we would need to have all sorts of fallback cases split() handles it neatly.

    The sort then proceeds on first bit, numbers, last bit. If you just wanted to sort foo-bar-0.01.tar.gz then you only need to return @bits[0,1] from func() for it to work correctly.

    sub sort_n { return map { $_->[0] } sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] || $a->[3] cmp $b- +>[3] } map { [$_, func($_)] } @_; sub func { my @bits = split /([\d\.]+)/, $_[0]; return @bits[0,1], j +oin '', @bits[2..$#bits] } } my @ary = qw( z1z z2a z2z a1a a2b a1.1a a2.1a a0.1a a0.01a b bbb b2b2b +2b 2b2c a2bc2 ); print "This is what func() does for us\n"; $" = ', '; for (@ary) { my @a = func($_); print "@a\n" } print "\nHere is the sorted list\n"; print "$_\n" for sort_n(@ary); __DATA__ This is what func() does for z, 1, z z, 2, a z, 2, z a, 1, a a, 2, b a, 1.1, a a, 2.1, a a, 0.1, a a, 0.01, a b, , bbb, , b, 2, b2b2b , 2, b2c a, 2, bc2 Here is the sorted list 2b2c a0.01a a0.1a a1a a1.1a a2b a2bc2 a2.1a b b2b2b2b bbb z1z z2a z2z

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: sort, semi-numeric
by jdporter (Paladin) on Jan 02, 2003 at 04:46 UTC
      Apparently it just wouldn't be a day on PerlMonks if no one asked this question. (*sigh*)

      I know, I know... But I really did have to ask, and now I'm smarter for it :-)

Re: sort, semi-numeric
by John M. Dlugosz (Monsignor) on Jan 02, 2003 at 08:21 UTC
    You might also find this interesting: Ordered Comparison of Generalized Version Strings.. I think this will do exactly what you want, already. The thread discusses other ways of comparing multi-part strings, too. Something there should be useful to you.

    —John

Re: sort, semi-numeric
by superpete (Beadle) on Jan 02, 2003 at 18:12 UTC
    Ok, I've reduced it to this :-)
    sub sort_n { return map { $_->[0] } sort { my $l = @{$a->[1]} < @{$b->[1]} ? @{$a->[1]} : @{$b->[1]}; for (my $n=0; $n<$l; $n++) { my $A = $a->[1][$n]; my $B = $b->[1][$n]; my $tmp; if ($A =~ /\d/) { if ($B =~ /\d/) { $tmp = $A <=> $B; } else { $tmp = -1; } } else { if ($B =~ /\d/) { $tmp = 1; } else { $tmp = $A cmp $B; } } return $tmp if $tmp; } return @{$a->[1]} <=> @{$b->[1]}; } map { [ $_ , [ split( /(\d+)/, $_ ) ] ] } @_; }