sub _nsc
{
my( $x, @a ) = @{$_[0]};
my( $y, @b ) = @{$_[1]};
@a && @b ? ( $x =~ /\d/ ? ( $y =~ /\d/ ? $x <=> $y ||
_nsc(\@a,\@b) : -1 ) : ( $y =~ /\d/ ? 1 : lc($x) cmp
lc($y) || _nsc(\@a,\@b) )) : ( @a ? 1 : @b ? -1 : 0 )
}
sub natsort
{
map join('',@$_), sort {_nsc($a,$b)} map [/(\d+|\D+)/g], @_
}
We're building the house of the future together.
| [reply] [d/l] |
sub _nsc
{
my @a = @{$_[0]};
my @b = @{$_[1]};
@a && @b or return( @a ? 1 : @b ? -1 : 0 );
my $x = shift @a;
my $y = shift @b;
$x =~ /\d/ ? ( $y =~ /\d/ ? length($x)<=>length($y)
|| $x <=> $y || _nsc(\@a,\@b) : -1 ) : ( $y =~ /\d/
? 1 : lc($x) cmp lc($y) || _nsc(\@a,\@b) )
}
(This version also optimizes to only compare numeric strings as numbers iff they have
the same length.)
We're building the house of the future together.
| [reply] [d/l] |
Forget golf, it doesn't even work. <=> will fail for 865314457646576532325988.
| [reply] [d/l] [select] |
| [reply] [d/l] |
Not so much golfed as untidy... :-(
Update: chopped out a few more strokes. later: and a few more...
sub n{for(\@a,\@b){@$_=split/(?<=\d)(?=\D)|(?<=\D)(?=\d)/,lc
shift@_};for(0..$#a){$j=$a[$_];$k=$b[$_];return$j=~/\d/&$k=~/\d/?($j==
+$k?next:$j<=>$k):($j=~/\D/&$k=~/\D/?($j
eq$k?next:$j cmp$k):(defined$k?$j cmp$k:1))}}
@testarray = (qw/86 101 10 20 1 A amstelveen a10 A1
Amsterdam Amsterdam5 Amsterdam40 Amsterdamned
112kimn678mn09 112kimm678mn19 112kimm678mn09
112kimm678mn9 865314457646576532325988/);
print join "\n", sort { n($a,$b)} @testarray;
| [reply] [d/l] |
It is small (97 strokes, including the meaningful newline), but it is also very fast. There are certainly places where a few characters can be shaved but several of those break subtle features such as one demonstrated in the example code: sorting the sample data correctly even when each record ends with a newline.
use strict;
use warnings;
sub naturalSort {
my$i;s/(\d+)/pack"aNa*",0,length$1,$1/ge,$_.=$".$i++for
# 345678 1 2345678 2 2345678 3 2345678 4 2345678 5 2345
my@x=map{lc}@_;@_[map{(split)[-1]}sort@x]
#8 6 2345678 7 2345678 8 2345678 9 234567
}
my @data= <DATA>;
chomp @data if $\;
print for naturalSort(@data);
__END__
amsTerdam40
amstelveen
1
AmstErdam5
Amsterdam
amsterDamed
A
Run it normally or with "perl -l ..." to see that it sorts the same with or without the newlines in the strings being sorted.
| [reply] [d/l] |
perl -e '$\=$/;$,=" ";@d=@a=sort{lc$a cmp lc$b}@ARGV;for(0..$#a){if($a
+[$_]=~/(\d+)/){$b{$1}=$_;push@c,$_}} for(sort{$a<=>$b}keys%b){$d[shif
+t@c]=$a[$b{$_}]}print@d' 1 A amstelveen Amsterdam Amsterdam5 Amsterda
+m40 Amsterdamned
| [reply] [d/l] |
$ perl -e 'sub k{for(@p=$_[0]=~/\d+|[a-z]+/gi){if(/^\d/){s/^0+//;$l=le
+ngth;$_="9"x($l/9).$l%9 .$_}}lc join"\0",@p}@s=sort{k($a)cmp k$b}@ARG
+V;print"@s\n"' 1 A amstelveen Amsterdam Amsterdam5 Amsterdam40 Amster
+damned
1 A amstelveen Amsterdam Amsterdam5 Amsterdam40 Amsterdamned
and numbers are not limited to any range. | [reply] [d/l] |
I'm not really interested in Perl Golf, hence this late reply, but I'd like to contribute by stating my favourite approach for natural sorting, that apparently nobody has touched in this thread. The basis is this:
split /(\d+)/
This will split the example strings like this:
- 1
- "", 1
- A
- "A"
- amstelveen
- "amstelveen"
- Amsterdam
- "Amsterdam"
- Amsterdam5
- "Amsterdam", 5
- Amsterdam40
- "Amsterdam", 40
- Amsterdam40b
- "Amsterdam", 40, "b"
- Amsterdamned
- "Amsterdamned"
and now for some more complex examples:
- Chapter 1 Section 3
- "Chapter ", 1, "Section ", 3
- Chapter 1 Section 10
- "Chapter ", 1, "Section ", 10
Now the basic trick to do natural sorting, is partwise comparison of the items, comparing the even numbered items (starting at 0) with alphabetical comparison, case insensitive even if you like, and the odd numbered items with numerical comparison.
This is code that does basically that:
sub natcomp {
my @a = split /(\d+)/, $a;
my @b = split /(\d+)/, $b;
my $last = min(scalar @a, scalar @b)-1;
my $cmp;
for my $i (0 .. $last) {
unless($i & 1) { # even
$cmp = lc $a[$i] cmp lc $b[$i] || $a[$i] cmp $b[$i] and re
+turn $cmp;
}else { # odd
$cmp = $a[$i] <=> $b[$i] and return $cmp;
}
}
return scalar @a <=> scalar @b; # shortest array comes first
}
Let's try this with:
chomp(my @array = <DATA>);
$\ = "\n";
print for sort natcomp @array;
__DATA__
1
A
amstelveen
Amsterdam
Amsterdam40
Amsterdam40b
Amsterdam5
Amsterdamned
Chapter 1 Section 10
Chapter 1 Section 3
Chapter 10 Section 2
Chapter 2 Section 1
(Note that the DATA section is sorted alphabetically, and that it contains an empty string)
The result is:
1
A
amstelveen
Amsterdam
Amsterdam5
Amsterdam40
Amsterdam40b
Amsterdamned
Chapter 1 Section 3
Chapter 1 Section 10
Chapter 2 Section 1
Chapter 10 Section 2
I hope that order is to your liking.
| [reply] [d/l] [select] |
Perhaps OP could provide a test data set as a __DATA__ block and provide sample output illustrating the expected sort order for the given data? It could well be that test strings will be added as new edge cases are described. Perhaps the first line should be: 0 Data set version 1 so that it is obvious in sample output which "rules" any particular solution has been tested against.
DWIM is Perl's answer to Gödel
| [reply] [d/l] |