A new golf challenge - Natural Sorting

theroninwins has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: A new golf challenge - Natural Sorting by jdporter (Paladin) on Mar 17, 2006 at 14:30 UTC
Not truly golfed; more of a canonical solution: `sub _nsc { my( $x, @a ) = @{$_[0]}; my( $y, @b ) = @{$_[1]}; @a && @b ? ( $x =~ /\d/ ? ( $y =~ /\d/ ? $x <=> $y \|\| _nsc(\@a,\@b) : -1 ) : ( $y =~ /\d/ ? 1 : lc($x) cmp lc($y) \|\| _nsc(\@a,\@b) )) : ( @a ? 1 : @b ? -1 : 0 ) } sub natsort { map join('',@$_), sort {_nsc($a,$b)} map [/(\d+\|\D+)/g], @_ }` [download] We're building the house of the future together.	[reply] [d/l]
Re^2: A new golf challenge - Natural Sorting by jdporter (Paladin) on Mar 17, 2006 at 14:44 UTC
Wups - there's a bug in that version of _nsc. Here's a better one: `sub _nsc { my @a = @{$_[0]}; my @b = @{$_[1]}; @a && @b or return( @a ? 1 : @b ? -1 : 0 ); my $x = shift @a; my $y = shift @b; $x =~ /\d/ ? ( $y =~ /\d/ ? length($x)<=>length($y) \|\| $x <=> $y \|\| _nsc(\@a,\@b) : -1 ) : ( $y =~ /\d/ ? 1 : lc($x) cmp lc($y) \|\| _nsc(\@a,\@b) ) }` [download] (This version also optimizes to only compare numeric strings as numbers iff they have the same length.) We're building the house of the future together.	[reply] [d/l]
Re^2: A new golf challenge - Natural Sorting by ikegami (Patriarch) on Mar 17, 2006 at 14:44 UTC
Forget golf, it doesn't even work. `<=>` will fail for `865314457646576532325988`.	[reply] [d/l] [select]
Re^3: A new golf challenge - Natural Sorting by QM (Parson) on Mar 17, 2006 at 15:08 UTC
Which means that a solution more robust than `<=>` will have to compare numbers using arbitrary precision (and that just pushes the threshold a few orders of magnitude farther out). Which means that numbers will have to be recognized as such, so that -123.4e-45.6E789 would not be taken as a single number, but -123.4e-456 will be. And then we get into whether whitespace, underscores, and commas/decimals (depending on locale) are allowed, and what they mean. It's a long tee shot, onto a glacier, and only transparent golf balls are available. -QM -- Quantum Mechanics: The dreams stuff is made of	[reply] [d/l]
Re^3: A new golf challenge - Natural Sorting by jdporter (Paladin) on Mar 17, 2006 at 14:55 UTC
Ouch. ;-)	[reply]
Re: A new golf challenge - Natural Sorting by thundergnat (Deacon) on Mar 17, 2006 at 17:41 UTC
Not so much golfed as untidy... :-( Update: chopped out a few more strokes. *later:* and a few more... `sub n{for(\@a,\@b){@$_=split/(?<=\d)(?=\D)\|(?<=\D)(?=\d)/,lc shift@_};for(0..$#a){$j=$a[$_];$k=$b[$_];return$j=~/\d/&$k=~/\d/?($j== +$k?next:$j<=>$k):($j=~/\D/&$k=~/\D/?($j eq$k?next:$j cmp$k):(defined$k?$j cmp$k:1))}} @testarray = (qw/86 101 10 20 1 A amstelveen a10 A1 Amsterdam Amsterdam5 Amsterdam40 Amsterdamned 112kimn678mn09 112kimm678mn19 112kimm678mn09 112kimm678mn9 865314457646576532325988/); print join "\n", sort { n($a,$b)} @testarray;` [download]	[reply] [d/l]
Re: A new golf challenge - Natural Sorting (OneTruePerlSort) by tye (Sage) on Mar 17, 2006 at 20:42 UTC
It is small (97 strokes, including the meaningful newline), but it is also very fast. There are certainly places where a few characters can be shaved but several of those break subtle features such as one demonstrated in the example code: sorting the sample data correctly even when each record ends with a newline. `use strict; use warnings; sub naturalSort { my$i;s/(\d+)/pack"aNa*",0,length$1,$1/ge,$_.=$".$i++for # 345678 1 2345678 2 2345678 3 2345678 4 2345678 5 2345 my@x=map{lc}@_;@_[map{(split)[-1]}sort@x] #8 6 2345678 7 2345678 8 2345678 9 234567 } my @data= <DATA>; chomp @data if $\; print for naturalSort(@data); __END__ amsTerdam40 amstelveen 1 AmstErdam5 Amsterdam amsterDamed A` [download] Run it normally or with "perl -l ..." to see that it sorts the same with or without the newlines in the strings being sorted. - tye	[reply] [d/l]
Re: A new golf challenge - Natural Sorting by smokemachine (Hermit) on Mar 17, 2006 at 16:14 UTC
`perl -e '$\=$/;$,=" ";@d=@a=sort{lc$a cmp lc$b}@ARGV;for(0..$#a){if($a +[$_]=~/(\d+)/){$b{$1}=$_;push@c,$_}} for(sort{$a<=>$b}keys%b){$d[shif +t@c]=$a[$b{$_}]}print@d' 1 A amstelveen Amsterdam Amsterdam5 Amsterda +m40 Amsterdamned` [download]	[reply] [d/l]
Re^2: A new golf challenge - Natural Sorting by salva (Canon) on Apr 06, 2006 at 22:30 UTC
`$ perl -e 'sub k{for(@p=$_[0]=~/\d+\|[a-z]+/gi){if(/^\d/){s/^0+//;$l=le +ngth;$_="9"x($l/9).$l%9 .$_}}lc join"\0",@p}@s=sort{k($a)cmp k$b}@ARG +V;print"@s\n"' 1 A amstelveen Amsterdam Amsterdam5 Amsterdam40 Amster +damned 1 A amstelveen Amsterdam Amsterdam5 Amsterdam40 Amsterdamned` [download] and numbers are not limited to any range.	[reply] [d/l]
Re: A new golf challenge - Natural Sorting by bart (Canon) on Apr 03, 2006 at 10:15 UTC
I'm not really interested in Perl Golf, hence this late reply, but I'd like to contribute by stating my favourite approach for natural sorting, that apparently nobody has touched in this thread. The basis is this: `split /(\d+)/` [download] This will split the example strings like this: 1 "", 1 A "A" amstelveen "amstelveen" Amsterdam "Amsterdam" Amsterdam5 "Amsterdam", 5 Amsterdam40 "Amsterdam", 40 Amsterdam40b "Amsterdam", 40, "b" Amsterdamned "Amsterdamned" and now for some more complex examples: Chapter 1 Section 3 "Chapter ", 1, "Section ", 3 Chapter 1 Section 10 "Chapter ", 1, "Section ", 10 Now the basic trick to do natural sorting, is partwise comparison of the items, comparing the even numbered items (starting at 0) with alphabetical comparison, case insensitive even if you like, and the odd numbered items with numerical comparison. This is code that does basically that: `sub natcomp { my @a = split /(\d+)/, $a; my @b = split /(\d+)/, $b; my $last = min(scalar @a, scalar @b)-1; my $cmp; for my $i (0 .. $last) { unless($i & 1) { # even $cmp = lc $a[$i] cmp lc $b[$i] \|\| $a[$i] cmp $b[$i] and re +turn $cmp; }else { # odd $cmp = $a[$i] <=> $b[$i] and return $cmp; } } return scalar @a <=> scalar @b; # shortest array comes first }` [download] Let's try this with: `chomp(my @array = <DATA>); $\ = "\n"; print for sort natcomp @array; __DATA__ 1 A amstelveen Amsterdam Amsterdam40 Amsterdam40b Amsterdam5 Amsterdamned Chapter 1 Section 10 Chapter 1 Section 3 Chapter 10 Section 2 Chapter 2 Section 1` [download] (Note that the DATA section is sorted alphabetically, and that it contains an empty string) The result is: 1 A amstelveen Amsterdam Amsterdam5 Amsterdam40 Amsterdam40b Amsterdamned Chapter 1 Section 3 Chapter 1 Section 10 Chapter 2 Section 1 Chapter 10 Section 2 I hope that order is to your liking.	[reply] [d/l] [select]
Re: A new golf challenge - Natural Sorting by GrandFather (Saint) on Mar 17, 2006 at 18:50 UTC
Perhaps OP could provide a test data set as a __DATA__ block and provide sample output illustrating the expected sort order for the given data? It could well be that test strings will be added as new edge cases are described. Perhaps the first line should be: `0 Data set version 1` so that it is obvious in sample output which "rules" any particular solution has been tested against. DWIM is Perl's answer to Gödel	[reply] [d/l]