Anyway, it turned out to be a matter of using ST
I don't think you need a Schwartzian Transform (ST) here. An ST pays off when computing the sort key for each comparison is computationally expensive. That is not the case when interpreting a string as a number, particularly since the conversion is done only once per string and then "cached" in the NV/IV fields of the scalar variable(*). In other words, the simple approach (no ST) is actually faster in this case:
#!/usr/bin/perl

use strict;
use warnings;
no warnings 'numeric';

use Benchmark 'cmpthese';

for my $e (2..5) {
    my $n = 10**$e;
    print "\nNumber of file names: $n\n";

    my @data;
    push @data, join(".", int(rand($n)), int(rand($n)), 'force.0.5.1LGY.pdb') for 1..$n;

    cmpthese( 10**(6-$e), {
        'simple' => sub {
            my @unsorted = @data;
            my @sorted = sort { $a <=> $b } @unsorted;
        },
        'ST' => sub {
            my @unsorted = @data;
            my @sorted =
                map $_->[0],
                sort { $a->[1] <=> $b->[1] }
                map { [ $_, int $_ ] } @unsorted;
        },
    } );
}

__END__

Number of file names: 100
          Rate     ST simple
ST      3247/s     --   -75%
simple 12987/s   300%     --

Number of file names: 1000
         Rate     ST simple
ST      248/s     --   -79%
simple 1176/s   375%     --

Number of file names: 10000
         Rate     ST simple
ST     10.3/s     --   -74%
simple 39.2/s   280%     --

Number of file names: 100000
       s/iter     ST simple
ST       1.87     --   -50%
simple  0.943    99%     --
Another beneficial side effect of the simple approach is that if you happen to have two names like this
30.31.force.0.5.1LGY.pdb
30.32.force.0.5.1LGY.pdb
they would still be ordered in a useful way (30.31 before 30.32), because the fractional part of the number is automatically taken into consideration when just treating the name as a number.
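To illustrate, here is a minimal sketch of the simple approach applied to a handful of names. The names in @names are made up for the example, modelled on the ones in this thread. Note that the second field is read as a decimal fraction, so a name starting with 30.5 would sort after one starting with 30.31.

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'numeric';   # the names are not clean numbers; silence "isn't numeric"

# Hypothetical file names, for illustration only.
my @names = (
    '30.32.force.0.5.1LGY.pdb',
    '4.7.force.0.5.1LGY.pdb',
    '30.31.force.0.5.1LGY.pdb',
    '12.0.force.0.5.1LGY.pdb',
);

# In numeric context Perl uses the leading numeric prefix of each
# string (30.32, 4.7, 30.31, 12.0) and ignores the rest.
my @sorted = sort { $a <=> $b } @names;

print "$_\n" for @sorted;
```

This should print the 4.7 name first, then 12.0, then 30.31, then 30.32.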
(*)
use Devel::Peek;

my $s = "30.31.force.0.5.1LGY.pdb";
Dump $s;
print 0+$s, "\n";  # treat as number
Dump $s;

__END__
SV = PV(0x605150) at 0x604fa0
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x6370d0 "30.31.force.0.5.1LGY.pdb"\0
  CUR = 24
  LEN = 32
30.31
SV = PVNV(0x607880) at 0x604fa0
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,NOK,POK,pIOK,pNOK,pPOK)
  IV = 30      <---
  NV = 30.31   <---
  PV = 0x6370d0 "30.31.force.0.5.1LGY.pdb"\0
  CUR = 24
  LEN = 32
In reply to Re^3: sorting an array of file names
by almut
in thread sorting an array of file names
by hotel