in reply to Re^3: numeric sort on substring
in thread numeric sort on substring

The three-argument split is just a habit.

I finally got around to benchmarking this and it seems to be a habit you should keep :-)

ok 1 - grtRegex
ok 2 - grtSplit
ok 3 - grtSplit3
ok 4 - nSubRegex
ok 5 - nSubSplit
ok 6 - nSubSplit3
ok 7 - stRegex
ok 8 - stSplit
ok 9 - stSplit3
             Rate nSubSplit nSubRegex nSubSplit3 stSplit grtSplit stSplit3 stRegex grtRegex grtSplit3
nSubSplit  8.10/s        --      -69%       -71%    -86%     -88%     -93%    -93%     -94%      -94%
nSubRegex  25.8/s      219%        --        -8%    -57%     -62%     -77%    -77%     -82%      -82%
nSubSplit3 28.1/s      247%        9%         --    -53%     -59%     -75%    -75%     -80%      -80%
stSplit    59.8/s      639%      132%       113%      --     -12%     -47%    -47%     -58%      -58%
grtSplit   68.1/s      741%      164%       142%     14%       --     -39%    -39%     -52%      -52%
stSplit3    112/s     1283%      334%       299%     87%      64%       --     -0%     -21%      -22%
stRegex     112/s     1284%      334%       299%     87%      65%       0%      --     -21%      -22%
grtRegex    143/s     1661%      452%       408%    138%     109%      27%     27%       --       -0%
grtSplit3   143/s     1663%      453%       408%    139%     110%      28%     27%       0%        --

Not constraining the split to just the fields you need is a significant performance hit (I'm guessing because each row has many fields, as here), but the three-argument split runs level with the regular expression approach. The code:

use strict;
use warnings;

use Benchmark qw{ cmpthese };
use Test::More qw{ no_plan };

my %methods = (
    # nSub*: plain sort with a named comparison sub; the keys are
    # re-extracted on every comparison.
    nSubRegex => sub {
        my @sorted = sort sortRowsRegex @{ $_[ 0 ] };
        return \ @sorted;
    },
    nSubSplit => sub {
        my @sorted = sort sortRowsSplit @{ $_[ 0 ] };
        return \ @sorted;
    },
    nSubSplit3 => sub {
        my @sorted = sort sortRowsSplit3 @{ $_[ 0 ] };
        return \ @sorted;
    },
    # st*: Schwartzian transform; the keys are extracted once per row.
    stRegex => sub {
        my @sorted =
            map  { $_->[ 0 ] }
            sort {
                $a->[ 2 ] <=> $b->[ 2 ]
                    ||
                $a->[ 1 ] <=> $b->[ 1 ]
            }
            map  { [ $_ , m{^(\d+),(\d+)} ] }
            @{ $_[ 0 ] };
        return \ @sorted;
    },
    stSplit => sub {
        my @sorted =
            map  { $_->[ 0 ] }
            sort {
                $a->[ 1 ] <=> $b->[ 1 ]
                    ||
                $a->[ 2 ] <=> $b->[ 2 ]
            }
            map  { [ $_ , ( split m{,} )[ 1, 0 ] ] }
            @{ $_[ 0 ] };
        return \ @sorted;
    },
    stSplit3 => sub {
        my @sorted =
            map  { $_->[ 0 ] }
            sort {
                $a->[ 1 ] <=> $b->[ 1 ]
                    ||
                $a->[ 2 ] <=> $b->[ 2 ]
            }
            map  { [ $_ , ( split m{,}, $_, 3 )[ 1, 0 ] ] }
            @{ $_[ 0 ] };
        return \ @sorted;
    },
    # grt*: Guttman-Rosler transform; the keys are packed in front of each
    # row, the strings are sorted, then the 8 key bytes are stripped.
    grtRegex => sub {
        my @sorted =
            map  { substr $_, 8 }
            sort
            map  { pack q{NNA*}, reverse( m{^(\d+),(\d+)} ), $_ }
            @{ $_[ 0 ] };
        return \ @sorted;
    },
    grtSplit => sub {
        my @sorted =
            map  { substr $_, 8 }
            sort
            map  { pack q{NNA*}, ( split m{,} )[ 1, 0 ], $_ }
            @{ $_[ 0 ] };
        return \ @sorted;
    },
    grtSplit3 => sub {
        my @sorted =
            map  { substr $_, 8 }
            sort
            map  { pack q{NNA*}, ( split m{,}, $_, 3 )[ 1, 0 ], $_ }
            @{ $_[ 0 ] };
        return \ @sorted;
    },
);

# Check that every method sorts a small sample correctly.
{
    no warnings q{qw};    # the qw lists below contain commas

    my @test = qw{
        1,64,1.4.0,1.4.6
        1,128,1.4.1,1.4.6
        1,256,1.4.2,1.4.6
        1,512,1.4.3,1.4.6
        1,1024,1.4.4,1.4.6
        2,64,1.4.5,1.4.6
        2,128,1.4.6,1.4.6
        2,256,1.4.7,1.4.6
        2,512,1.4.8,1.4.6
        2,1024,1.4.9,1.4.6
    };
    my $testRes = join q{:}, qw{
        1,64,1.4.0,1.4.6
        2,64,1.4.5,1.4.6
        1,128,1.4.1,1.4.6
        2,128,1.4.6,1.4.6
        1,256,1.4.2,1.4.6
        2,256,1.4.7,1.4.6
        1,512,1.4.3,1.4.6
        2,512,1.4.8,1.4.6
        1,1024,1.4.4,1.4.6
        2,1024,1.4.9,1.4.6
    };

    foreach my $method ( sort keys %methods ) {
        ok(
            join( q{:}, @{ $methods{ $method }->( \ @test ) } ) eq $testRes,
            qq{$method}
        );
    }
}

# Build 1024 rows of 12 fields each for the benchmark.
# Note: 2 ** 32 is one more than pack's unsigned 32-bit 'N' can hold, so the
# grt* keys for the $col2 == 32 rows are not exact.
my @unsorted;
for my $col1 ( 1 .. 32 ) {
    for my $col2 ( 1 .. 32 ) {
        push @unsorted,
            join q{,}, $col1, 2 ** $col2, qw{
                1.4.5 1.4.6 44642850 44642850 0
                27348 10028 59188 1488095 761904.64
            };
    }
}

cmpthese(
    -5,
    {
        map {
            my $codeStr =
                  q|sub { my $ref = $methods{ |
                . $_
                . q| }->( \ @unsorted ); }|;
            $_ => eval $codeStr;
        } keys %methods
    }
);

sub sortRowsRegex {
    my( $a1, $a2 ) = $a =~ m{^(\d+),(\d+)};
    my( $b1, $b2 ) = $b =~ m{^(\d+),(\d+)};
    return $a2 <=> $b2 || $a1 <=> $b1;
}

sub sortRowsSplit {
    my( $a1, $a2 ) = ( split m{,}, $a )[ 0, 1 ];
    my( $b1, $b2 ) = ( split m{,}, $b )[ 0, 1 ];
    return $a2 <=> $b2 || $a1 <=> $b1;
}

sub sortRowsSplit3 {
    my( $a1, $a2 ) = ( split m{,}, $a, 3 )[ 0, 1 ];
    my( $b1, $b2 ) = ( split m{,}, $b, 3 )[ 0, 1 ];
    return $a2 <=> $b2 || $a1 <=> $b1;
}
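
To see in isolation what the third argument to split changes, here is a minimal sketch. The sample row and the variable names are made up for illustration and are not part of the benchmark; the point is only that the limited split stops after two fields and leaves the rest of the row untouched, while the sort keys come out the same.

```perl
use strict;
use warnings;

# Hypothetical sample row with several comma-separated fields.
my $row = q{2,1024,1.4.5,1.4.6,44642850,44642850,0,27348};

# No limit: every field is broken out, although only two are needed.
my @all = split m{,}, $row;

# Limit of 3: two fields are split off and the unsplit remainder of the
# row becomes the third element.
my @limited = split m{,}, $row, 3;

printf "unlimited: %d elements\n", scalar @all;
printf "limited:   %d elements\n", scalar @limited;
print  "remainder: $limited[ 2 ]\n";

# The slice used in the benchmark yields the same keys either way.
my @keys_all     = @all[ 1, 0 ];
my @keys_limited = @limited[ 1, 0 ];
print "keys: @keys_all / @keys_limited\n";
```

With twelve fields per row, as in the benchmark data, the unlimited form does correspondingly more work per row that is then thrown away by the slice.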

Sorry for the slow reply, I hope this is of interest.

Cheers,

JohnGG

Re^5: numeric sort on substring
by salva (Canon) on Jan 27, 2011 at 14:48 UTC
    If you want to sort anything in Perl fast, then go for Sort::Key!!!
    use Sort::Key::Multi qw(ii_keysort);
    ...
    my %methods = (
        ...
        skm => sub {
            my @sorted = ii_keysort { (split m{,}, $_, 3 )[ 1, 0 ] } @{$_[0]};
            return \@sorted
        }
    );
    This is what I get on my computer:
                 Rate nSubSplit nSubRegex nSubSplit3 stSplit stSplit3 stRegex grtSplit grtRegex grtSplit3  skm
    nSubSplit  21.7/s        --      -54%       -61%    -75%     -83%    -84%     -85%     -91%      -91% -95%
    nSubRegex  47.6/s      119%        --       -14%    -46%     -63%    -65%     -67%     -80%      -81% -89%
    nSubSplit3 55.6/s      156%       17%         --    -36%     -56%    -60%     -62%     -77%      -77% -87%
    stSplit    87.5/s      303%       84%        57%      --     -31%    -36%     -40%     -64%      -64% -80%
    stSplit3    127/s      486%      167%       129%     45%       --     -8%     -12%     -48%      -48% -70%
    stRegex     138/s      534%      189%       147%     57%       8%      --      -5%     -43%      -44% -68%
    grtSplit    145/s      569%      205%       161%     66%      14%      6%       --     -40%      -41% -66%
    grtRegex    243/s     1021%      411%       338%    178%      91%     77%      68%       --       -1% -43%
    grtSplit3   245/s     1029%      415%       341%    180%      93%     78%      69%       1%        -- -43%
    skm         431/s     1885%      805%       674%    393%     239%    213%     197%      77%       76%   --