The three-argument split is just a habit.

I finally got around to benchmarking this and it seems to be a habit you should keep :-)

```
ok 1 - grtRegex
ok 2 - grtSplit
ok 3 - grtSplit3
ok 4 - nSubRegex
ok 5 - nSubSplit
ok 6 - nSubSplit3
ok 7 - stRegex
ok 8 - stSplit
ok 9 - stSplit3
             Rate nSubSplit nSubRegex nSubSplit3 stSplit grtSplit stSplit3 stRegex grtRegex grtSplit3
nSubSplit  8.10/s        --      -69%       -71%    -86%     -88%     -93%    -93%     -94%      -94%
nSubRegex  25.8/s      219%        --        -8%    -57%     -62%     -77%    -77%     -82%      -82%
nSubSplit3 28.1/s      247%        9%         --    -53%     -59%     -75%    -75%     -80%      -80%
stSplit    59.8/s      639%      132%       113%      --     -12%     -47%    -47%     -58%      -58%
grtSplit   68.1/s      741%      164%       142%     14%       --     -39%    -39%     -52%      -52%
stSplit3    112/s     1283%      334%       299%     87%      64%       --     -0%     -21%      -22%
stRegex     112/s     1284%      334%       299%     87%      65%       0%      --     -21%      -22%
grtRegex    143/s     1661%      452%       408%    138%     109%      27%     27%       --       -0%
grtSplit3   143/s     1663%      453%       408%    139%     110%      28%     27%       0%        --
```
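The only difference within each *Split / *Split3 pair is split's LIMIT argument. As a rough illustration, here is what the two forms return for one row built the same way as the benchmark rows (the keys 3 and 2 ** 5 are picked arbitrarily for this sketch):

```perl
use strict;
use warnings;

# One row shaped like the benchmark data: two numeric keys plus ten more fields.
my $row = join q{,}, 3, 2 ** 5,
    qw{ 1.4.5 1.4.6 44642850 44642850 0 27348 10028 59188 1488095 761904.64 };

# No LIMIT: split breaks the whole row into all 12 fields.
my @all = split m{,}, $row;

# LIMIT 3: split stops after the second comma, so fields 0 and 1 are the
# sort keys and field 2 holds the unsplit remainder of the row.
my @limited = split m{,}, $row, 3;

# prints "12 fields without LIMIT, 3 with LIMIT 3"
printf "%d fields without LIMIT, %d with LIMIT 3\n", scalar @all, scalar @limited;
print "remainder: $limited[ 2 ]\n";

# Either way the keys come out the same (col2 first, as in the sorts).
my( $col1, $col2 ) = @limited[ 0, 1 ];
print "keys: $col2, $col1\n";
```

Both forms give the same two sort keys; without the LIMIT, split also builds ten more fields that the list slice then throws away, which is presumably where the extra time goes.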

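Not constraining the split to just the fields you need is a significant performance hit (given many fields per row, as in this data and, I'm guessing, in yours), but the three-argument split is level-pegging with the regular expression approach: stSplit3 and stRegex both run at 112/s against 59.8/s for stSplit, and grtSplit3 and grtRegex both run at 143/s against 68.1/s for grtSplit. The code: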

```perl
use strict;
use warnings;
use Benchmark qw{ cmpthese };
use Test::More qw{ no_plan };

my %methods = (

    # Named comparison subs: both keys are re-extracted on every comparison.
    nSubRegex => sub {
        my @sorted = sort sortRowsRegex @{ $_[ 0 ] };
        return \ @sorted;
    },
    nSubSplit => sub {
        my @sorted = sort sortRowsSplit @{ $_[ 0 ] };
        return \ @sorted;
    },
    nSubSplit3 => sub {
        my @sorted = sort sortRowsSplit3 @{ $_[ 0 ] };
        return \ @sorted;
    },

    # Schwartzian Transform: extract the keys once per row into
    # [ row, key, key ], sort on those, then map back to the rows.
    stRegex => sub {
        my @sorted =
            map  { $_->[ 0 ] }
            sort { $a->[ 2 ] <=> $b->[ 2 ] || $a->[ 1 ] <=> $b->[ 1 ] }
            map  { [ $_, m{^(\d+),(\d+)} ] }
            @{ $_[ 0 ] };
        return \ @sorted;
    },
    stSplit => sub {
        # No LIMIT: split produces every field before the slice picks two.
        my @sorted =
            map  { $_->[ 0 ] }
            sort { $a->[ 1 ] <=> $b->[ 1 ] || $a->[ 2 ] <=> $b->[ 2 ] }
            map  { [ $_, ( split m{,} )[ 1, 0 ] ] }
            @{ $_[ 0 ] };
        return \ @sorted;
    },
    stSplit3 => sub {
        my @sorted =
            map  { $_->[ 0 ] }
            sort { $a->[ 1 ] <=> $b->[ 1 ] || $a->[ 2 ] <=> $b->[ 2 ] }
            map  { [ $_, ( split m{,}, $_, 3 )[ 1, 0 ] ] }
            @{ $_[ 0 ] };
        return \ @sorted;
    },

    # Guttman-Rosler Transform: prefix each row with both keys packed as
    # 32-bit big-endian integers (col2 first), do a plain string sort,
    # then strip the 8-byte prefix.
    grtRegex => sub {
        my @sorted =
            map  { substr $_, 8 }
            sort
            map  { pack q{NNA*}, reverse( m{^(\d+),(\d+)} ), $_ }
            @{ $_[ 0 ] };
        return \ @sorted;
    },
    grtSplit => sub {
        my @sorted =
            map  { substr $_, 8 }
            sort
            map  { pack q{NNA*}, ( split m{,} )[ 1, 0 ], $_ }
            @{ $_[ 0 ] };
        return \ @sorted;
    },
    grtSplit3 => sub {
        my @sorted =
            map  { substr $_, 8 }
            sort
            map  { pack q{NNA*}, ( split m{,}, $_, 3 )[ 1, 0 ], $_ }
            @{ $_[ 0 ] };
        return \ @sorted;
    },
);

# Check every method against a small, hand-sorted data set.
{
    no warnings q{qw};    # the qw{} lists deliberately contain commas
    my @test = qw{
        1,64,1.4.0,1.4.6   1,128,1.4.1,1.4.6  1,256,1.4.2,1.4.6
        1,512,1.4.3,1.4.6  1,1024,1.4.4,1.4.6 2,64,1.4.5,1.4.6
        2,128,1.4.6,1.4.6  2,256,1.4.7,1.4.6  2,512,1.4.8,1.4.6
        2,1024,1.4.9,1.4.6
    };
    my $testRes = join q{:}, qw{
        1,64,1.4.0,1.4.6   2,64,1.4.5,1.4.6
        1,128,1.4.1,1.4.6  2,128,1.4.6,1.4.6
        1,256,1.4.2,1.4.6  2,256,1.4.7,1.4.6
        1,512,1.4.3,1.4.6  2,512,1.4.8,1.4.6
        1,1024,1.4.4,1.4.6 2,1024,1.4.9,1.4.6
    };
    foreach my $method ( sort keys %methods ) {
        is( join( q{:}, @{ $methods{ $method }->( \ @test ) } ), $testRes, $method );
    }
}

# 1024 rows of 12 fields each; only the first two are sort keys.
my @unsorted;
for my $col1 ( 1 .. 32 ) {
    for my $col2 ( 1 .. 32 ) {
        # Caution: 2 ** 32 does not fit in pack's 32-bit "N" format, so the
        # grt* methods may misplace the $col2 == 32 rows. The benchmark only
        # times the sorts; it does not check their output.
        push @unsorted, join q{,}, $col1, 2 ** $col2,
            qw{ 1.4.5 1.4.6 44642850 44642850 0 27348 10028 59188 1488095 761904.64 };
    }
}

# Run each method for at least 5 CPU seconds and compare the rates.
cmpthese( -5, {
    map {
        my $method = $methods{ $_ };
        $_ => sub { my $ref = $method->( \ @unsorted ) };
    } keys %methods
} );

# Comparison subs for the nSub* methods; $a and $b are whole rows.
sub sortRowsRegex {
    my( $a1, $a2 ) = $a =~ m{^(\d+),(\d+)};
    my( $b1, $b2 ) = $b =~ m{^(\d+),(\d+)};
    return $a2 <=> $b2 || $a1 <=> $b1;
}

sub sortRowsSplit {
    my( $a1, $a2 ) = ( split m{,}, $a )[ 0, 1 ];
    my( $b1, $b2 ) = ( split m{,}, $b )[ 0, 1 ];
    return $a2 <=> $b2 || $a1 <=> $b1;
}

sub sortRowsSplit3 {
    my( $a1, $a2 ) = ( split m{,}, $a, 3 )[ 0, 1 ];
    my( $b1, $b2 ) = ( split m{,}, $b, 3 )[ 0, 1 ];
    return $a2 <=> $b2 || $a1 <=> $b1;
}
```
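For anyone unfamiliar with the grt* variants, here is a minimal sketch of the Guttman-Rosler idea on its own, using four made-up rows (the `w` to `z` tails are placeholders, not real data). Packing the keys as big-endian unsigned 32-bit integers (`N`) means a plain string sort orders them numerically, so sort needs no comparison block at all:

```perl
use strict;
use warnings;

# Hypothetical rows of the form "col1,col2,rest".
my @rows = ( '2,1024,x', '1,64,y', '2,64,z', '1,1024,w' );

# Key = col2 and col1 as 4-byte big-endian integers, then the row itself.
# Big-endian byte order makes string order match numeric order.
my @keys = map { pack q{NNA*}, ( split m{,}, $_, 3 )[ 1, 0 ], $_ } @rows;

# Default string sort, then strip the 8-byte prefix to recover each row.
my @sorted = map { substr $_, 8 } sort @keys;

# prints 1,64,y  2,64,z  1,1024,w  2,1024,x (one per line)
print "$_\n" for @sorted;
```

This only holds for key values from 0 to 2 ** 32 - 1; larger keys need a wider packed format.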

Sorry for the slow reply; I hope this is of interest.

Cheers,

JohnGG


In reply to Re^4: numeric sort on substring by johngg
in thread numeric sort on substring by Anonymous Monk
