
The three-argument split is just a habit.

I finally got around to benchmarking this, and it seems to be a habit worth keeping :-)

ok 1 - grtRegex
ok 2 - grtSplit
ok 3 - grtSplit3
ok 4 - nSubRegex
ok 5 - nSubSplit
ok 6 - nSubSplit3
ok 7 - stRegex
ok 8 - stSplit
ok 9 - stSplit3
             Rate nSubSplit nSubRegex nSubSplit3 stSplit grtSplit stSplit3 stRegex grtRegex grtSplit3
nSubSplit  8.10/s        --      -69%       -71%    -86%     -88%     -93%    -93%     -94%      -94%
nSubRegex  25.8/s      219%        --        -8%    -57%     -62%     -77%    -77%     -82%      -82%
nSubSplit3 28.1/s      247%        9%         --    -53%     -59%     -75%    -75%     -80%      -80%
stSplit    59.8/s      639%      132%       113%      --     -12%     -47%    -47%     -58%      -58%
grtSplit   68.1/s      741%      164%       142%     14%       --     -39%    -39%     -52%      -52%
stSplit3    112/s     1283%      334%       299%     87%      64%       --     -0%     -21%      -22%
stRegex     112/s     1284%      334%       299%     87%      65%       0%      --     -21%      -22%
grtRegex    143/s     1661%      452%       408%    138%     109%      27%     27%       --       -0%
grtSplit3   143/s     1663%      453%       408%    139%     110%      28%     27%       0%        --
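Each *Split3 variant differs from its *Split counterpart only in the third argument to split, its LIMIT. A minimal standalone sketch of what that argument does, using a made-up row shaped like the benchmark data (the values are illustrative only):

```perl
use strict;
use warnings;

# Illustrative row shaped like the benchmark data: two numeric key fields
# followed by more comma-separated fields (values made up for the example).
my $row = join q{,}, 3, 512, qw{ 1.4.5 1.4.6 44642850 44642850 0 27348 };

# Without a LIMIT, split breaks the row at every comma.
my @all = split m{,}, $row;

# With a LIMIT of 3, split returns at most three fields: the two keys,
# then everything after the second comma as one unsplit string.
my @three = split m{,}, $row, 3;

printf "%d fields without LIMIT, %d with LIMIT 3\n", scalar @all, scalar @three;
print "keys: $three[ 0 ] $three[ 1 ]\n";
print "rest: $three[ 2 ]\n";
```

This should report 8 fields against 3, with the keys 3 and 512 and the rest of the row left as one string. perlfunc also documents an implicit LIMIT: when split's result is assigned directly to a list of scalars, as in `my ( $c1, $c2 ) = split m{,}, $row;`, LIMIT is treated as one more than the number of variables. The list-slice form `( split m{,}, $row )[ 0, 1 ]` used by the *Split variants gets no such implicit limit, which is presumably why the explicit third argument makes such a difference there.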

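Not constraining the split to just the fields you need is a significant performance hit (I'm guessing because there are many fields here): the *Split3 variants are between 87% and 247% faster than their *Split counterparts, e.g. grtSplit3 at 143/s against 68.1/s for grtSplit. The three-argument split, on the other hand, seems to be level-pegging with the regular expression approach: stSplit3 and stRegex both manage 112/s, and grtSplit3 and grtRegex both 143/s. Here is the code.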

use strict;
use warnings;

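use Benchmark  qw{ cmpthese };   # compares the rates of the sort variants
use Test::More qw{ no_plan };    # checks that each variant sorts correctly

# Each method takes a reference to an array of CSV rows and returns a
# reference to those rows sorted numerically on field 2, then on field 1.
#   nSub*   - sort with a named comparison sub (keys extracted on every comparison)
#   st*     - Schwartzian Transform (keys extracted once per row)
#   grt*    - Guttman-Rosler Transform (keys packed in front of each row)
#   *Regex  - keys via m{^(\d+),(\d+)}
#   *Split  - split on every comma
#   *Split3 - split with a LIMIT of 3
my %methods = (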
   nSubRegex   => sub
   {
       my @sorted = sort sortRowsRegex @{ $_[ 0 ] };
       return \ @sorted;
   },
   nSubSplit   => sub
   {
       my @sorted = sort sortRowsSplit @{ $_[ 0 ] };
       return \ @sorted;
   },
   nSubSplit3  => sub
   {
       my @sorted = sort sortRowsSplit3 @{ $_[ 0 ] };
       return \ @sorted;
   },
   stRegex   => sub
   {
       my @sorted =
          map  { $_->[ 0 ] }
          sort { $a->[ 2 ] <=> $b->[ 2 ] || $a->[ 1 ] <=> $b->[ 1 ] }
          map  { [ $_ , m{^(\d+),(\d+)} ] }
          @{ $_[ 0 ] };
       return \ @sorted;
   },
   stSplit   => sub
   {
       my @sorted =
          map  { $_->[ 0 ] }
          sort { $a->[ 1 ] <=> $b->[ 1 ] || $a->[ 2 ] <=> $b->[ 2 ] }
          map  { [ $_ , ( split m{,} )[ 1, 0 ] ] }
          @{ $_[ 0 ] };
       return \ @sorted;
   },
   stSplit3  => sub
   {
       my @sorted =
          map  { $_->[ 0 ] }
          sort { $a->[ 1 ] <=> $b->[ 1 ] || $a->[ 2 ] <=> $b->[ 2 ] }
          map  { [ $_ , ( split m{,}, $_, 3 )[ 1, 0 ] ] }
          @{ $_[ 0 ] };
       return \ @sorted;
   },
   grtRegex  => sub
   {
       my @sorted =
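           map  { substr $_, 8 }   # strip the 8-byte packed key prefix
           sort                    # plain string sort on the packed keys
           # Key: field 2 then field 1 as unsigned 32-bit big-endian ints
           # ('N'), followed by the original row.
           map  { pack q{NNA*}, reverse( m{^(\d+),(\d+)} ),  $_ }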
          @{ $_[ 0 ] };
       return \ @sorted;
   },
   grtSplit  => sub
   {
       my @sorted =
          map  { substr $_, 8 }
          sort
          map  { pack q{NNA*}, ( split m{,} )[ 1, 0 ], $_ }
          @{ $_[ 0 ] };
       return \ @sorted;
   },
   grtSplit3 => sub
   {
       my @sorted =
          map  { substr $_, 8 }
          sort
          map  { pack q{NNA*}, ( split m{,}, $_, 3 )[ 1, 0 ], $_ }
          @{ $_[ 0 ] };
       return \ @sorted;
   },
   );

{
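    # The sample rows contain commas, which would otherwise trigger qw's
    # "Possible attempt to separate words with commas" warning.
    no warnings q{qw};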

    my @test = qw{
       1,64,1.4.0,1.4.6
       1,128,1.4.1,1.4.6
       1,256,1.4.2,1.4.6
       1,512,1.4.3,1.4.6
       1,1024,1.4.4,1.4.6
       2,64,1.4.5,1.4.6
       2,128,1.4.6,1.4.6
       2,256,1.4.7,1.4.6
       2,512,1.4.8,1.4.6
       2,1024,1.4.9,1.4.6
       };
    my $testRes = join q{:}, qw{
       1,64,1.4.0,1.4.6
       2,64,1.4.5,1.4.6
       1,128,1.4.1,1.4.6
       2,128,1.4.6,1.4.6
       1,256,1.4.2,1.4.6
       2,256,1.4.7,1.4.6
       1,512,1.4.3,1.4.6
       2,512,1.4.8,1.4.6
       1,1024,1.4.4,1.4.6
       2,1024,1.4.9,1.4.6
       };

    foreach my $method ( sort keys %methods )
    {
        ok(
           join( q{:}, @{ $methods{ $method }->( \ @test ) } )
           eq
           $testRes,
           qq{$method}
           );
    }
}

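# Benchmark data: 32 x 32 = 1024 rows, each with the two numeric key fields
# followed by ten more fields. Note that 2 ** 32 is one more than the
# largest value pack's 'N' format holds, so the grt* variants may misplace
# the rows with that key; the tests above only use small values, and
# 1 .. 31 for $col2 would keep every key in range.
my @unsorted;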

for my $col1 ( 1 .. 32 )
{
    for my $col2 ( 1 .. 32 )
    {
        push @unsorted,
          join q{,},
             $col1,
             2 ** $col2,
             qw{
                1.4.5
                1.4.6
                44642850
                44642850
                0
                27348
                10028
                59188
                1488095
                761904.64
                };
    }
}

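# Run each variant for at least 5 CPU seconds on the full data set. Each
# entry is a closure over its method, so no string eval is needed.
cmpthese(
   -5,
   {
       map
       {
           my $method = $methods{ $_ };
           $_ => sub { my $ref = $method->( \ @unsorted ); };
       }
       keys %methods
   }
   );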

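# Comparison subs for the nSub* variants: sort sets $a and $b to the two
# rows being compared, and each call re-extracts both rows' keys.
sub sortRowsRegex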
{
    my( $a1, $a2 ) = $a =~ m{^(\d+),(\d+)};
    my( $b1, $b2 ) = $b =~ m{^(\d+),(\d+)};
    return $a2 <=> $b2 || $a1 <=> $b1;
}

sub sortRowsSplit
{
    my( $a1, $a2 ) = ( split m{,}, $a )[ 0, 1 ];
    my( $b1, $b2 ) = ( split m{,}, $b )[ 0, 1 ];
    return $a2 <=> $b2 || $a1 <=> $b1;
}

sub sortRowsSplit3
{
    my( $a1, $a2 ) = ( split m{,}, $a, 3 )[ 0, 1 ];
    my( $b1, $b2 ) = ( split m{,}, $b, 3 )[ 0, 1 ];
    return $a2 <=> $b2 || $a1 <=> $b1;
}
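For anyone unfamiliar with the grt* variants, a minimal standalone sketch of the Guttman-Rosler key construction they use, on a few made-up rows (hypothetical data, not the benchmark's). The rows are chosen so that a plain string sort of the rows themselves would put "10,64" before "2,64"; the packed keys restore numeric order. Bear in mind that 'N' only holds 0 .. 2**32 - 1, so this works only while both keys fit in that range.

```perl
use strict;
use warnings;

# Made-up rows; the first two fields are the numeric sort keys.
my @rows = ( '10,64,a', '2,64,b', '1,1024,c', '3,8,d' );

# Guttman-Rosler Transform, as in the grt* variants:
#  1. prefix each row with a packed key: field 2, then field 1, each as an
#     unsigned 32-bit big-endian integer ('N', 4 bytes each);
#  2. sort with the default string comparison, which on big-endian bytes
#     gives numeric order (field 2 first, then field 1);
#  3. strip the 8-byte prefix to recover the original rows.
my @sorted =
    map  { substr $_, 8 }
    sort
    map  { pack q{NNA*}, ( split m{,}, $_, 3 )[ 1, 0 ], $_ }
    @rows;

print "$_\n" for @sorted;
```

This prints 3,8,d then 2,64,b then 10,64,a then 1,1024,c. Because the comparison is the built-in string sort with no block, Perl never calls back into Perl code per comparison, which fits the grt* variants topping the table.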

Sorry for the slow reply; I hope this is of interest.

Cheers,

JohnGG

In reply to Re^4: numeric sort on substring by johngg
in thread numeric sort on substring by Anonymous Monk
