in reply to Re^5: schwartzian transform problem
in thread schwartzian transform problem - Solved

cmpthese( -1, { 'a' => GRT($s), 'b' => ST($s) } ) ;
This calls both GRT and ST just once, and uses their output as the code to eval and benchmark. That's not what we want. Instead, we need
cmpthese(-3, { grt => sub { GRT($s) }, st => sub { ST($s) } });
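To see the difference in isolation, here is a minimal sketch without Benchmark. The `work` sub and the `%eager`/`%lazy` hashes are made-up stand-ins (not from the original code) for GRT/ST and the hash passed to cmpthese.

```perl
use strict;
use warnings;

my $calls = 0;
sub work { $calls++; return "done" }    # hypothetical stand-in for GRT/ST

# Hash values are evaluated while the hash is built: work() runs once,
# right here, and the hash stores its return value (a plain string).
my %eager = ( a => work() );
my $calls_after_eager = $calls;

# Wrapping the call in an anonymous sub defers it: nothing runs yet.
my %lazy = ( a => sub { work() } );
my $calls_after_lazy = $calls;

# A benchmark would now invoke the code ref over and over.
$lazy{a}->() for 1 .. 3;

printf "eager holds a %s\n", ref( $eager{a} ) ? "code ref" : "plain string";
printf "lazy holds a %s\n",  ref( $lazy{a} )  ? "code ref" : "plain string";
printf "work() was called %d times in total\n", $calls;
```

Since cmpthese accepts either a code ref or a string to eval, the eager form does not fail loudly; it just times the wrong thing.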

It still seems that ST is much faster:

      Rate  grt   st
grt 2.75/s   -- -76%
st  11.6/s 320%   --

But, notice the regex is different when splitting the string:

split m{(?<!\A)(?=>>>)}, $s     # GRT
split (/^(?=>>> )/m, $s)        # ST
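A quick way to check that the two patterns cut the input at the same places is to run both on a small sample. This is only a sketch: the contents of `$s` below are invented for illustration, assuming each record starts with ">>> " at the beginning of a line.

```perl
use strict;
use warnings;

# Hypothetical input: three records, each starting with ">>> " at line start.
my $s = ">>> a   5%\ndetail a\n>>> b 100%\ndetail b\n>>> c  42%\ndetail c\n";

my @by_lookbehind = split m{(?<!\A)(?=>>>)}, $s;    # pattern from the GRT version
my @by_anchor     = split /^(?=>>> )/m,      $s;    # pattern from the ST version

my $same = @by_lookbehind == @by_anchor
        && "@by_lookbehind" eq "@by_anchor";

printf "%d vs %d records, %s\n",
    scalar @by_lookbehind, scalar @by_anchor,
    $same ? "identical" : "different";
```

The patterns are not equivalent in general: the first one splits wherever ">>>" occurs, the second only at the start of a line and only when a space follows, so the check is worth repeating on the real data.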

So, let's add a test to verify the results stay correct

use Test::More tests => 1; is GRT($s), ST($s), 'same';

and we can use the simpler regex from ST in GRT. Fortunately, the test remains successful, and the results are now different:

1..1
ok 1 - same
      Rate  st grt
st  11.7/s  -- -3%
grt 12.1/s  3%  --

An insignificant difference, but at least ST doesn't seem to be much faster now.

Benchmarking and optimisation are hard.

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Re^7: schwartzian transform problem
by Cristoforo (Curate) on Mar 24, 2025 at 16:38 UTC
    Thank you for the correction. I don't use Benchmark often, if at all. And the difference between the GRT and the ST came down to the difference in the regex used for split.

      You forgot to include the solution I provided! And let's include a basic use of sort to see what that looks like and how the others compare to that.

                Rate basic   GRT    ST keysort
      basic   1.66/s    --  -88%  -89%    -91%
      GRT     13.3/s  701%    --  -12%    -25%
      ST      15.1/s  810%   14%    --    -15%
      keysort 17.7/s  972%   34%   18%      --

                Rate basic   GRT    ST keysort
      basic   1.79/s    --  -87%  -89%    -91%
      GRT     13.5/s  654%    --  -15%    -30%
      ST      15.8/s  786%   18%    --    -18%
      keysort 19.3/s  980%   43%   22%      --

                Rate basic   GRT    ST keysort
      basic   1.70/s    --  -87%  -89%    -91%
      GRT     13.3/s  681%    --  -15%    -27%
      ST      15.6/s  817%   17%    --    -14%
      keysort 18.3/s  972%   37%   17%      --

      Sort::Key isn't only the cleanest and simplest of all the solutions (including the builtin sort), it's the fastest! It's 17-22% faster than the next fastest.

      #!/usr/bin/perl

      use strict;
      use warnings;

      use Benchmark     qw( cmpthese );
      use File::Slurper qw( read_text );
      use Sort::Key     qw( rikeysort );

      my @unsorted = split /^(?=>>> )/m, read_text( "try3.txt" );

      @unsorted = ( @unsorted ) x 10_000;  # 90_000 lines (30_000 records)

      sub basic {
         my @sorted =
            sort {
               my ( $an ) = $a =~ /(\d+)%/;
               my ( $bn ) = $b =~ /(\d+)%/;
               $bn <=> $an
            }
               @unsorted;
      }

      sub ST {
         my @sorted =
            map $_->[0],
               sort { $b->[1] <=> $a->[1] }
                  map [ $_, /(\d+)%/ ],
                     @unsorted;
      }

      sub GRT {
         my @sorted =
            map substr( $_, 4 ),
               sort
                  map { /(\d+)%/ ? ( ~ pack( "N", $1 ) . $_ ) : () }
                     @unsorted;
      }

      sub keysort {
         my @sorted = rikeysort { ( /(\d+)%/ )[0] } @unsorted;
      }

      cmpthese( -3, {
         basic   => \&basic,
         ST      => \&ST,
         GRT     => \&GRT,
         keysort => \&keysort,
      } );

      Note: I found substr( $_, 4 ) to be slightly faster than unpack( "xa4", $_ ), thus the change in GRT.
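For readers who have not met the GRT before, here is a small sketch of what the packed prefix does. The three records in `@records` are invented sample data; the pipeline itself mirrors the GRT sub above.

```perl
use strict;
use warnings;

my @records = ( "a 5%", "b 100%", "c 42%" );    # hypothetical sample data

my @sorted =
    map  { substr $_, 4 }                   # 3. strip the 4-byte key prefix
    sort                                    # 2. plain string sort, no comparison block
    map  { /(\d+)%/                         # 1. prepend the complemented
               ? ( ~pack( "N", $1 ) . $_ )  #    big-endian key to each record
               : () }
    @records;

print "$_\n" for @sorted;    # prints "b 100%", "c 42%", "a 5%"
```

pack "N" yields a fixed-width big-endian key, so string order matches numeric order; complementing every byte with ~ reverses that order, which is why the default ascending sort returns the largest percentage first. Records without a percentage are dropped by the inner map.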

        Regexes ARE slow (as everyone here knows, of course). To the extent I can trust my Strawberries and/or my crippled Kaby Lake to benchmark anything, the following is twice as fast (and I guess rnkeysort should have been used, and it's slower):

        sub keysort1 {
            my @sorted = rnkeysort {
                substr $_, rindex( $_, '%' ) - 3, 3
            } @unsorted;
        }
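A small sketch of that key extraction next to the regex one, on a made-up record. It assumes the percentage sits in the three characters right before the last '%' and is padded with spaces when shorter, which is a guess about the layout of the real data.

```perl
use strict;
use warnings;

# Hypothetical record; the number is right-aligned in a 3-character field.
my $rec = ">>> item one  42%\nsome detail line\n";

my ( $by_regex ) = $rec =~ /(\d+)%/;                        # "42"
my $by_substr = substr $rec, rindex( $rec, '%' ) - 3, 3;    # " 42"

# Leading whitespace is ignored in numeric context, so both keys compare equal.
printf "regex: '%s', substr: '%s', numerically equal: %s\n",
    $by_regex, $by_substr, $by_regex == $by_substr ? "yes" : "no";
```

Note that the regex picks up the first percentage in the record while rindex finds the last '%', so the two only agree when those are the same field.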

        And assuming $s keeps the (un|pre)split input, and if each 2nd '%' is guaranteed to be an anchor, even if records have different length/layout, then the next unclean ugly one is still faster than keysort for me:

        sub test1 {
            my $i = my $j = 0;
            my @nums;
            $j ^= 1 or push @nums, substr $s, $i - 3, 3
                while -1 != ( $i = index $s, '%', $i + 1 );
            my @sorted = @unsorted
                [ sort { $nums[$b] <=> $nums[$a] } 0 .. $#nums ]
        }
        __END__
                  Rate keysort   test1
        keysort 21.6/s      --    -30%
        test1   30.6/s     42%      --
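The same idea spelled out more slowly, as a sketch on invented data: scan the unsplit string for '%', keep the three characters before every second one, then sort the record indices by those keys and take an array slice. The contents of `$s` are hypothetical and built so that each record has exactly two '%' signs, the second being the sort key.

```perl
use strict;
use warnings;

# Hypothetical unsplit input: two '%' per record, the second one is the key.
my $s = ">>> a 1% done,   5%\n"
      . ">>> b 2% done, 100%\n"
      . ">>> c 3% done,  42%\n";

my @unsorted = split /^(?=>>> )/m, $s;

# Collect the 3 characters preceding every second '%' in the whole string.
my ( $pos, $toggle, @nums ) = ( 0, 0 );
while ( ( $pos = index $s, '%', $pos + 1 ) != -1 ) {
    $toggle ^= 1;                                       # 1 on odd hits, 0 on even hits
    push @nums, substr( $s, $pos - 3, 3 ) unless $toggle;
}

# Sort the indices by key, descending, then pull the records out with a slice.
my @order  = sort { $nums[$b] <=> $nums[$a] } 0 .. $#nums;
my @sorted = @unsorted[@order];

print for @sorted;
```

Sorting indices instead of records keeps the comparison down to two array lookups, at the cost of relying on `@nums` and `@unsorted` staying in the same order.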