in reply to Re^6: schwartzian transform problem
in thread schwartzian transform problem - Solved

Thank you for the correction. I rarely, if ever, use Benchmark. The difference between the GRT and ST came down to the different regex used for split.

Re^8: schwartzian transform problem
by ikegami (Patriarch) on Mar 25, 2025 at 23:02 UTC

    You forgot to include the solution I provided! And let's include a basic use of sort to see what that looks like and how the others compare to that.

              Rate   basic     GRT      ST keysort
    basic   1.66/s      --    -88%    -89%    -91%
    GRT     13.3/s    701%      --    -12%    -25%
    ST      15.1/s    810%     14%      --    -15%
    keysort 17.7/s    972%     34%     18%      --

              Rate   basic     GRT      ST keysort
    basic   1.79/s      --    -87%    -89%    -91%
    GRT     13.5/s    654%      --    -15%    -30%
    ST      15.8/s    786%     18%      --    -18%
    keysort 19.3/s    980%     43%     22%      --

              Rate   basic     GRT      ST keysort
    basic   1.70/s      --    -87%    -89%    -91%
    GRT     13.3/s    681%      --    -15%    -27%
    ST      15.6/s    817%     17%      --    -14%
    keysort 18.3/s    972%     37%     17%      --

    Sort::Key isn't only the cleanest and simplest of all the solutions (including the builtin sort), it's the fastest! It's 17-22% faster than the next fastest.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark     qw( cmpthese );
    use File::Slurper qw( read_text );
    use Sort::Key     qw( rikeysort );

    my @unsorted = split /^(?=>>> )/m, read_text( "try3.txt" );

    @unsorted = ( @unsorted ) x 10_000;  # 90_000 lines (30_000 records)

    sub basic {
       my @sorted =
          sort {
             my ( $an ) = $a =~ /(\d+)%/;
             my ( $bn ) = $b =~ /(\d+)%/;
             $bn <=> $an
          }
             @unsorted;
    }

    sub ST {
       my @sorted =
          map $_->[0],
             sort { $b->[1] <=> $a->[1] }
                map [ $_, /(\d+)%/ ],
                   @unsorted;
    }

    sub GRT {
       my @sorted =
          map substr( $_, 4 ),
             sort
                map { /(\d+)%/ ? ( ~ pack( "N", $1 ) . $_ ) : () }
                   @unsorted;
    }

    sub keysort {
       my @sorted = rikeysort { ( /(\d+)%/ )[0] } @unsorted;
    }

    cmpthese( -3, {
       basic   => \&basic,
       ST      => \&ST,
       GRT     => \&GRT,
       keysort => \&keysort,
    } );

    Note: I found substr( $_, 4 ) to be slightly faster than unpack( "xa4", $_ ), thus the change in GRT.
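For anyone wondering why the GRT above needs no comparison block, here is a minimal sketch using core Perl only. The three sample records are made up for illustration (the layout of try3.txt is not shown in this thread); the decorate, sort, and strip steps mirror the GRT sub above.

```perl
use strict;
use warnings;

# Hypothetical sample records; the real contents of try3.txt are not shown.
my @unsorted = (
    ">>> a\nscore 7%\n",
    ">>> b\nscore 100%\n",
    ">>> c\nscore 42%\n",
);

# Decorate: prefix each record with a 4-byte key.
# pack "N" yields a big-endian 32-bit integer, so a plain string comparison
# of the packed values agrees with numeric order. The string ~ flips every
# bit of that key, which reverses the order (largest number sorts first).
my @decorated = map { /(\d+)%/ ? ( ~ pack( "N", $1 ) . $_ ) : () } @unsorted;

# Sort with the default string comparison, then strip the 4-byte key.
my @sorted = map substr( $_, 4 ), sort @decorated;

# Show the percentages in their sorted order.
my @percents = map { /(\d+)%/ } @sorted;
print "@percents\n";
```

Records without a percentage are dropped by the `: ()` branch, just as in the GRT sub.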

      Regexes ARE slow (of course, everyone here knows that). To the extent I can trust my Strawberries and/or my crippled Kaby Lake to benchmark anything, the following is twice as fast (also, I guess rnkeysort should have been used, and it's slower):

      sub keysort1 {
          my @sorted = rnkeysort {
              substr $_, rindex( $_, '%' ) - 3, 3
          } @unsorted;
      }
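A minimal sketch of what that key extraction relies on, using made-up records and the builtin sort in place of rnkeysort so that it runs without Sort::Key. The assumption (mine, not confirmed by the thread) is that the number sits right-aligned in the three characters just before the last '%'; the helper names key_regex and key_substr are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records: the number is right-aligned in a 3-character field
# that ends just before the last '%'.
my @unsorted = (
    ">>> a\nscore:   7%\n",
    ">>> b\nscore: 100%\n",
    ">>> c\nscore:  42%\n",
);

# Key via regex, as in the earlier subs.
sub key_regex { ( $_[0] =~ /(\d+)%/ )[0] }

# Key via rindex/substr: the 3 characters before the last '%'.
sub key_substr { substr( $_[0], rindex( $_[0], '%' ) - 3, 3 ) }

for my $rec (@unsorted) {
    # "  7" numifies to 7, so the two keys compare equal numerically.
    die "keys differ" unless key_regex($rec) == key_substr($rec);
}

# Descending numeric sort on the substr key (builtin sort stands in for rnkeysort).
my @sorted = sort { key_substr($b) <=> key_substr($a) } @unsorted;

my @keys = map { key_substr($_) + 0 } @sorted;
print "@keys\n";
```

If a record had a non-blank, non-digit character in that 3-character window, the numeric comparison would warn, so this only holds for fixed-width or space-padded fields.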

      And assuming $s keeps the (un|pre)split input, and if each 2nd '%' is guaranteed to be an anchor, even if records have different length/layout, then the following unclean, ugly one is still faster than keysort for me:

      sub test1 {
          my $i = my $j = 0;
          my @nums;
          $j ^= 1 or push @nums, substr $s, $i - 3, 3
              while -1 != ( $i = index $s, '%', $i + 1 );
          my @sorted = @unsorted[
              sort { $nums[$b] <=> $nums[$a] } 0 .. $#nums
          ]
      }

      __END__
                Rate keysort   test1
      keysort 21.6/s      --    -30%
      test1   30.6/s     42%      --
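The loop in test1 is dense, so here is a spelled-out sketch of the same idea on made-up input. It assumes, as stated above, that every record contains exactly two '%' signs and that the second one follows a 3-character key; the sample data and the @order variable are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical unsplit input: two '%' per record, the second one is the anchor.
my $s = join '',
    ">>> a\nload  50%\nscore   7%\n",
    ">>> b\nload  10%\nscore 100%\n",
    ">>> c\nload  99%\nscore  42%\n";

my @unsorted = split /^(?=>>> )/m, $s;

# Walk every '%' in the unsplit string and keep the 3 characters before
# each second one. $j toggles between 1 and 0 on successive hits.
my ( $i, $j ) = ( 0, 0 );
my @nums;
while ( -1 != ( $i = index( $s, '%', $i + 1 ) ) ) {
    $j ^= 1;
    next if $j;                          # odd-numbered '%': not the anchor
    push @nums, substr( $s, $i - 3, 3 ); # even-numbered '%': take the key
}

# Sort record indices by key, descending, then take an array slice.
my @order  = sort { $nums[$b] <=> $nums[$a] } 0 .. $#nums;
my @sorted = @unsorted[@order];

print "@order\n";
```

Sorting indices instead of records keeps the keys in a flat array, so no per-record regex or key extraction happens inside the comparison.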