in reply to Re^4: schwartzian transform problem
in thread schwartzian transform problem - Solved

I did try cmpthese but I got error messages about the input file and it wouldn't run. My code:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';
use Time::HiRes qw/tv_interval gettimeofday/;

#https://perlmonks.org/index.pl?node_id=11164097

open my $fh, '<', 'try3.txt' or die $!;   # try3.txt contains the original data
my $s;
{
    local $/ = undef;   #local $/ = '', is paragraph mode
    $s = <$fh>;         # slurp file
}
$s = $s x 10_000;       # 90_000 lines (30_000 records)
close $fh or die $!;

sub GRT {
    my $s = shift;
    map { unpack q{x4a*}, $_ }
    sort
    map { m{(\d+)(?=%)} && ( ~ pack( q{N}, $1 ) . pack( q{a*}, $_ ) ) }
    do {
        #local $/ = q{};   # $s is slurped before passing to split function
        split m{(?<!\A)(?=>>>)}, $s;
    }
}

sub ST {
    my $s = shift;
    map  {$_->[0]}
    sort {$b->[1] <=> $a->[1]}
    map  {[$_, /(\d+)%/]}
    split(/^(?=>>> )/m, $s)
}

my $t0;
my $elapsed;

$t0 = [gettimeofday];
GRT($s);
$elapsed = tv_interval ( $t0, [gettimeofday]);
print "GRT time: ", $elapsed, "\n";

$t0 = [gettimeofday];
ST($s);
$elapsed = tv_interval ( $t0, [gettimeofday]);
print "ST time: ", $elapsed;

=begin

C:\Old_Data\perlp>perl GRT_ST_transform.pl
GRT time: 0.222224
ST time: 0.05873
C:\Old_Data\perlp>

=cut

#cmpthese( -1, { 'a' => GRT($s), 'b' => ST($s) } ) ;
The input file was these 9 lines (which I repeated to make 90_000 lines for the test):
>>> prd1701
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 887G 3.0T 13% /workspace/data
>>> prd1702
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 746G 3.1T 23% /workspace/data
>>> prd1703
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 687G 3.2T 18% /workspace/data

Re^6: schwartzian transform problem
by choroba (Cardinal) on Mar 23, 2025 at 11:11 UTC
    cmpthese( -1, { 'a' => GRT($s), 'b' => ST($s) } ) ;
    This calls both GRT and ST just once, while the hash is being built, and uses their output as the code to eval and benchmark. That's not what we want. Instead, we need to pass code references:
    cmpthese(-3, { grt => sub { GRT($s) }, st => sub { ST($s) } });
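    The difference between the two calls can be seen in a small sketch. The subs `slow` and `fast` and the `%calls` counter are made up for illustration and merely stand in for GRT and ST; the fixed iteration count of 500 is likewise an arbitrary choice to keep the run short.

    ```perl
    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    # Hypothetical stand-ins for GRT and ST; each one counts its own calls.
    my %calls = ( slow => 0, fast => 0 );
    sub slow { $calls{slow}++; my @r = sort { $a <=> $b } reverse 1 .. 2_000; scalar @r }
    sub fast { $calls{fast}++; my @r = reverse 1 .. 2_000; scalar @r }

    # Wrong: slow() and fast() each run exactly once, right here, while the
    # hash is being built. What gets stored is their return value.
    my %wrong = ( a => slow(), b => fast() );
    my $calls_after_wrong = $calls{slow};

    # Right: anonymous subs defer the call, so Benchmark can invoke each
    # one as often as it needs to.
    my %right = ( a => sub { slow() }, b => sub { fast() } );

    cmpthese( 500, \%right );

    printf "stored in %%wrong: %s\n", ref( $wrong{a} ) || 'plain value';
    printf "stored in %%right: %s\n", ref( $right{a} );
    printf "slow() called %d time(s) for %%wrong, %d in total\n",
        $calls_after_wrong, $calls{slow};
    ```

    With real data, a negative first argument such as -3 (run for at least 3 CPU seconds) usually gives steadier numbers than a fixed count.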

    It still seems that ST is much faster:

          Rate  grt   st
    grt 2.75/s   -- -76%
    st  11.6/s 320%   --

    But notice that the two subs use different regexes to split the string:

    split m{(?<!\A)(?=>>>)}, $s    # GRT
    split (/^(?=>>> )/m, $s)       # ST

    So, let's add a test to verify the results stay correct:

    use Test::More tests => 1;
    is GRT($s), ST($s), 'same';

    and we can use the simpler regex from ST in GRT. Fortunately, the test remains successful, and the results are now different:

    1..1
    ok 1 - same
          Rate   st  grt
    st  11.7/s   --  -3%
    grt 12.1/s   3%   --

    An insignificant difference, but at least ST doesn't seem to be much faster now.
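    One caveat about the test is worth checking. As far as I know, Test::More declares `is` with a `($$;$)` prototype, so `GRT($s)` and `ST($s)` are evaluated in scalar context there, and a `map` in scalar context yields the number of elements it produced. If that is right, the one-line test compares two counts rather than two lists of records. A minimal sketch of a stricter check, using a made-up three-record input in the shape of try3.txt and the shared split regex:

    ```perl
    use strict;
    use warnings;
    use Test::More tests => 2;

    # Made-up miniature input; host names and percentages mirror try3.txt.
    my $s = join '',
        map { ">>> prd$_->[0]\n/workspace 3.9T $_->[1]% /workspace/data\n" }
        [ 1701, 13 ], [ 1702, 23 ], [ 1703, 18 ];

    sub ST {
        my $s = shift;
        map  { $_->[0] }
        sort { $b->[1] <=> $a->[1] }
        map  { [ $_, /(\d+)%/ ] }
        split /^(?=>>> )/m, $s;
    }

    sub GRT {
        my $s = shift;
        map  { substr $_, 4 }
        sort
        map  { /(\d+)%/ ? ( ~pack( 'N', $1 ) ) . $_ : () }
        split /^(?=>>> )/m, $s;
    }

    # The array refs force list context, so the records themselves are compared.
    is_deeply( [ GRT($s) ], [ ST($s) ], 'same records in the same order' );
    like( ( ST($s) )[0], qr/prd1702/, 'highest Use% comes first' );
    ```

    `is_deeply` walks both arrays element by element, so a difference in order or content fails the test even when the counts match.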

    Benchmarking and optimisation are hard.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thank you for the correction. I rarely use Benchmark, if at all. So the difference between GRT and ST came down to the different regexes used for split.

        You forgot to include the solution I provided! And let's include a basic use of sort to see what that looks like and how the others compare to that.

                  Rate basic   GRT    ST keysort
        basic   1.66/s    --  -88%  -89%    -91%
        GRT     13.3/s  701%    --  -12%    -25%
        ST      15.1/s  810%   14%    --    -15%
        keysort 17.7/s  972%   34%   18%      --

                  Rate basic   GRT    ST keysort
        basic   1.79/s    --  -87%  -89%    -91%
        GRT     13.5/s  654%    --  -15%    -30%
        ST      15.8/s  786%   18%    --    -18%
        keysort 19.3/s  980%   43%   22%      --

                  Rate basic   GRT    ST keysort
        basic   1.70/s    --  -87%  -89%    -91%
        GRT     13.3/s  681%    --  -15%    -27%
        ST      15.6/s  817%   17%    --    -14%
        keysort 18.3/s  972%   37%   17%      --

        Sort::Key isn't only the cleanest and simplest of all the solutions (including the builtin sort), it's the fastest! It's 17-22% faster than the next fastest.

        #!/usr/bin/perl

        use strict;
        use warnings;

        use Benchmark     qw( cmpthese );
        use File::Slurper qw( read_text );
        use Sort::Key     qw( rikeysort );

        my @unsorted = split /^(?=>>> )/m, read_text( "try3.txt" );
        @unsorted = ( @unsorted ) x 10_000;   # 90_000 lines (30_000 records)

        sub basic {
           my @sorted =
              sort {
                 my ( $an ) = $a =~ /(\d+)%/;
                 my ( $bn ) = $b =~ /(\d+)%/;
                 $bn <=> $an
              }
                 @unsorted;
        }

        sub ST {
           my @sorted =
              map $_->[0],
                 sort { $b->[1] <=> $a->[1] }
                    map [ $_, /(\d+)%/ ],
                       @unsorted;
        }

        sub GRT {
           my @sorted =
              map substr( $_, 4 ),
                 sort
                    map { /(\d+)%/ ? ( ~ pack( "N", $1 ) . $_ ) : () }
                       @unsorted;
        }

        sub keysort {
           my @sorted = rikeysort { ( /(\d+)%/ )[0] } @unsorted;
        }

        cmpthese( -3, {
           basic   => \&basic,
           ST      => \&ST,
           GRT     => \&GRT,
           keysort => \&keysort,
        } );

        Note: I found substr( $_, 4 ) to be slightly faster than unpack( "x4a*", $_ ), thus the change in GRT.
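        For readers unfamiliar with the GRT, here is a minimal sketch of why the 4-byte prefix works and why either way of stripping it gives the same result. The four records are made up; only their NN% field matters.

        ```perl
        use strict;
        use warnings;

        # Made-up records. Note 100 vs 23: a plain string sort would misplace them.
        my @records = ( "a 13%\n", "b 23%\n", "c 18%\n", "d 100%\n" );

        # pack "N" yields a 4-byte big-endian integer, so byte order equals
        # numeric order. The string complement (~) flips every bit, which
        # turns the default ascending sort into a descending one.
        my @packed = map { /(\d+)%/ ? ( ~pack( 'N', $1 ) ) . $_ : () } @records;

        # Strip the 4-byte key again, once with substr and once with unpack.
        my @via_substr = map { substr $_, 4 }       sort @packed;
        my @via_unpack = map { unpack 'x4a*', $_ }  sort @packed;

        print @via_substr;
        ```

        Because the key has a fixed width, the plain `sort` needs no comparison block, which is where the GRT saves time over the ST.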