Re^5: schwartzian transform problem

I did try cmpthese but I got error messages about the input file and it wouldn't run. My code:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';
use Time::HiRes qw/tv_interval gettimeofday/;

#https://perlmonks.org/index.pl?node_id=11164097

open my $fh, '<', 'try3.txt' or die $!; # try3.txt contains the original data

my $s;

{
    local $/ = undef; #local $/ = '', is paragraph mode
    $s = <$fh>;    # slurp file
}

$s = $s x 10_000; # 90_000 lines (30_000 records)

close $fh or die $!;


sub GRT {
    my $s = shift;
    map { unpack q{x4a*}, $_ }
   sort
   map {
       m{(\d+)(?=%)}
       &&
       ( ~ pack( q{N}, $1 ) . pack( q{a*}, $_ ) )
   }
   do  {
       #local $/ = q{}; # $s is slurped before passing to split function
       split m{(?<!\A)(?=>>>)}, $s;
   }    
}

sub ST {
    my $s = shift;
    map {$_->[0]}
    sort {$b->[1] <=> $a->[1]}
    map {[$_, /(\d+)%/]} split(/^(?=>>> )/m, $s)
}

my $t0;
my $elapsed;

$t0 = [gettimeofday];
GRT($s);
$elapsed = tv_interval ( $t0, [gettimeofday]);

print "GRT time: ", $elapsed, "\n";

$t0 = [gettimeofday];
ST($s);
$elapsed = tv_interval ( $t0, [gettimeofday]);

print "ST time:  ", $elapsed;

=begin
C:\Old_Data\perlp>perl GRT_ST_transform.pl
GRT time: 0.222224
ST time:  0.05873
C:\Old_Data\perlp>
=cut

#cmpthese( -1, { 'a' => GRT($s), 'b' => ST($s) } ) ;

The input file was the 9 lines (that I expanded to 90_000 lines for the test):

>>> prd1701
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  887G  3.0T  13% /workspace/data
>>> prd1702
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  746G  3.1T  23% /workspace/data
>>> prd1703
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  687G  3.2T  18% /workspace/data

Re^6: schwartzian transform problem by choroba (Cardinal) on Mar 23, 2025 at 11:11 UTC
    cmpthese( -1, { 'a' => GRT($s), 'b' => ST($s) } ) ;

This calls both GRT and ST just once, and uses their output as the code to eval and benchmark. That's not what we want. Instead, we need

    cmpthese(-3, { grt => sub { GRT($s) }, st => sub { ST($s) } });

It still seems that ST is much faster:

          Rate  grt   st
    grt 2.75/s   -- -76%
    st  11.6/s 320%   --

But notice that the regex is different when splitting the string:

    split m{(?<!\A)(?=>>>)}, $s   # GRT
    split (/^(?=>>> )/m, $s)      # ST

So, let's add a test to verify the results stay correct

    use Test::More tests => 1;
    is GRT($s), ST($s), 'same';

and we can use the simpler regex from ST in GRT. Fortunately, the test remains successful, and the results are now different:

    1..1
    ok 1 - same
          Rate   st  grt
    st  11.7/s   --  -3%
    grt 12.1/s   3%   --

An insignificant difference, but at least ST doesn't seem to be much faster now. Benchmarking and optimisation are hard.
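To make the point about the hash values concrete, here is a minimal sketch of the difference between storing a call's result and storing a code reference. The `work` sub and the `%eager`/`%lazy` hashes are hypothetical stand-ins and are not part of the thread's code. Benchmark documents that each value may be either a code reference or a string to be eval'd, which is why the returned records ended up being treated as code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $calls = 0;
sub work { $calls++; return 'done' }    # hypothetical stand-in for GRT/ST

# The call runs right here, while the hash is being built,
# so the hash stores the return value, not something callable.
my %eager = ( a => work() );
my $after_eager = $calls;

# Nothing runs yet; the hash stores a code reference for later use.
my %lazy = ( a => sub { work() } );
my $after_lazy = $calls;

# A benchmark loop can now call the stored code as often as it likes.
$lazy{a}->() for 1 .. 3;

print "eager value: $eager{a}\n";
print "calls after eager: $after_eager, after lazy: $after_lazy, total: $calls\n";
```

Only the second form gives the benchmark something it can run repeatedly.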
Re^7: schwartzian transform problem by Cristoforo (Curate) on Mar 24, 2025 at 16:38 UTC
Thank you for the correction. I rarely, if ever, use Benchmark. And the difference between GRT and ST came down to the different regex used for `split`.
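As a quick sanity check on the two `split` patterns, the following sketch runs both over a small sample and compares the resulting records. The two-record string is a shortened, hypothetical version of try3.txt, so this only suggests the patterns agree on input of this shape; it says nothing about their relative speed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-record sample in the same layout as try3.txt.
my $s = join '',
    ">>> prd1701\n",
    "Filesystem  Size  Used Avail Use% Mounted on\n",
    "/workspace  3.9T  887G  3.0T  13% /workspace/data\n",
    ">>> prd1702\n",
    "Filesystem  Size  Used Avail Use% Mounted on\n",
    "/workspace  3.9T  746G  3.1T  23% /workspace/data\n";

my @by_lookbehind = split m{(?<!\A)(?=>>>)}, $s;    # pattern used in GRT
my @by_line_start = split /^(?=>>> )/m, $s;         # pattern used in ST

printf "%d records vs %d records\n",
    scalar @by_lookbehind, scalar @by_line_start;

# Join with a separator that cannot occur in the data before comparing.
print "identical records\n"
    if join( "\0", @by_lookbehind ) eq join( "\0", @by_line_start );
```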
Re^8: schwartzian transform problem by ikegami (Patriarch) on Mar 25, 2025 at 23:02 UTC
You forgot to include the solution I provided! And let's include a basic use of `sort` to see what that looks like and how the others compare to that.

              Rate   basic     GRT      ST keysort
    basic   1.66/s      --    -88%    -89%    -91%
    GRT     13.3/s    701%      --    -12%    -25%
    ST      15.1/s    810%     14%      --    -15%
    keysort 17.7/s    972%     34%     18%      --

              Rate   basic     GRT      ST keysort
    basic   1.79/s      --    -87%    -89%    -91%
    GRT     13.5/s    654%      --    -15%    -30%
    ST      15.8/s    786%     18%      --    -18%
    keysort 19.3/s    980%     43%     22%      --

              Rate   basic     GRT      ST keysort
    basic   1.70/s      --    -87%    -89%    -91%
    GRT     13.3/s    681%      --    -15%    -27%
    ST      15.6/s    817%     17%      --    -14%
    keysort 18.3/s    972%     37%     17%      --

Sort::Key isn't only the cleanest and simplest of all the solutions (including the builtin `sort`), it's the fastest! It's 17-22% faster than the next fastest.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark     qw( cmpthese );
    use File::Slurper qw( read_text );
    use Sort::Key     qw( rikeysort );

    my @unsorted = split /^(?=>>> )/m, read_text( "try3.txt" );

    @unsorted = ( @unsorted ) x 10_000;  # 90_000 lines (30_000 records)

    sub basic {
        my @sorted =
            sort {
                my ( $an ) = $a =~ /(\d+)%/;
                my ( $bn ) = $b =~ /(\d+)%/;
                $bn <=> $an
            }
            @unsorted;
    }

    sub ST {
        my @sorted =
            map $_->[0],
            sort { $b->[1] <=> $a->[1] }
            map [ $_, /(\d+)%/ ],
            @unsorted;
    }

    sub GRT {
        my @sorted =
            map substr( $_, 4 ),
            sort
            map { /(\d+)%/ ? ( ~ pack( "N", $1 ) . $_ ) : () }
            @unsorted;
    }

    sub keysort {
        my @sorted = rikeysort { ( /(\d+)%/ )[0] } @unsorted;
    }

    cmpthese( -3, {
        basic   => \&basic,
        ST      => \&ST,
        GRT     => \&GRT,
        keysort => \&keysort,
    } );

Note: I found `substr( $_, 4 )` to be slightly faster than `unpack( "x4a*", $_ )`, thus the change in GRT.
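For readers unfamiliar with the key trick in GRT, here is a small sketch of why `~ pack( "N", $1 )` gives a descending numeric order under a plain string `sort`. The four short records are hypothetical and only keep the `NN%` field of the real data. It assumes the string form of `~`, i.e. that the `bitwise` feature is not enabled (no `use v5.28` or later in the script).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records; only the "NN%" field matters for the sort key.
my @records = ( "a 13%", "b 100%", "c 23%", "d 7%" );

my @sorted =
    map { substr( $_, 4 ) }        # drop the 4-byte key again
    sort                           # plain string comparison, no sort block
    map {
        # pack "N" yields a fixed-width big-endian integer, so string
        # order equals numeric order; complementing every bit with ~
        # reverses it, which gives a descending sort.
        /(\d+)%/ ? ( ~ pack( "N", $1 ) . $_ ) : ()
    }
    @records;

print "$_\n" for @sorted;
# b 100%
# c 23%
# a 13%
# d 7%
```

Note that 100 sorts ahead of 23 here, which a string comparison on the unpacked digits would get wrong.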
Re^9: schwartzian transform problem by Anonymous Monk on Mar 26, 2025 at 22:21 UTC