in reply to Re^3: schwartzian transform problem
in thread schwartzian transform problem - Solved

Given the parent node of your node, this might sound repetitive, but that's exactly why modules like Benchmark exist.

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Re^5: schwartzian transform problem
by Cristoforo (Curate) on Mar 22, 2025 at 14:36 UTC
    I did try cmpthese but I got error messages about the input file and it wouldn't run. My code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark 'cmpthese';
    use Time::HiRes qw/tv_interval gettimeofday/;

    #https://perlmonks.org/index.pl?node_id=11164097

    open my $fh, '<', 'try3.txt' or die $!; # try3.txt contains the original data
    my $s;
    {
        local $/ = undef; #local $/ = '', is paragraph mode
        $s = <$fh>; # slurp file
    }
    $s = $s x 10_000; # 90_000 lines (30_000 records)
    close $fh or die $!;

    sub GRT {
        my $s = shift;
        map { unpack q{x4a*}, $_ }
        sort
        map { m{(\d+)(?=%)} && ( ~ pack( q{N}, $1 ) . pack( q{a*}, $_ ) ) }
        do {
            #local $/ = q{}; # $s is slurped before passing to split function
            split m{(?<!\A)(?=>>>)}, $s;
        }
    }

    sub ST {
        my $s = shift;
        map {$_->[0]}
        sort {$b->[1] <=> $a->[1]}
        map {[$_, /(\d+)%/]}
        split(/^(?=>>> )/m, $s)
    }

    my $t0;
    my $elapsed;

    $t0 = [gettimeofday];
    GRT($s);
    $elapsed = tv_interval ( $t0, [gettimeofday]);
    print "GRT time: ", $elapsed, "\n";

    $t0 = [gettimeofday];
    ST($s);
    $elapsed = tv_interval ( $t0, [gettimeofday]);
    print "ST time: ", $elapsed;

    =begin
    C:\Old_Data\perlp>perl GRT_ST_transform.pl
    GRT time: 0.222224
    ST time: 0.05873
    C:\Old_Data\perlp>
    =cut

    #cmpthese( -1, { 'a' => GRT($s), 'b' => ST($s) } ) ;
    The input file was the 9 lines (that I expanded to 90_000 lines for the test):
    >>> prd1701
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 887G 3.0T 13% /workspace/data
    >>> prd1702
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 746G 3.1T 23% /workspace/data
    >>> prd1703
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 687G 3.2T 18% /workspace/data
      cmpthese( -1, { 'a' => GRT($s), 'b' => ST($s) } ) ;
      This calls both GRT and ST just once, while the hash is being built, and Benchmark then treats their output as the code to eval and benchmark. That's not what we want. Instead, each value needs to be a code reference:
      cmpthese(-3, { grt => sub { GRT($s) }, st => sub { ST($s) } });
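
      To make the difference concrete, here is a minimal self-contained sketch. The sample data is a hypothetical stand-in for the slurped try3.txt, and the fixed count of 200 and the 'none' style are only there to keep the run short. In the broken call, GRT($s) and ST($s) return lists that are flattened into the hash, so Benchmark never sees any subs at all; wrapping each call in sub { ... } gives it something it can call repeatedly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Hypothetical stand-in for the slurped try3.txt: three short records, repeated.
my $s = '';
for my $rec ( [ 1701, 13 ], [ 1702, 23 ], [ 1703, 18 ] ) {
    $s .= sprintf ">>> prd%d\n/workspace %d%% /workspace/data\n", @$rec;
}
$s x= 100;    # 300 records

# Same GRT and ST as in the parent node.
sub GRT {
    my $s = shift;
    map  { unpack q{x4a*}, $_ }
    sort
    map  { m{(\d+)(?=%)} && ( ~ pack( q{N}, $1 ) . pack( q{a*}, $_ ) ) }
    split m{(?<!\A)(?=>>>)}, $s;
}

sub ST {
    my $s = shift;
    map  { $_->[0] }
    sort { $b->[1] <=> $a->[1] }
    map  { [ $_, /(\d+)%/ ] }
    split /^(?=>>> )/m, $s;
}

# Each value is a code reference, so Benchmark can call it many times.
# A positive count means "run each sub this many times"; the 'none' style
# skips printing the chart and only returns it as an array of rows.
my $rows = cmpthese( 200, {
    grt => sub { GRT($s) },
    st  => sub { ST($s) },
}, 'none' );

# Print the chart ourselves, one row per line.
print join( "\t", @$_ ), "\n" for @$rows;
```

      With so few iterations the numbers themselves mean little; for a real comparison a negative count such as -3 (run for at least 3 CPU seconds) is the better choice.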

      It still seems that ST is much faster:

            Rate  grt   st
      grt 2.75/s   -- -76%
      st  11.6/s 320%   --

      But notice that the two subs use different regexes to split the string:

      split m{(?<!\A)(?=>>>)}, $s   # GRT
      split (/^(?=>>> )/m, $s)      # ST

      So, let's add a test to verify the results stay correct

      use Test::More tests => 1;
      is GRT($s), ST($s), 'same';

      and we can use the simpler regex from ST in GRT. Fortunately, the test remains successful, and the results are now different:

      1..1
      ok 1 - same
            Rate   st  grt
      st  11.7/s   -- -3%
      grt 12.1/s   3%   --

      An insignificant difference, but at least ST doesn't seem to be much faster now.
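
      One caveat about the test above: as far as I can tell, Test::More's is has a ($$;$) prototype, so GRT($s) and ST($s) are called in scalar context there and only the number of returned records is compared. A stricter check might look like the sketch below, which compares the full lists with is_deeply and uses ST's split regex in GRT. The sample data is again a hypothetical stand-in for try3.txt. Note that GRT breaks ties on the record text while ST leaves tied records to the sort itself, so on data where different records share a percentage the two orders may legitimately differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 1;

# Hypothetical stand-in for the slurped try3.txt: three short records, repeated.
my $s = '';
for my $rec ( [ 1701, 13 ], [ 1702, 23 ], [ 1703, 18 ] ) {
    $s .= sprintf ">>> prd%d\n/workspace %d%% /workspace/data\n", @$rec;
}
$s x= 3;    # 9 records

# GRT, now splitting with the simpler regex taken from ST.
sub GRT {
    my $s = shift;
    map  { unpack q{x4a*}, $_ }
    sort
    map  { m{(\d+)(?=%)} && ( ~ pack( q{N}, $1 ) . pack( q{a*}, $_ ) ) }
    split /^(?=>>> )/m, $s;
}

sub ST {
    my $s = shift;
    map  { $_->[0] }
    sort { $b->[1] <=> $a->[1] }
    map  { [ $_, /(\d+)%/ ] }
    split /^(?=>>> )/m, $s;
}

# Wrapping the calls in [ ... ] forces list context, so every record is
# compared, in order, rather than just the two counts.
is_deeply [ GRT($s) ], [ ST($s) ], 'same records in the same order';
```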

      Benchmarking and optimisation are hard.

        Thank you for the correction. I rarely use Benchmark, if at all. And the difference between GRT and ST came down to the different regex used for split.