in reply to Re^2: schwartzian transform problem
in thread schwartzian transform problem - Solved

I ran tests on the GRT and ST on 90_000 lines, and both finished in less than one second: the GRT ran in about 1/5 second and the ST in about 1/15 second.

Re^4: schwartzian transform problem
by choroba (Cardinal) on Mar 21, 2025 at 19:08 UTC
    Given the parent node of your node, this might sound repetitive, but that's exactly why modules like Benchmark exist.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      I did try cmpthese but I got error messages about the input file and it wouldn't run. My code:
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Benchmark 'cmpthese';
      use Time::HiRes qw/tv_interval gettimeofday/;

      #https://perlmonks.org/index.pl?node_id=11164097

      open my $fh, '<', 'try3.txt' or die $!; # try3.txt contains the original data
      my $s;
      {
          local $/ = undef; #local $/ = '', is paragraph mode
          $s = <$fh>; # slurp file
      }
      $s = $s x 10_000; # 90_000 lines (30_000 records)
      close $fh or die $!;

      sub GRT {
          my $s = shift;
          map { unpack q{x4a*}, $_ }
          sort
          map { m{(\d+)(?=%)} && ( ~ pack( q{N}, $1 ) . pack( q{a*}, $_ ) ) }
          do {
              #local $/ = q{}; # $s is slurped before passing to split function
              split m{(?<!\A)(?=>>>)}, $s;
          }
      }

      sub ST {
          my $s = shift;
          map {$_->[0]}
          sort {$b->[1] <=> $a->[1]}
          map {[$_, /(\d+)%/]}
          split(/^(?=>>> )/m, $s)
      }

      my $t0;
      my $elapsed;

      $t0 = [gettimeofday];
      GRT($s);
      $elapsed = tv_interval ( $t0, [gettimeofday]);
      print "GRT time: ", $elapsed, "\n";

      $t0 = [gettimeofday];
      ST($s);
      $elapsed = tv_interval ( $t0, [gettimeofday]);
      print "ST time: ", $elapsed;

      =begin

      C:\Old_Data\perlp>perl GRT_ST_transform.pl
      GRT time: 0.222224
      ST time: 0.05873
      C:\Old_Data\perlp>

      =cut

      #cmpthese( -1, { 'a' => GRT($s), 'b' => ST($s) } ) ;
      The input file was these 9 lines (which I expanded to 90_000 lines for the test):
      >>> prd1701
      Filesystem Size Used Avail Use% Mounted on
      /workspace 3.9T 887G 3.0T 13% /workspace/data
      >>> prd1702
      Filesystem Size Used Avail Use% Mounted on
      /workspace 3.9T 746G 3.1T 23% /workspace/data
      >>> prd1703
      Filesystem Size Used Avail Use% Mounted on
      /workspace 3.9T 687G 3.2T 18% /workspace/data
        cmpthese( -1, { 'a' => GRT($s), 'b' => ST($s) } ) ;
        This calls both GRT and ST just once, and uses their output as the code to eval and benchmark. That's not what we want. Instead, we need
        cmpthese(-3, { grt => sub { GRT($s) }, st => sub { ST($s) } });
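
        To make the difference concrete, here is a minimal sketch that counts how often a workload runs when it is called while the hash is being built, versus when it is wrapped in a code reference. The `work` sub and the `%calls` counters are hypothetical stand-ins for illustration, not the GRT/ST subs from this thread, and the fixed count of 200 is only there to keep the run short.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

my %calls;    # hypothetical call counters, only for illustration

# Stand-in workload (not the GRT/ST subs from the thread).
sub work {
    my ($name) = @_;
    $calls{$name}++;
    my $sum = 0;
    $sum += $_ for 1 .. 1_000;
    return $sum;
}

# Eager: work() runs exactly once, while this hash is being built;
# the hash then holds its return value rather than anything callable.
my %eager = ( a => work('eager') );

# Deferred: each value is a code reference that Benchmark invokes itself.
cmpthese( 200, {
    a => sub { work('deferred_a') },
    b => sub { work('deferred_b') },
} );

printf "%-12s %d call(s)\n", $_, $calls{$_} for sort keys %calls;
```

        With so few iterations Benchmark will probably warn that the count is too low to be reliable; the point here is only the call counts, not the rates.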

        It still seems that ST is much faster:

              Rate  grt   st
        grt 2.75/s   -- -76%
        st  11.6/s 320%   --

        But, notice the regex is different when splitting the string:

        split m{(?<!\A)(?=>>>)}, $s   # GRT
        split (/^(?=>>> )/m, $s)      # ST

        So, let's add a test to verify the results stay correct

        use Test::More tests => 1;
        is GRT($s), ST($s), 'same';

        and we can use the simpler regex from ST in GRT. Fortunately, the test remains successful, and the results are now different:

        1..1
        ok 1 - same
              Rate   st  grt
        st  11.7/s   --  -3%
        grt 12.1/s   3%   --

        An insignificant difference, but at least ST doesn't seem to be much faster now.
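
        For anyone who wants to reproduce the check, the following is a sketch of GRT with ST's split regex swapped in, run against the three sample records from the input file. One caveat it tries to address: Test::More declares `is` with a `($$;$)` prototype, so `is GRT($s), ST($s)` evaluates both calls in scalar context, where `map` returns a count of records rather than the records themselves. Wrapping each call in an array reference and using `is_deeply` compares the sorted lists element by element. The inline sample data is an assumption about the exact file layout (one record per three lines, trailing newline).

```perl
use strict;
use warnings;
use Test::More;

# Sample data: the three records from the input file shown earlier
# (assumed layout: three lines per record, file ends with a newline).
my $s = join '',
    ">>> prd1701\n",
    "Filesystem Size Used Avail Use% Mounted on\n",
    "/workspace 3.9T 887G 3.0T 13% /workspace/data\n",
    ">>> prd1702\n",
    "Filesystem Size Used Avail Use% Mounted on\n",
    "/workspace 3.9T 746G 3.1T 23% /workspace/data\n",
    ">>> prd1703\n",
    "Filesystem Size Used Avail Use% Mounted on\n",
    "/workspace 3.9T 687G 3.2T 18% /workspace/data\n";

# GRT as posted, except that the split regex is the one from ST.
sub GRT {
    my $s = shift;
    map  { unpack q{x4a*}, $_ }    # drop the 4-byte sort key
    sort                           # plain string sort on the packed key
    map  { m{(\d+)(?=%)} && ( ~ pack( q{N}, $1 ) . pack( q{a*}, $_ ) ) }
    split /^(?=>>> )/m, $s;
}

# ST as posted.
sub ST {
    my $s = shift;
    map  { $_->[0] }
    sort { $b->[1] <=> $a->[1] }
    map  { [ $_, /(\d+)%/ ] }
    split /^(?=>>> )/m, $s;
}

# Array refs force list context, so the sorted records are compared,
# not just the number of records.
is_deeply [ GRT($s) ], [ ST($s) ], 'same records in the same order';
done_testing;
```

        On data with many records sharing the same percentage, the two subs break ties differently (GRT falls back to comparing the record text, ST relies on the order `sort` happens to keep), so a list comparison could in principle fail there even when both results are validly sorted.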

        Benchmarking and optimisation are hard.
