Re^4: schwartzian transform problem

Re^5: schwartzian transform problem by Cristoforo (Curate) on Mar 22, 2025 at 14:36 UTC
I did try `cmpthese`, but I got error messages about the input file and it wouldn't run. My code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark 'cmpthese';
    use Time::HiRes qw/tv_interval gettimeofday/;

    #https://perlmonks.org/index.pl?node_id=11164097

    open my $fh, '<', 'try3.txt' or die $!; # try3.txt contains the original data
    my $s;
    {
        local $/ = undef; #local $/ = '', is paragraph mode
        $s = <$fh>;       # slurp file
    }
    $s = $s x 10_000; # 90_000 lines (30_000 records)
    close $fh or die $!;

    sub GRT {
        my $s = shift;
        map { unpack q{x4a*}, $_ }
        sort
        map { m{(\d+)(?=%)} && ( ~ pack( q{N}, $1 ) . pack( q{a*}, $_ ) ) }
        do {
            #local $/ = q{}; # $s is slurped before passing to split function
            split m{(?<!\A)(?=>>>)}, $s;
        }
    }

    sub ST {
        my $s = shift;
        map {$_->[0]}
        sort {$b->[1] <=> $a->[1]}
        map {[$_, /(\d+)%/]}
        split(/^(?=>>> )/m, $s)
    }

    my $t0;
    my $elapsed;

    $t0 = [gettimeofday];
    GRT($s);
    $elapsed = tv_interval ( $t0, [gettimeofday]);
    print "GRT time: ", $elapsed, "\n";

    $t0 = [gettimeofday];
    ST($s);
    $elapsed = tv_interval ( $t0, [gettimeofday]);
    print "ST time: ", $elapsed;

    =begin

    C:\Old_Data\perlp>perl GRT_ST_transform.pl
    GRT time: 0.222224
    ST time: 0.05873
    C:\Old_Data\perlp>

    =cut

    #cmpthese( -1, { 'a' => GRT($s), 'b' => ST($s) } ) ;

The input file was these 9 lines (which I expanded to 90_000 lines for the test):

    >>> prd1701
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 887G 3.0T 13% /workspace/data
    >>> prd1702
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 746G 3.1T 23% /workspace/data
    >>> prd1703
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 687G 3.2T 18% /workspace/data
Re^6: schwartzian transform problem by choroba (Cardinal) on Mar 23, 2025 at 11:11 UTC
    cmpthese( -1, { 'a' => GRT($s), 'b' => ST($s) } ) ;

This calls both GRT and ST just once, and uses their output as the code to eval and benchmark. That's not what we want. Instead, we need

    cmpthese(-3, {
        grt => sub { GRT($s) },
        st  => sub { ST($s) }
    });

It still seems that ST is much faster:

          Rate  grt   st
    grt 2.75/s   -- -76%
    st  11.6/s 320%   --

But, notice the regex is different when splitting the string:

    split m{(?<!\A)(?=>>>)}, $s   # GRT
    split (/^(?=>>> )/m, $s)      # ST

So, let's add a test to verify the results stay correct

    use Test::More tests => 1;
    is GRT($s), ST($s), 'same';

and we can use the simpler regex from ST in GRT. Fortunately, the test remains successful, and the results are now different:

    1..1
    ok 1 - same
          Rate   st  grt
    st  11.7/s   --  -3%
    grt 12.1/s   3%   --

An insignificant difference, but at least ST doesn't seem to be much faster now. Benchmarking and optimisation are hard.

`map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]`
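For anyone who wants to reproduce this without `try3.txt`, here is a minimal self-contained sketch. The in-memory `$sample` text and the repeat count of 1_000 are my assumptions, not the original test data, and both subs use the simpler `split` regex. It compares the two result lists element by element rather than with `is`: Test::More declares `is` with a `($$;$)` prototype, so `is GRT($s), ST($s)` calls both subs in scalar context, where `map` returns only the number of records.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Illustrative stand-in for try3.txt (assumption: same 9-line layout).
my $sample = <<'END';
>>> prd1701
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 887G 3.0T 13% /workspace/data
>>> prd1702
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 746G 3.1T 23% /workspace/data
>>> prd1703
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 687G 3.2T 18% /workspace/data
END
my $s = $sample x 1_000;    # 3_000 records; the repeat count is arbitrary

# GRT: prefix each record with a bit-inverted packed key, so that a plain
# string sort gives descending Use% order, then strip the 4-byte key.
sub GRT {
    my $s = shift;
    return
        map { unpack q{x4a*}, $_ }
        sort
        map { m{(\d+)(?=%)} && ( ~ pack( q{N}, $1 ) . pack( q{a*}, $_ ) ) }
        split /^(?=>>> )/m, $s;
}

# ST: pair each record with its Use% value, sort on it, unwrap.
sub ST {
    my $s = shift;
    return
        map  { $_->[0] }
        sort { $b->[1] <=> $a->[1] }
        map  { [ $_, /(\d+)%/ ] }
        split /^(?=>>> )/m, $s;
}

# Compare the actual lists (list context), not just their sizes.
my @grt  = GRT($s);
my @st   = ST($s);
my $same = @grt == @st && join( '', @grt ) eq join( '', @st );
print $same ? "same records, same order\n" : "results differ\n";

# Pass code references, so that cmpthese itself calls the subs repeatedly.
cmpthese( -1, {
    grt => sub { GRT($s) },
    st  => sub { ST($s) },
} );
```

The rates will differ between machines and perl versions, so treat any table this prints as a rough comparison only.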
Re^7: schwartzian transform problem by Cristoforo (Curate) on Mar 24, 2025 at 16:38 UTC
Thank you for the correction. I don't use Benchmark often, if at all. So the difference between the GRT and ST timings came down to the different regex used for `split`.
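To check that the two `split` patterns really produce the same records, and so differ only in cost, a small sketch along these lines may help. The two-record `$s` below is a made-up sample, not the original data, and the variable names are mine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-record sample in the same ">>> host" layout.
my $s = ">>> prd1701\n/workspace 3.9T 887G 3.0T 13% /workspace/data\n"
      . ">>> prd1702\n/workspace 3.9T 746G 3.1T 23% /workspace/data\n";

# Pattern from the GRT sub: any position followed by ">>>", except the
# very start of the string.
my @by_lookbehind = split m{(?<!\A)(?=>>>)}, $s;

# Pattern from the ST sub: start of a line that is followed by ">>> ".
my @by_line_start = split /^(?=>>> )/m, $s;

# Same number of fields, and no index at which the fields differ.
my $identical = @by_lookbehind == @by_line_start
    && !grep { $by_lookbehind[$_] ne $by_line_start[$_] } 0 .. $#by_lookbehind;

print "records: ", scalar(@by_lookbehind), "\n";
print $identical ? "both patterns split identically\n" : "patterns differ\n";
```

If both patterns split identically on your real data too, any timing gap between the two subs comes from how the regex is matched rather than from what it returns.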
Re^8: schwartzian transform problem by ikegami (Patriarch) on Mar 25, 2025 at 23:02 UTC
Re^9: schwartzian transform problem by Anonymous Monk on Mar 26, 2025 at 22:21 UTC