Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re^7: Rosetta Code: Long List is Long (Updated Solutions - short Perl GRT and for_list)

by eyepopslikeamosquito (Archbishop)
on Jan 02, 2023 at 09:58 UTC ( [id://11149284]=note: print w/replies, xml ) Need Help??


in reply to Re^6: Rosetta Code: Long List is Long (Updated Solutions - GRT)
in thread Rosetta Code: Long List is Long

llil2cmd.pl - abbreviated version of llil2grt.pl

For cheap thrills, I created llil2cmd.pl, a short command line version of llil2grt.pl:

#!perl -n # llil2cmd.pl. Abbreviated version of llil2grt.pl. chomp; ($w,$c) = split/\t/; $h{$w} += $c; END { $\=$/; push @l, pack('NA*',-$v,"$k\t$v") while ($k,$v)=each %h; print substr($_,4) for sort @l; }

Curiously, this abbreviated version runs at about the same speed on Windows, but significantly faster on my Ubuntu Linux VM:

> time perl llil2grt.pl big1.txt big2.txt big3.txt >grt1.tmp llil2grt start get_properties : 8 secs sort + output : 22 secs total : 30 secs real 0m33.475s user 0m32.180s sys 0m1.295s

> time perl llil2cmd.pl big1.txt big2.txt big3.txt >cmd1.tmp real 0m28.937s user 0m27.843s sys 0m1.093s > diff cmd1.tmp grt1.tmp

To get more detailed timings, I hacked out a long short version:

#!perl -n # llil2cmd-long.pl. A long short version of llil2grt.pl. BEGIN { $tstart1 = time; } chomp; ($w,$c) = split/\t/; $h{$w} += $c; END { my $tstart2 = time; $\=$/; push @l, pack('NA*',-$v,"$k\t$v") while ($k,$v)=each %h; print substr($_,4) for sort @l; my $tend2 = time; my $taken1 = $tstart2 - $tstart1; my $taken2 = $tend2 - $tstart2; my $taken = $tend2 - $tstart1; warn "get_properties : $taken1 secs\n"; warn "sort + output : $taken2 secs\n"; warn "total : $taken secs\n"; }

$ time perl llil2cmd-long.pl big1.txt big2.txt big3.txt >long1.tmp get_properties : 7 secs sort + output : 21 secs total : 28 secs real 0m28.629s user 0m27.707s sys 0m0.917s > diff long1.tmp grt1.tmp

As you can see from the times reported by the Linux time command, it seems that large lexical variables in Perl are significantly slower to cleanup at program exit than non-lexicals (about three seconds slower in this example: 33.475s vs 30 secs for llil2grt.pl, 28.629s vs 28 secs for llil2cmd-long.pl).

New perl 5.36 experimental for_list feature

After stumbling upon perl 5.36 and the for_list feature - a simple speed comparison I had to give the perl 5.36 for_list feature a try (update: List::Util's pairmap might be worth a try given it was mentioned in a reply).

After building perl v5.36 from source (my Ubuntu system perl is v5.34 - update see improved build perl 5.38 notes):

wget https://www.cpan.org/src/5.0/perl-5.36.0.tar.gz (update: run sha256sum perl-5.36.0.tar.gz and check matches https://ww +w.cpan.org/src/5.0/perl-5.36.0.tar.gz.sha256.txt) tar -xzf perl-5.36.0.tar.gz cd perl-5.36.0 ./Configure -des -Dprefix=$HOME/localperl make 2>&1 | tee make.tmp make test 2>&1 | tee test.tmp make install 2>&1 | tee install.tmp
and adding:
use 5.036; use experimental qw/for_list declared_refs/;
to the top of llil2grt.pl while changing one line from:
while (my ($k, $v) = each %{$href}) { push @lines, pack('NA*', -$v, "$ +k\t$v") }
to:
for my ($k, $v) (%{$href}) { push @lines, pack('NA*', -$v, "$k\t$v") }
it produced the same result, but did not run appreciably faster.

Update: as for why it isn't much faster, see ikegami's replies at: Re^2: Why does each() always re-evaluate its argument? (Updated x2 - experimental "for_list" )

Update: Improved Ubuntu Perl Build Notes

Manual install of CPAN Roman module

Later I manually installed Roman by CHORNY from CPAN into this local non-root Perl 5.36 as follows:

$ cd $HOME/localperlmodules $ type perl perl is hashed ($HOME/localperl/bin/perl) $ wget https://www.cpan.org/modules/by-module/Roman/Roman-1.24.tar.gz $ tar -xzf Roman-1.24.tar.gz $ cd Roman-1.24 $ perl Makefile.PL 2>&1 | tee make.tmp $ make 2>&1 | tee make.tmp $ make test 2>&1 | tee test.tmp $ make install 2>&1 | tee install.tmp

Update: Better to do it via: cpanm --from https://www.cpan.org/ --verify Roman 2>&1 | tee Roman.tmp

Updated: Added steps for building perl v5.36.0 from source and manual install of Roman module. Noted that large lexical variables are slower to cleanup at program exit.

  • Comment on Re^7: Rosetta Code: Long List is Long (Updated Solutions - short Perl GRT and for_list)
  • Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11149284]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (4)
As of 2024-04-18 00:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found