Re^7: Rosetta Code: Long List is Long (Updated Solutions - short Perl GRT and for_list)

llil2cmd.pl - abbreviated version of llil2grt.pl

For cheap thrills, I created llil2cmd.pl, a short command line version of llil2grt.pl:

#!perl -n
# llil2cmd.pl. Abbreviated version of llil2grt.pl.
chomp; ($w,$c) = split/\t/; $h{$w} += $c;
END {
   $\=$/;
   push @l, pack('NA*',-$v,"$k\t$v") while ($k,$v)=each %h;
   print substr($_,4) for sort @l;
}
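
The one-liner relies on a packed sort key (the GRT of the title). As a rough illustration of why a plain string sort then yields descending counts with ties broken by ascending key, here is a minimal sketch. The helper name grt_sort_counts and the sample data are made up for illustration, and it assumes counts fit in 32 bits:

```perl
use strict;
use warnings;

# Hypothetical helper (not from llil2grt.pl): return "word\tcount" lines
# ordered by descending count, then ascending word.
sub grt_sort_counts {
    my ($counts) = @_;
    my @packed;
    while (my ($word, $count) = each %$counts) {
        # 'N' stores -$count as a 4-byte big-endian unsigned value, so a
        # larger count becomes a smaller byte string and sorts earlier.
        # 'A*' appends the payload, which also breaks ties by word.
        push @packed, pack('NA*', -$count, "$word\t$count");
    }
    # Plain string sort on the packed records, then strip the 4-byte prefix.
    return map { substr($_, 4) } sort @packed;
}

# Made-up sample data, only to show the ordering.
my %count_of = (apple => 3, pear => 10, fig => 3, kiwi => 1);
print "$_\n" for grt_sort_counts(\%count_of);
```

With this sample data, pear (10) comes first, then apple and fig (both 3, in key order), then kiwi (1).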

Curiously, this abbreviated version runs at about the same speed on Windows, but significantly faster on my Ubuntu Linux VM:

> time perl llil2grt.pl big1.txt big2.txt big3.txt >grt1.tmp
llil2grt start
get_properties : 8 secs
sort + output  : 22 secs
total          : 30 secs

real    0m33.475s
user    0m32.180s
sys     0m1.295s

> time perl llil2cmd.pl big1.txt big2.txt big3.txt >cmd1.tmp

real    0m28.937s
user    0m27.843s
sys     0m1.093s

> diff cmd1.tmp grt1.tmp

To get more detailed timings, I hacked out a long short version:

#!perl -n
# llil2cmd-long.pl. A long short version of llil2grt.pl.
BEGIN {
   $tstart1 = time;
}
chomp; ($w,$c) = split/\t/; $h{$w} += $c;
END {
   my $tstart2 = time;
   $\=$/;
   push @l, pack('NA*',-$v,"$k\t$v") while ($k,$v)=each %h;
   print substr($_,4) for sort @l;
   my $tend2  = time;
   my $taken1 = $tstart2 - $tstart1;
   my $taken2 = $tend2   - $tstart2;
   my $taken  = $tend2   - $tstart1;
   warn "get_properties : $taken1 secs\n";
   warn "sort + output  : $taken2 secs\n";
   warn "total          : $taken secs\n";
}

$ time perl llil2cmd-long.pl big1.txt big2.txt big3.txt >long1.tmp
get_properties : 7 secs
sort + output  : 21 secs
total          : 28 secs

real    0m28.629s
user    0m27.707s
sys     0m0.917s

> diff long1.tmp grt1.tmp

As you can see from the times reported by the Linux time command, it seems that large lexical variables in Perl are significantly slower to clean up at program exit than non-lexicals. In this example the gap between the real time and the total reported by the script itself is about three seconds for llil2grt.pl (33.475s vs 30 secs), but well under a second for llil2cmd-long.pl (28.629s vs 28 secs).
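
Since the built-in time only has one-second resolution, the script totals above are whole seconds while the time command reports milliseconds. A minimal sketch of sub-second phase timing, assuming the core Time::HiRes module; the helper name timed_phase and the toy workload are hypothetical, not taken from llil2grt.pl:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # overrides built-in time with a floating-point version

# Hypothetical helper: run a code ref, report elapsed wall-clock time on
# STDERR, and return the elapsed seconds.
sub timed_phase {
    my ($label, $code) = @_;
    my $t0 = time;
    $code->();
    my $elapsed = time - $t0;
    warn sprintf("%-15s: %.3f secs\n", $label, $elapsed);
    return $elapsed;
}

# Toy workload, only to show the reporting format.
my %h;
timed_phase('get_properties', sub { $h{"k$_"} += $_ for 1 .. 100_000 });
timed_phase('sort + output',  sub { my @sorted = sort keys %h });
```

With timings like these, the difference between the script's own total and the real figure from time should be a closer estimate of startup plus exit cleanup cost.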

New perl 5.36 experimental for_list feature

After stumbling upon "perl 5.36 and the for_list feature - a simple speed comparison", I had to give the perl 5.36 for_list feature a try (update: List::Util's pairmap, mentioned in a reply, might also be worth a try).

After building perl v5.36 from source (my Ubuntu system perl is v5.34; update: see the improved Perl 5.38 build notes below):

wget https://www.cpan.org/src/5.0/perl-5.36.0.tar.gz
# (update: run sha256sum perl-5.36.0.tar.gz and check it matches https://www.cpan.org/src/5.0/perl-5.36.0.tar.gz.sha256.txt)
tar -xzf perl-5.36.0.tar.gz
cd perl-5.36.0
./Configure -des -Dprefix=$HOME/localperl
make 2>&1 | tee make.tmp
make test 2>&1 | tee test.tmp
make install 2>&1 | tee install.tmp

and adding:

use 5.036;
use experimental qw/for_list declared_refs/;

to the top of llil2grt.pl while changing one line from:

while (my ($k, $v) = each %{$href}) { push @lines, pack('NA*', -$v, "$k\t$v") }

to:

for my ($k, $v) (%{$href}) { push @lines, pack('NA*', -$v, "$k\t$v") }

it produced the same result, but did not run appreciably faster.

Update: as for why it isn't much faster, see ikegami's replies at: Re^2: Why does each() always re-evaluate its argument? (Updated x2 - experimental "for_list" )
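
For anyone wanting to try the pairmap alternative mentioned above, a minimal sketch follows. It assumes List::Util 1.29 or later (which provides pairmap); the helper name pack_pairs and the sample data are hypothetical, and no timing claim is made for it:

```perl
use strict;
use warnings;
use List::Util qw(pairmap);

# Hypothetical helper: build the packed GRT records from a hash ref.
# pairmap walks the flattened hash two items at a time, with $a bound to
# the key and $b bound to the value.
sub pack_pairs {
    my ($href) = @_;
    return pairmap { pack('NA*', -$b, "$a\t$b") } %{$href};
}

# Made-up sample data.
my %h = (apple => 3, pear => 10, fig => 3);
my @lines = pack_pairs(\%h);
print substr($_, 4), "\n" for sort @lines;
```

This replaces only the loop that fills the list; the sort and substr steps stay as they are in llil2grt.pl.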

Update: Improved Ubuntu Perl Build Notes

Re^7: Meaning of XS object version (CPAN and Package Manager Security References) - example (secure) build of perl v5.38.0 from source on my Ubuntu Linux VM
Re^5: Size of Judy::HS array: where is MemUsed()? - perldelta, Perl Releases and Building Perl - notes on building Perl from source on my Linux VM
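
The sha256sum check from the build steps could also be scripted in Perl. A minimal sketch, assuming the core Digest::SHA module; the helper name sha256_matches is hypothetical, and the expected digest still has to be copied by hand from the published .sha256.txt file:

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: true if the file's SHA-256 hex digest equals the
# expected one (compared case-insensitively).
sub sha256_matches {
    my ($path, $expected_hex) = @_;
    # 'b' makes addfile read the file in binary mode.
    my $actual = Digest::SHA->new(256)->addfile($path, 'b')->hexdigest;
    return lc($actual) eq lc($expected_hex);
}

# Usage, with $expected_from_cpan standing in for the published digest:
# die "checksum mismatch\n"
#     unless sha256_matches('perl-5.36.0.tar.gz', $expected_from_cpan);
```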

Manual install of CPAN Roman module

Later I manually installed Roman by CHORNY from CPAN into this local non-root Perl 5.36 as follows:

$ cd $HOME/localperlmodules
$ type perl
perl is hashed ($HOME/localperl/bin/perl)
$ wget https://www.cpan.org/modules/by-module/Roman/Roman-1.24.tar.gz
$ tar -xzf Roman-1.24.tar.gz
$ cd Roman-1.24
$ perl Makefile.PL 2>&1 | tee makefile.tmp
$ make 2>&1 | tee make.tmp
$ make test 2>&1 | tee test.tmp
$ make install 2>&1 | tee install.tmp

Update: Better to do it via: cpanm --from https://www.cpan.org/ --verify Roman 2>&1 | tee Roman.tmp

Updated: Added steps for building perl v5.36.0 from source and manual install of Roman module. Noted that large lexical variables are slower to clean up at program exit.
