llil2cmd.pl - abbreviated version of llil2grt.pl

For cheap thrills, I created llil2cmd.pl, a short command line version of llil2grt.pl:

#!perl -n
# llil2cmd.pl. Abbreviated version of llil2grt.pl.
chomp; ($w,$c) = split/\t/; $h{$w} += $c;
END {
   $\=$/;
   push @l, pack('NA*',-$v,"$k\t$v") while ($k,$v)=each %h;
   print substr($_,4) for sort @l;
}
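The END block above leans on a GRT-style packed sort key. Here is a minimal sketch of just that step, spelled out under strict and warnings. The sample data and the name grt_lines are made up for illustration (they are not in llil2grt.pl), and it assumes every count fits in 32 bits:

```perl
use strict;
use warnings;

# Hypothetical sample data (not from the original post): word => count.
my %sample = (apple => 3, pear => 10, fig => 3);

# Return "word\tcount" lines sorted by descending count, then ascending word.
# grt_lines is an illustrative name, not a function from llil2grt.pl.
sub grt_lines {
    my ($href) = @_;
    # Prefix each line with a 4-byte big-endian key. pack 'N' stores -$count
    # modulo 2**32, so larger counts produce smaller keys and sort first.
    my @packed = map { pack('NA*', -$href->{$_}, "$_\t$href->{$_}") } keys %$href;
    # A plain string sort compares the key bytes first, then the line itself
    # (which starts with the word); substr strips the 4-byte key again.
    return map { substr($_, 4) } sort @packed;
}

print "$_\n" for grt_lines(\%sample);
# prints pear (10) first, then apple (3), then fig (3)
```

Because the comparison is a default string sort with no comparison block, all the ordering logic lives in how the key is packed.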

Curiously, this abbreviated version runs at about the same speed on Windows, but significantly faster on my Ubuntu Linux VM:

> time perl llil2grt.pl big1.txt big2.txt big3.txt >grt1.tmp
llil2grt start
get_properties : 8 secs
sort + output  : 22 secs
total          : 30 secs

real    0m33.475s
user    0m32.180s
sys     0m1.295s

> time perl llil2cmd.pl big1.txt big2.txt big3.txt >cmd1.tmp

real    0m28.937s
user    0m27.843s
sys     0m1.093s

> diff cmd1.tmp grt1.tmp

To get more detailed timings, I hacked out a long short version:

#!perl -n
# llil2cmd-long.pl. A long short version of llil2grt.pl.
BEGIN {
   $tstart1 = time;
}
chomp; ($w,$c) = split/\t/; $h{$w} += $c;
END {
   my $tstart2 = time;
   $\=$/;
   push @l, pack('NA*',-$v,"$k\t$v") while ($k,$v)=each %h;
   print substr($_,4) for sort @l;
   my $tend2  = time;
   my $taken1 = $tstart2 - $tstart1;
   my $taken2 = $tend2   - $tstart2;
   my $taken  = $tend2   - $tstart1;
   warn "get_properties : $taken1 secs\n";
   warn "sort + output  : $taken2 secs\n";
   warn "total          : $taken secs\n";
}

$ time perl llil2cmd-long.pl big1.txt big2.txt big3.txt >long1.tmp
get_properties : 7 secs
sort + output  : 21 secs
total          : 28 secs

real    0m28.629s
user    0m27.707s
sys     0m0.917s

> diff long1.tmp grt1.tmp

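As the times reported by the Linux time command show, large lexical variables in Perl seem to be significantly slower to clean up at program exit than non-lexicals. For llil2grt.pl, real time was 33.475s against an internally reported total of 30 secs, a gap of about three seconds; for llil2cmd-long.pl, which keeps its data in package variables, it was 28.629s against 28 secs. The internal totals are printed in whole seconds, so the gaps are approximate.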
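A minimal sketch for probing this on your own machine. The script name exit-cost.pl, the variable names and the default key count are all hypothetical, and it claims no particular result: it only reports how long the hash took to build, so that anything `real` shows beyond that figure is startup plus exit cleanup.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical probe (call it exit-cost.pl); not part of the original post.
#   time perl exit-cost.pl lexical
#   time perl exit-cost.pl global
my $t0 = time;

our %global_h;    # package variable
my  %lexical_h;   # file-scoped lexical

my $mode  = shift // 'lexical';
my $nkeys = shift // 1_000_000;   # assumed size; raise it to magnify any gap

# Fill exactly one of the two hashes, chosen by the first argument.
my $h = $mode eq 'global' ? \%global_h : \%lexical_h;
$h->{"key$_"} = $_ for 1 .. $nkeys;

# Report elapsed wall-clock time just before the program falls off the end.
warn sprintf("%s: %d keys built, %.3f secs elapsed\n",
    $mode, scalar(keys %$h), time - $t0);
```

Run it both ways under time and compare real minus the reported elapsed figure for the two modes.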

New perl 5.36 experimental for_list feature

After stumbling upon "perl 5.36 and the for_list feature - a simple speed comparison", I had to give the perl 5.36 for_list feature a try (update: List::Util's pairmap might also be worth a try, given that it was mentioned in a reply).

After building perl v5.36 from source (my Ubuntu system perl is v5.34; update: see the improved perl 5.38 build notes below):

wget https://www.cpan.org/src/5.0/perl-5.36.0.tar.gz
# update: run sha256sum perl-5.36.0.tar.gz and check that it matches https://www.cpan.org/src/5.0/perl-5.36.0.tar.gz.sha256.txt
tar -xzf perl-5.36.0.tar.gz
cd perl-5.36.0
./Configure -des -Dprefix=$HOME/localperl
make 2>&1 | tee make.tmp
make test 2>&1 | tee test.tmp
make install 2>&1 | tee install.tmp

and adding:

use 5.036;
use experimental qw/for_list declared_refs/;

to the top of llil2grt.pl while changing one line from:

while (my ($k, $v) = each %{$href}) { push @lines, pack('NA*', -$v, "$k\t$v") }

to:

for my ($k, $v) (%{$href}) { push @lines, pack('NA*', -$v, "$k\t$v") }

it produced the same result, but did not run appreciably faster.

Update: as for why it isn't much faster, see ikegami's replies at: Re^2: Why does each() always re-evaluate its argument? (Updated x2 - experimental "for_list" )
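For anyone wanting to try the pairmap idea mentioned above, here is a minimal sketch of the same line-building step using List::Util's pairmap, which sets $a to the key and $b to the value for each pair. The sample hash is made up, and I have not timed this variant against each or for_list:

```perl
use strict;
use warnings;
use List::Util qw(pairmap);   # pairmap needs List::Util 1.29 or later

# Hypothetical sample data (not from the original post): word => count.
my %h = (apple => 3, pear => 10, fig => 3);

# pairmap walks the flattened hash two elements at a time, with the key
# in $a and the value in $b, and returns one packed string per pair.
my @lines = pairmap { pack('NA*', -$b, "$a\t$b") } %h;

# Same sort and output step as before: sort the packed strings, then strip
# the 4-byte key.
print substr($_, 4), "\n" for sort @lines;
```

Unlike for_list, this runs on older perls too, at the cost of flattening the hash into a list first.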

Update: Improved Ubuntu Perl Build Notes

Re^7: Meaning of XS object version (Package Manager Security References - example building Perl securely from source) - example (secure) build of perl v5.38.0 from source on my Ubuntu Linux VM
Re^5: Size of Judy::HS array: where is MemUsed()? - perldelta, Perl Releases and Building Perl - notes on building Perl from source on my Linux VM

Manual install of CPAN Roman module

Later I manually installed Roman by CHORNY from CPAN into this local non-root Perl 5.36 as follows:

$ cd $HOME/localperlmodules
$ type perl
perl is hashed ($HOME/localperl/bin/perl)
$ wget https://www.cpan.org/modules/by-module/Roman/Roman-1.24.tar.gz
$ tar -xzf Roman-1.24.tar.gz
$ cd Roman-1.24
$ perl Makefile.PL 2>&1 | tee makefile.tmp
$ make 2>&1 | tee make.tmp
$ make test 2>&1 | tee test.tmp
$ make install 2>&1 | tee install.tmp

Update: Better to do it via: cpanm --from https://www.cpan.org/ --verify Roman 2>&1 | tee Roman.tmp

Updated: Added steps for building perl v5.36.0 from source and for the manual install of the Roman module. Noted that large lexical variables are slower to clean up at program exit.

In reply to Re^7: Rosetta Code: Long List is Long (Updated Solutions - short Perl GRT and for_list) by eyepopslikeamosquito
in thread Rosetta Code: Long List is Long by eyepopslikeamosquito
