llil2cmd.pl - abbreviated version of llil2grt.pl

For cheap thrills, I created llil2cmd.pl, a short command line version of llil2grt.pl:

#!perl -n
# llil2cmd.pl. Abbreviated version of llil2grt.pl.
chomp;
($w,$c) = split/\t/;
$h{$w} += $c;
END {
    $\=$/;
    push @l, pack('NA*',-$v,"$k\t$v") while ($k,$v)=each %h;
    print substr($_,4) for sort @l;
}
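For anyone unfamiliar with the pack/sort/substr idiom (GRT) used in the END block, here is a minimal sketch of it on its own. The sub name grt_sort and the sample data are made up for illustration, and it assumes counts are positive integers below 2**32:

```perl
use strict;
use warnings;

# Sort "word => count" pairs by descending count, then ascending word,
# using the same pack/sort/substr idiom as llil2cmd.pl.
# (grt_sort is a hypothetical name, not part of the original scripts.)
sub grt_sort {
    my ($href) = @_;
    my @packed;
    for my $word (keys %$href) {
        my $count = $href->{$word};
        # 'N' packs -$count as a 4-byte big-endian unsigned integer, so a
        # larger count gives a byte-wise smaller prefix; 'A*' appends the
        # output line itself.
        push @packed, pack('NA*', -$count, "$word\t$count");
    }
    # A plain string sort now orders by count (descending), then by word
    # (ascending); substr strips the 4-byte sort prefix again.
    return map { substr($_, 4) } sort @packed;
}

my %demo = (apple => 3, pear => 7, fig => 3, kiwi => 1);   # made-up data
print "$_\n" for grt_sort(\%demo);
```

The point of the idiom is that the default string sort is used, with no comparison block called per pair of elements.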

Curiously, this abbreviated version runs at about the same speed on Windows, but significantly faster on my Ubuntu Linux VM:

> time perl llil2grt.pl big1.txt big2.txt big3.txt >grt1.tmp
llil2grt start
get_properties : 8 secs
sort + output : 22 secs
total : 30 secs

real 0m33.475s
user 0m32.180s
sys 0m1.295s

> time perl llil2cmd.pl big1.txt big2.txt big3.txt >cmd1.tmp

real 0m28.937s
user 0m27.843s
sys 0m1.093s

> diff cmd1.tmp grt1.tmp

To get more detailed timings, I hacked out a long short version:

#!perl -n
# llil2cmd-long.pl. A long short version of llil2grt.pl.
BEGIN { $tstart1 = time; }
chomp;
($w,$c) = split/\t/;
$h{$w} += $c;
END {
    my $tstart2 = time;
    $\=$/;
    push @l, pack('NA*',-$v,"$k\t$v") while ($k,$v)=each %h;
    print substr($_,4) for sort @l;
    my $tend2 = time;
    my $taken1 = $tstart2 - $tstart1;
    my $taken2 = $tend2 - $tstart2;
    my $taken = $tend2 - $tstart1;
    warn "get_properties : $taken1 secs\n";
    warn "sort + output : $taken2 secs\n";
    warn "total : $taken secs\n";
}

$ time perl llil2cmd-long.pl big1.txt big2.txt big3.txt >long1.tmp
get_properties : 7 secs
sort + output : 21 secs
total : 28 secs

real 0m28.629s
user 0m27.707s
sys 0m0.917s

> diff long1.tmp grt1.tmp

As you can see from the times reported by the Linux time command, it seems that large lexical variables in Perl are significantly slower to clean up at program exit than non-lexicals. For llil2grt.pl the real time of 33.475s is about three seconds more than the 30 secs total the program itself reports, while for llil2cmd-long.pl the real time of 28.629s is within a second of its reported 28 secs.
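A rough way to get a feel for the cost of freeing a large lexical is to time the moment it goes out of scope. This is only a sketch under my own assumptions (the sub name time_lexical_free, the key format and the hash size are all made up); it measures freeing at scope exit inside a running program, which is not the same thing as the exit-time difference seen with the time command above:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build a hash with $n keys in a lexical, then time how long it takes
# to free it when the lexical goes out of scope.
# (time_lexical_free is a hypothetical helper for illustration only.)
sub time_lexical_free {
    my ($n) = @_;
    my $t0;
    {
        my %h;
        $h{"key$_"} = $_ for 1 .. $n;
        $t0 = time;      # start the clock just before %h is released
    }                    # %h goes out of scope and is freed here
    return time - $t0;
}

printf "freeing a lexical hash of %d keys took %.3f secs\n",
    200_000, time_lexical_free(200_000);
```

Increasing the key count towards the size of the real data set should make the cost easier to see, at the price of more memory.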

New perl 5.36 experimental for_list feature

After stumbling upon "perl 5.36 and the for_list feature - a simple speed comparison", I had to give the perl 5.36 for_list feature a try (update: List::Util's pairmap might also be worth a try, given it was mentioned in a reply).

After building perl v5.36 from source (my Ubuntu system perl is v5.34 - update see improved build perl 5.38 notes):

wget https://www.cpan.org/src/5.0/perl-5.36.0.tar.gz
(update: run sha256sum perl-5.36.0.tar.gz and check matches https://www.cpan.org/src/5.0/perl-5.36.0.tar.gz.sha256.txt)
tar -xzf perl-5.36.0.tar.gz
cd perl-5.36.0
./Configure -des -Dprefix=$HOME/localperl
make 2>&1 | tee make.tmp
make test 2>&1 | tee test.tmp
make install 2>&1 | tee install.tmp
and adding:
use 5.036;
use experimental qw/for_list declared_refs/;
to the top of llil2grt.pl while changing one line from:
while (my ($k, $v) = each %{$href}) { push @lines, pack('NA*', -$v, "$k\t$v") }
to:
for my ($k, $v) (%{$href}) { push @lines, pack('NA*', -$v, "$k\t$v") }
it produced the same result, but did not run appreciably faster.
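To make the comparison concrete, here is a small self-contained sketch that builds the packed lines three ways: with each, with the for_list syntax, and with List::Util's pairmap. It needs perl 5.36 or later; the sub names and sample data are invented for illustration, and it only checks that the results agree, it says nothing about speed:

```perl
use v5.36;                      # for_list needs perl 5.36 or later
use experimental 'for_list';
use List::Util 'pairmap';

my %h = (apple => 3, pear => 7, fig => 3);   # made-up data

# Classic while/each loop, as in the original llil2grt.pl line.
sub lines_each {
    my ($href) = @_;
    my @lines;
    while (my ($k, $v) = each %$href) {
        push @lines, pack('NA*', -$v, "$k\t$v");
    }
    return @lines;
}

# perl 5.36 for_list: iterate over the flattened hash two items at a time.
sub lines_for_list {
    my ($href) = @_;
    my @lines;
    for my ($k, $v) (%$href) {
        push @lines, pack('NA*', -$v, "$k\t$v");
    }
    return @lines;
}

# List::Util::pairmap: key is in $a, value in $b inside the block.
sub lines_pairmap {
    my ($href) = @_;
    return pairmap { pack('NA*', -$b, "$a\t$b") } %$href;
}

for my $builder (\&lines_each, \&lines_for_list, \&lines_pairmap) {
    my @lines = $builder->(\%h);
    say join '; ', map { substr($_, 4) } sort @lines;
}
```

All three lines of output should be identical. Note that the for_list and pairmap versions flatten the whole hash into a list first, while each walks it in place.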

Update: as for why it isn't much faster, see ikegami's replies at: Re^2: Why does each() always re-evaluate its argument? (Updated x2 - experimental "for_list" )

Update: Improved Ubuntu Perl Build Notes

Manual install of CPAN Roman module

Later I manually installed Roman by CHORNY from CPAN into this local non-root Perl 5.36 as follows:

$ cd $HOME/localperlmodules
$ type perl
perl is hashed ($HOME/localperl/bin/perl)
$ wget https://www.cpan.org/modules/by-module/Roman/Roman-1.24.tar.gz
$ tar -xzf Roman-1.24.tar.gz
$ cd Roman-1.24
$ perl Makefile.PL 2>&1 | tee make.tmp
$ make 2>&1 | tee make.tmp
$ make test 2>&1 | tee test.tmp
$ make install 2>&1 | tee install.tmp

Update: Better to do it via: cpanm --from https://www.cpan.org/ --verify Roman 2>&1 | tee Roman.tmp
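Either way, a quick sanity check that the module landed in the perl you think it did might look like the sketch below. The helper name roman_or_undef is made up; it assumes only that Roman provides a Roman() function returning an uppercase numeral:

```perl
use strict;
use warnings;

# Return the Roman numeral for $n, or undef if the Roman module
# cannot be loaded by the perl running this script.
# (roman_or_undef is a hypothetical helper for illustration only.)
sub roman_or_undef {
    my ($n) = @_;
    return unless eval { require Roman; 1 };
    return Roman::Roman($n);
}

my $r = roman_or_undef(2023);
print defined $r
    ? "Roman is installed for $^X: 2023 = $r\n"
    : "Roman is not installed for $^X\n";
```

Printing $^X shows which perl binary ran the check, which helps when a system perl and a local perl are both on the machine.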

Updated: Added steps for building perl v5.36.0 from source and manual install of Roman module. Noted that large lexical variables are slower to clean up at program exit.


In reply to Re^7: Rosetta Code: Long List is Long (Updated Solutions - short Perl GRT and for_list) by eyepopslikeamosquito
in thread Rosetta Code: Long List is Long by eyepopslikeamosquito
