I have a fairly large Perl program that translates some hefty (300 MB+) text files into a form that's ready to load into an MS SQL database via bcp.

Anyway, as I came across more and more bugs in the input file, my code grew and grew to handle the various bits of duff data.

Consequently, I found that:

So... I tidied up the code, cut the number of passes over the input file down to one, and then set about some benchmarking to find out whether 'last LABEL' was a liability and, if so, what the best alternative was.

The 'bad' news was that reading the input file only once made comparatively little difference. I say 'bad' in inverted commas because, presumably, all this shows is that perl is pretty efficient at reading a file, so that, provided the file is large enough, the following two snippets are roughly equivalent:

    open(IN, "bigfile.txt");
    while (<IN>) { &big_sub1($_); }
    open(IN, "bigfile.txt");
    while (<IN>) { &big_sub2($_); }

Versus

    open(IN, "bigfile.txt");
    while (<IN>) {
        &big_sub1($_);
        &big_sub2($_);
    }
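Taking the single-pass version one step further, the per-line calls can be driven from a list of handlers, so adding another check doesn't mean another read of the file. This is a minimal sketch under stated assumptions: count_lines and count_blanks are hypothetical stand-ins for big_sub1 and big_sub2, and the input is an in-memory string rather than bigfile.txt so the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical per-line handlers standing in for big_sub1 and big_sub2.
my %count = (lines => 0, blank => 0);
sub count_lines  { $count{lines}++ }
sub count_blanks { $count{blank}++ if $_[0] =~ /^\s*$/ }

# Read the input once, passing each line to every handler in turn.
sub process_once {
    my ($fh, @handlers) = @_;
    while (my $line = <$fh>) {
        $_->($line) for @handlers;
    }
}

# An in-memory file keeps the sketch self-contained; in real use this
# would be something like open(my $in, '<', 'bigfile.txt') or die ...
my $data = "first line\n\nsecond line\n";
open(my $in, '<', \$data) or die "Can't open in-memory file: $!";
process_once($in, \&count_lines, \&count_blanks);
close($in);

print "lines=$count{lines} blank=$count{blank}\n";
```

Each extra check becomes one more entry in the handler list rather than one more pass over a 300 MB file.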

Moving on from that to the 'last LABEL' issue, well, yep, 'last LABEL's are bad news.

Here's the code and output I used for my test:

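    use Benchmark;

    # sub1: a labelled bare block, left early with 'last TTEST'
    sub sub1() {
        my $val = 3;
        TTEST: {
            if ($val == 1) { last TTEST; }
            if ($val == 2) { last TTEST; }
            if ($val == 3) { last TTEST; }
            if ($val == 4) { last TTEST; }
        }
    }

    # sub2: four independent ifs, all of which are always tested
    sub sub2() {
        my $val = 3;
        if ($val == 1) {}
        if ($val == 2) {}
        if ($val == 3) {}
        if ($val == 4) {}
    }

    # sub3: an if/elsif chain, which stops testing at the first match
    sub sub3() {
        my $val = 3;
        if    ($val == 1) {}
        elsif ($val == 2) {}
        elsif ($val == 3) {}
        elsif ($val == 4) {}
    }

    my $codehash = { 'sub1' => \&sub1, 'sub2' => \&sub2, 'sub3' => \&sub3 };
    timethese(5000000, $codehash);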

And here's the (shortened) benchmark output:

    sub1: 13 wallclock secs (12.80 CPU)
    sub2:  8 wallclock secs ( 8.24 CPU)
    sub3:  6 wallclock secs ( 6.66 CPU)

Two things surprised me about this:

  1. How very bad 'last LABEL' is.
  2. Hell, it's even worse than several 'ifs'.

Now, just to check what was happening, I set $val to 1, and even then the 'last LABEL' version was still slower than the multiple 'ifs', despite the fact that 'last LABEL' skips the remaining conditions. Heh, any chance of adding a "your code is crap" message when running under -w if you use labels?

Anyway, that's me done. Not so much a meditation, more of an aimless ramble...

Tom Melly, tom@tomandlu.co.uk

Re: A Luser's Benchmarking Tale
by liz (Monsignor) on Nov 25, 2003 at 13:43 UTC
    I believe labels have such poor performance because they're resolved at runtime!
        my $label;
        LABEL: warn "label = $label\n";
        $label = $label ? 'LAST' : 'LABEL';
        goto $label;
        LAST: warn "done, label = $label\n";
        __END__
        label =
        label = LABEL
        done, label = LAST
    Of course, that means you can easily go to a different label determined by run-time conditions (as shown above). To quote Johan Cruyff "Every advantage has its disadvantage". ;-)
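Perl also lets last choose its target at run time: perlfunc documents a last EXPR form (a computed label) from Perl 5.18 onwards, which is consistent with liz's point that a label is looked up when the jump happens. A sketch, assuming Perl 5.18 or later; find_pair and its arguments are hypothetical.

```perl
use 5.018;    # `last EXPR` (a computed label) needs Perl 5.18 or later
use strict;
use warnings;

# Hypothetical example: search a 5x5 multiplication table for $target.
# $exit_to names the loop to leave on a hit; the label is looked up
# only when the `last` actually runs.
sub find_pair {
    my ($target, $exit_to) = @_;
    my @found;
    OUTER: for my $i (1 .. 5) {
        INNER: for my $j (1 .. 5) {
            if ($i * $j == $target) {
                @found = ($i, $j);
                last $exit_to;    # label name chosen at run time
            }
        }
    }
    return @found;
}

my @first_hit = find_pair(6, 'OUTER');    # stop at the first match
my @last_hit  = find_pair(6, 'INNER');    # keep scanning later rows
print "OUTER: @first_hit\n";
print "INNER: @last_hit\n";
```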

    Liz

      Ah. So a bare next or last would be able to use the enclosing block's optree targets, and wouldn't need the same overhead?
Re: A Luser's Benchmarking Tale
by jasonk (Parson) on Nov 25, 2003 at 13:39 UTC

    Perl probably isn't as efficient at reading a file multiple times as you think. The more likely reason you saw little difference between reading it once and reading it twice is that you were running on a decent operating system that wasn't short of memory, so the OS kept the file's contents cached after the first read and the second pass was served from memory rather than from disk.

    Whenever you open a file more than once you should keep this in mind, because a test that shows no speed improvement today may give a different answer later, especially if you run the script on a shared server with less spare memory, which won't keep things in the disk buffer for as long.

    How you benchmark it can also have a big impact on the results: the bigger the 'big_sub's are, the smaller the share of the total time that reading the file accounts for.


    We're not surrounded, we're in a target-rich environment!

      Good point regarding the caching (and one I hadn't considered). I'd taken in the point about 'big_sub' (hence the name ;)

      Still, when all's said and done, and even if caching wasn't an issue, I don't think I could bear to read a file more times than is strictly necessary - call it an aesthetic prejudice ;)

      Tom Melly, tom@tomandlu.co.uk
Re: A Luser's Benchmarking Tale
by Abigail-II (Bishop) on Nov 25, 2003 at 13:45 UTC
    I disagree that it's worse than several ifs. Your benchmark shows that one 'last' takes more time than one integer compare. If you make $val equal to 1 and turn the =='s into =~'s, sub1 is faster than sub2 (at least on my system). The cost of a 'last' doesn't depend on the number of ifs, nor on what the if conditions do, but both of those matter for the multiple ifs.

    That the if/elsif chain is faster won't surprise anyone, as no conditions after a match are tested, nor does a label need to be searched for (as it is with last).

    Abigail

      Hmm, well I take your point (and I had tested with $val=1 - see original post), but nevertheless it strikes me as damning with faint praise ;)

      What you're basically saying is that labels can be faster than multiple ifs provided you have enough conditions and you make them slow enough (e.g. regex)

      Given that multiple ifs are a pretty stoopid thing to do (I only used them in my benchmark for comparison purposes), this is hardly a glowing recommendation for labels.

      BTW, I rewrote my test script to take $val from $ARGV[0], and ran it with val=1 and val=4 for both versions (== and =~). Here are the results, which confirm your observation: [graph, 9 KB GIF]
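Tom's rewritten script isn't shown in the thread. What follows is a minimal sketch of what a test along those lines might look like, covering only the =~ case; the sub names and regexes are hypothetical, and it uses Benchmark's cmpthese (which prints a table of relative rates) instead of the timethese call from the original post.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# $val comes from the command line, as in the rewritten test (default 1).
my $val = @ARGV ? $ARGV[0] : 1;

# Labelled bare block: stops testing after the first match.
sub with_label {
    CASE: {
        if ($val =~ /^1$/) { last CASE; }
        if ($val =~ /^2$/) { last CASE; }
        if ($val =~ /^3$/) { last CASE; }
        if ($val =~ /^4$/) { last CASE; }
    }
}

# Four independent ifs: every regex runs on every call.
sub multiple_ifs {
    if ($val =~ /^1$/) {}
    if ($val =~ /^2$/) {}
    if ($val =~ /^3$/) {}
    if ($val =~ /^4$/) {}
}

# if/elsif chain: also stops at the first match, with no label lookup.
sub if_elsif {
    if    ($val =~ /^1$/) {}
    elsif ($val =~ /^2$/) {}
    elsif ($val =~ /^3$/) {}
    elsif ($val =~ /^4$/) {}
}

# A negative count means "run each for at least that many CPU seconds".
# cmpthese prints the comparison chart and also returns it as rows.
my $rows = cmpthese(-1, {
    label        => \&with_label,
    multiple_ifs => \&multiple_ifs,
    if_elsif     => \&if_elsif,
});
```

Running it once with 1 and once with 4 as the argument compares a first-condition match with a last-condition match. Absolute timings will vary with perl version and machine.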

      Tom Melly, tom@tomandlu.co.uk
Re: A Luser's Benchmarking Tale
by BrowserUk (Patriarch) on Nov 25, 2003 at 15:29 UTC

    I realise that this is only test code, but there is no reason to use a label in your example.

        P:\test>perl -le" { print 1; last if 1; print 2; } print 3;"
        1
        3

    last on its own will exit the closest enclosing loop, even if that loop is a bare block (except when it doesn't, as with sort and map blocks etc.). A non-labelled last is considerably more efficient than a labelled one, and with care there are very, very few situations where a labelled jump is necessary.
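A minimal sketch of the bare-block idiom BrowserUk describes, with a hypothetical classify sub standing in for the test code: each if sets a result and leaves the block with an unlabelled last.

```perl
use strict;
use warnings;

# Hypothetical classify(): a bare block is a loop that runs exactly once,
# so an unlabelled `last` inside it (even from within an if) leaves the
# block and skips the remaining tests.
sub classify {
    my ($val) = @_;
    my $kind = 'other';
    {
        if ($val == 1) { $kind = 'one';   last; }
        if ($val == 2) { $kind = 'two';   last; }
        if ($val == 3) { $kind = 'three'; last; }
    }
    return $kind;
}

print classify($_), "\n" for 1, 3, 7;
```

Keeping the bare block inside the sub also gives the last a loop of its own to leave. Without one, perl warns "Exiting subroutine via last" and the last exits whatever loop called the sub, such as Benchmark's timing loop, which is presumably why a bare last broke the benchmark.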


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

Re: A Luser's Benchmarking Tale
by perrin (Chancellor) on Nov 25, 2003 at 15:45 UTC
    Hmmm, 5000000 executions / 13 seconds = damn good performance if you ask me. I don't see any reason to use a label in this code though. Is it faster with a naked "last" than with a label?
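Putting perrin's arithmetic into per-call terms: a small sketch that divides the CPU times from the original post's output by the 5,000,000 iterations (perrin used the 13-second wallclock figure; CPU seconds are used here). The sub1/sub2 difference is not a pure measure of last LABEL, since with $val = 3 sub1 also skips the fourth comparison.

```perl
use strict;
use warnings;

# CPU seconds for 5,000,000 calls, copied from the original post's output.
my $iterations = 5_000_000;
my %cpu = (sub1 => 12.80, sub2 => 8.24, sub3 => 6.66);

# Convert total CPU seconds into microseconds per call.
sub per_call_us {
    my ($cpu_secs, $n) = @_;
    return $cpu_secs / $n * 1e6;
}

for my $name (sort keys %cpu) {
    printf "%s: %.2f us/call\n", $name, per_call_us($cpu{$name}, $iterations);
}

# Per-call difference between the labelled block and the four plain ifs.
printf "sub1 - sub2: %.2f us/call\n",
    per_call_us($cpu{sub1} - $cpu{sub2}, $iterations);
```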

      "Is it faster with a naked last.."

      I hope so (checks... oops, a bare last breaks benchmarking AFAICT)

      As for not needing a label - in this test, no. My original production code had grown like mutant bamboo and the labels had been a quick fix - they're gone now (and that plus other fixes means my code is running about 300% faster!)

      Tom Melly, tom@tomandlu.co.uk
Re: A Luser's Benchmarking Tale
by hardburn (Abbot) on Nov 25, 2003 at 15:11 UTC

    IIRC, a label jump is done by a linear scan through the source code at run time. So performance will decrease as your code base gets larger.

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

      Perl doesn't keep an index of label locations at compile time? I'd have thought that with one, a label jump would be quite rapid. Not that I like labels at all.


      ___________
      Eric Hodges
Re: A Luser's Benchmarking Tale
by BUU (Prior) on Nov 25, 2003 at 22:12 UTC
    His code:
        Benchmark: timing 5000000 iterations of sub1, sub2, sub3...
              sub1:  8 wallclock secs @ 664805.21/s (n=5000000)
              sub2:  6 wallclock secs @ 988924.05/s (n=5000000)
              sub3:  4 wallclock secs @ 1139731.02/s (n=5000000)
    Changed:
        use Benchmark;

        sub sub1() {
            my $val = 3;
            TTEST: {
                if ($val == 1) { last TTEST; }
                if ($val == 2) { last TTEST; }
                if ($val == 3) { last TTEST; }
                if ($val == 4) { last TTEST; }
            }
        }

        sub sub2() {
            my $val = 3;
            TTEST: {
                if ($val == 1) { last; }
                if ($val == 2) { last; }
                if ($val == 3) { last; }
                if ($val == 4) { last; }
            }
        }

        sub sub3() {
            my $val = 3;
            if ($val == 1) {}
            if ($val == 2) {}
            if ($val == 3) {}
            if ($val == 4) {}
        }

        my $codehash = { 'sub1' => \&sub1, 'sub2' => \&sub2, 'sub3' => \&sub3 };
        timethese(5000000, $codehash);
    Output:
        Benchmark: timing 5000000 iterations of sub1, sub2, sub3...
              sub1:  8 wallclock secs ( 7.49 usr +  0.00 sys =  7.49 CPU) @ 667467.63/s (n=5000000)
              sub2:  7 wallclock secs ( 7.27 usr + -0.01 sys =  7.26 CPU) @ 688800.11/s (n=5000000)
              sub3:  5 wallclock secs ( 5.12 usr +  0.01 sys =  5.13 CPU) @ 975419.43/s (n=5000000)