
I have a fairly large perl-prog whose purpose is to translate some hefty (300mb+) textfiles into a ready-to-load-via-bcp ms-sql db.

Anyway, due to coming across more and more bugs in the input file, my code grew and grew to handle the various bits of duff data.

Consequently, I found that:

I was reading the whole input file 3 times instead of just once
I had lots of 'last LABEL' constructs

So... I tidied up the code, reduced the input reads to 1, and then set about doing some benchmarking to find out whether 'last LABEL' was a liability, and if so what the best alternative was.

The 'bad' news was that reducing the reads on the input file made comparatively little difference - I say 'bad' in inverted commas because, presumably, all this indicates is that perl is pretty efficient at reading a file, so that, provided the file is large enough, the following snippets of code are roughly equivalent:

open(IN, "bigfile.txt") or die "Can't open bigfile.txt: $!";
while(<IN>){
  &big_sub1($_);
}
close(IN);
open(IN, "bigfile.txt") or die "Can't open bigfile.txt: $!";
while(<IN>){
  &big_sub2($_);
}
close(IN);

Versus

open(IN, "bigfile.txt") or die "Can't open bigfile.txt: $!";
while(<IN>){
  &big_sub1($_);
  &big_sub2($_);
}
close(IN);
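
For anyone wanting to try the one-pass versus two-pass comparison themselves, here is a rough sketch of how it could be timed. It is not the code from my real program: the temp file, the bodies of big_sub1 and big_sub2, and the iteration count are all made-up stand-ins, so treat any numbers it produces as indicative only.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Temp qw(tempfile);

# Throwaway input file (assumption: stands in for bigfile.txt).
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "line $_ of some test data\n" for 1 .. 20_000;
close $fh or die "Can't close $file: $!";

# Hypothetical stand-ins for big_sub1 and big_sub2.
my ($count1, $count2) = (0, 0);
sub big_sub1 { $count1 += length $_[0] }      # total characters seen
sub big_sub2 { $count2++ if $_[0] =~ /7/ }    # lines containing a 7

# Read the file twice, one worker sub per pass.
sub two_passes {
    for my $worker (\&big_sub1, \&big_sub2) {
        open my $in, '<', $file or die "Can't open $file: $!";
        while (my $line = <$in>) { $worker->($line) }
        close $in;
    }
}

# Read the file once, calling both worker subs per line.
sub one_pass {
    open my $in, '<', $file or die "Can't open $file: $!";
    while (my $line = <$in>) {
        big_sub1($line);
        big_sub2($line);
    }
    close $in;
}

timethese(20, { two_passes => \&two_passes, one_pass => \&one_pass });
```

The heavier the per-line work in the two subs, the less the extra read should matter, which would fit with what I saw.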

Moving on from that to the 'last LABEL' issue, well, yep, 'last LABEL's are bad news.

Here's the code and output I used for my test:

use Benchmark;

sub sub1(){
  my $val = 3;
  TTEST: {
    if($val == 1){last TTEST;}
    if($val == 2){last TTEST;}
    if($val == 3){last TTEST;}
    if($val == 4){last TTEST;}
  }
}

sub sub2(){
  my $val = 3;
  if($val == 1){}
  if($val == 2){}
  if($val == 3){}
  if($val == 4){}
}

sub sub3(){
  my $val = 3;
  if($val == 1){}
  elsif($val == 2){}
  elsif($val == 3){}
  elsif($val == 4){}
}

my $codehash = {'sub1' => \&sub1,'sub2' => \&sub2,'sub3' => \&sub3};
timethese(5000000, $codehash);

And here's the (shortened) benchmark output:

sub1: 13 wallclock secs (12.80 CPU)
sub2:  8 wallclock secs (8.24 CPU)
sub3:  6 wallclock secs (6.66 CPU)

Two things surprised me about this:

How very bad 'last LABEL' is.
Hell, it's even worse than several 'ifs'

Now, just to check what was happening, I set $val to 1, and even then the 'last LABEL' version was still slower than the multiple 'ifs', despite the fact that 'last LABEL' skips the other conditions. Heh, any chance of adding a "your code is crap" message when running under -w if you use labels?
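
A variation that makes the $val check easier to repeat might look like the sketch below. It is a reworking rather than the exact script I ran: the sub names are new, $val is moved out to a package variable (an assumption made so it can be changed in one place), and cmpthese is used instead of timethese so the results come out as a comparison table.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Value under test; set to 3 to mirror the original run.
our $val = 1;

# Bare labelled block, leaving via 'last LABEL' on the first match.
sub with_last {
    TTEST: {
        if ($val == 1) { last TTEST }
        if ($val == 2) { last TTEST }
        if ($val == 3) { last TTEST }
        if ($val == 4) { last TTEST }
    }
    return;
}

# Independent ifs: every condition is always evaluated.
sub plain_ifs {
    if ($val == 1) {}
    if ($val == 2) {}
    if ($val == 3) {}
    if ($val == 4) {}
    return;
}

# if/elsif chain: evaluation stops at the first match.
sub elsif_chain {
    if    ($val == 1) {}
    elsif ($val == 2) {}
    elsif ($val == 3) {}
    elsif ($val == 4) {}
    return;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    with_last   => \&with_last,
    plain_ifs   => \&plain_ifs,
    elsif_chain => \&elsif_chain,
});
```

Because $val lives outside the subs here, the absolute timings will not match the figures above; only the relative ordering is worth comparing.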

Anyway, that's me done. Not so much a meditation, more of an aimless ramble...

Tom Melly, tom@tomandlu.co.uk

In reply to A Luser's Benchmarking Tale by Melly
