in reply to Benchmarking File Retrevial

Hi. Just a thought, but shouldn't you be reinitializing your variables before each test pass? Specifically
@hitwords
Otherwise, if the test data size is at all significant, you'll end up holding multiple copies of it in memory. This could lead to excessive swapping and the like.
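
To illustrate the point about reinitializing, here is a minimal sketch. The word list is made up, and a plain loop stands in for the repeated passes a benchmark would make. The sub names pass_shared and pass_fresh are hypothetical, not anything from the original code.

```perl
use strict;
use warnings;

my @words = qw(queue beauty plain aeon onomatopoeia);   # made-up sample data

# Shared array: declared once, so every pass appends to earlier results.
my @shared_hits;
sub pass_shared {
    for my $word (@words) {
        push @shared_hits, $word if $word =~ /[aeiouy]{4,}/i;
    }
    return scalar @shared_hits;
}

# Fresh array: declared inside the sub, so each pass starts empty.
sub pass_fresh {
    my @hits;
    for my $word (@words) {
        push @hits, $word if $word =~ /[aeiouy]{4,}/i;
    }
    return scalar @hits;
}

# Stand-in for the repeated calls a benchmark makes.
my ($shared, $fresh);
for my $pass (1 .. 3) {
    $shared = pass_shared();
    $fresh  = pass_fresh();
    print "pass $pass: shared=$shared fresh=$fresh\n";
}
```

The shared count keeps growing from pass to pass while the fresh count stays the same, which is the memory growth described above.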

Also, I don't understand your use of the sub run. cmpthese will execute countit anyway, so I don't see the purpose at all. A last point is that

while($word = <WORDS>){
rings a bell somewhere. It's not the same thing, iirc, as saying
while (<WORDS>) {
Although I could be wrong. I can't remember where this opinion comes from.

I would have expected your benchmark to look more like:

use Benchmark qw/cmpthese/;

cmpthese 1, {
    read_proc => <<'EOFCODE',
        open(my $words,"words.txt") or die("Wordlist unavailable.\n");
        my $hitcounter=0;
        my @hitwords;
        my $counter=0;
        my @words = <$words>;
        close($words);
        foreach my $word (@words){
            chomp $word;
            if ($word =~ m/[aeiouyAEIOUY]{4,}/){
                push(@hitwords,$word);
                $hitcounter++;
            }
            $counter++;
        }
EOFCODE
    for_proc => <<'EOFCODE',
        open(my $words,"words.txt") or die("Wordlist unavailable.\n");
        my $hitcounter=0;
        my @hitwords;
        my $counter=0;
        foreach my $word (<$words>){
            chomp $word;
            if ($word =~ m/[aeiouyAEIOUY]{4,}/){
                push(@hitwords,$word);
                $hitcounter++;
            }
            $counter++;
        }
        close($words);
EOFCODE
    while_proc => <<'EOFCODE',
        open(my $words,"words.txt") or die("Wordlist unavailable.\n");
        my $hitcounter=0;
        my @hitwords;
        my $counter=0;
        while(<$words>){
            chomp;
            if (m/[aeiouyAEIOUY]{4,}/){
                push(@hitwords,$_);
                $hitcounter++;
            }
            $counter++;
        }
        close($words);
EOFCODE
};

--- demerphq
my friends call me, usually because I'm late....

Re: Re: Benchmarking File Retrevial
by jpfarmer (Pilgrim) on Dec 15, 2002 at 23:14 UTC
    I'm getting my Benchmark syntax from Programming Perl. Here's the example in the book:
    use Benchmark qw/countit cmpthese/;
    sub run($) { countit(5, @_) }
    for $size (2, 200, 20_000) {
        $s = "." x $size;
        print "\nDATASIZE = $size\n";
        cmpthese {
            chop2 => run q{
                $t = $s; chop $t; chop $t;
            },
            subs => run q{
                ($t = $s) =~ s/..\Z//s;
            },
            lsubstr => run q{
                $t = $s; substr($t, -2) = '';
            },
            rsubstr => run q{
                $t = substr($s, 0, length($s)-2);
            },
        };
    }
    Reinitializing @hitwords is a good idea. I'll try it. Also, I thought the only difference between while($word = <WORDS>){ and while(<WORDS>){ was that in the latter, the line was stored in $_. I haven't been able to find any documentation to the contrary, although I'd believe there might be a difference.

      Use B::Deparse. I think it may be documented in perlop, but the second construct adds the defined operator.

      $ perl -MO=Deparse
      while (<STDIN>) {
          print;
      }

      produces:

      while (defined($_ = <STDIN>)) {
          print $_;
      }
      - syntax OK
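
For comparison, here is a small check of the two loop forms side by side. It is only a sketch under stated assumptions: an in-memory filehandle (Perl 5.8 or later) stands in for WORDS, the data is made up, and the sub names are hypothetical. The last line of the data is a bare "0" with no newline, which is false as a string, so a loop without the defined test would stop before counting it. As far as I know, perlop describes the same implicit test for the explicit-assignment form as well.

```perl
use strict;
use warnings;

# Made-up data: the final line is "0" with no trailing newline.
my $data = "first\n0";

# Explicit assignment to a named variable.
sub count_explicit {
    open my $fh, '<', \$data or die "in-memory open failed: $!";
    my $n = 0;
    while (my $word = <$fh>) { $n++ }
    close $fh;
    return $n;
}

# Bare readline in the condition, which assigns to $_.
sub count_implicit {
    open my $fh, '<', \$data or die "in-memory open failed: $!";
    my $n = 0;
    local $_;
    while (<$fh>) { $n++ }
    close $fh;
    return $n;
}

my $explicit = count_explicit();
my $implicit = count_implicit();
print "explicit=$explicit implicit=$implicit\n";
```

If both counts come out as 2, the two forms are treating the trailing "0" line the same way, and the remaining difference is just which variable receives the line.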
      Well, unless there's a version issue going on here, I would write that benchmark like this:
      use Benchmark 'cmpthese';
      for $size (2, 200, 20_000) {
          $s = "." x $size;
          print "\nDATASIZE = $size\n";
          cmpthese -5, {
              chop2   => '$t = $s; chop $t; chop $t;',
              subs    => '($t = $s) =~ s/..\Z//s;',
              lsubstr => '$t = $s; substr($t, -2) = "";',
              rsubstr => '$t = substr($s, 0, length($s)-2);',
          };
      }
      There are a few more ways, but this is a direct but less verbose copy of what you posted. The -5 argument indicates that the benchmarking for each item should take at minimum 5 CPU seconds (it may take longer). If you use a positive argument, then it does that many runs of the given code. It will warn if you don't use enough iterations for it to get a "reasonable" sample.
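
The difference between a negative and a positive count can be seen with a rough sketch like the following. The 200-character string and the count of 50_000 are arbitrary choices for illustration. It uses code refs rather than strings so the lexical $s is visible under strict, and it calls timethese so the iteration counts can be inspected before the chart is printed.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my $s = "." x 200;   # arbitrary sample string

# Code refs rather than strings, so the lexical $s is visible under strict.
my %tests = (
    chop2   => sub { my $t = $s; chop $t; chop $t; },
    rsubstr => sub { my $t = substr($s, 0, length($s) - 2); },
);

# Negative count: run each entry for at least 1 CPU second.
my $timed = timethese(-1, \%tests, 'none');

# Positive count: run each entry exactly this many times.
my $fixed = timethese(50_000, \%tests, 'none');

for my $name (sort keys %tests) {
    printf "%-8s timed run: %d iterations, fixed run: %d iterations\n",
        $name, $timed->{$name}->iters, $fixed->{$name}->iters;
}

# cmpthese also accepts a results hash from timethese and prints the chart.
cmpthese($timed);
```

On a fast machine the fixed run may well print the "too few iterations" warning, since 50_000 passes of code this cheap finish in a fraction of a second. The timed run avoids that by letting Benchmark pick the iteration count.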

      Please consult the Benchmark documentation as I suspect the interface has moved on since the edition of Programming Perl that you are using. I say this because your code does work, but it looks like Benchmark has been updated to do that idiom automatically.

      cheers

      --- demerphq
      my friends call me, usually because I'm late....

Re: Re: Benchmarking File Retrevial
by jpfarmer (Pilgrim) on Dec 16, 2002 at 08:14 UTC
    Using the code from Re: Benchmarking File Retrevial, I found that if both read_proc and for_proc are used together, for_proc will run and then read_proc will hang. If I comment either one out, the program runs properly. This behavior is under ActivePerl. I tried it under UNIX Perl out of curiosity, and it works properly. Both are the same version of Perl and the same version of Benchmark. Perhaps we've discovered a bug in ActivePerl? Is there some other explanation for this behavior?