Re: Benchmarking File Retrevial

Hi. Just a thought, but shouldn't you be reinitializing your variables before each test pass? Specifically:

@hitwords

Otherwise, if the test data size is at all significant, you'll end up holding multiple copies of it in memory. This could lead to excessive swapping and the like.
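To make the accumulation concrete, here is a minimal sketch. It assumes a tiny in-memory word list instead of words.txt, and `one_pass` is a hypothetical helper invented for illustration, not something from the original benchmark.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the word list; the real benchmark reads words.txt.
my @wordlist = qw(queue plain aqueous);

# Shared across passes, like the @hitwords in the benchmark under discussion.
my @hitwords;

# One benchmark pass. $reset controls whether @hitwords is cleared first.
# Returns the number of entries held in @hitwords after the pass.
sub one_pass {
    my ($reset) = @_;
    @hitwords = () if $reset;
    for my $word (@wordlist) {
        push @hitwords, $word if $word =~ /[aeiouyAEIOUY]{4,}/;
    }
    return scalar @hitwords;
}

# Three passes without a reset: the array keeps growing.
my @growing = map { one_pass(0) } 1 .. 3;

# Three passes with a reset: the size stays constant.
@hitwords = ();
my @stable = map { one_pass(1) } 1 .. 3;

print "without reset: @growing\n";
print "with reset:    @stable\n";
```

With a real word list the growth per pass would be the full hit count, which is where the memory pressure would come from.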

Also, I don't understand your use of the sub run. cmpthese will execute countit anyway, so I don't see the purpose at all. A last point is that

while($word = <WORDS>){

rings a bell somewhere. It's not the same thing, IIRC, as saying

while (<WORDS>) {

Although I could be wrong. I can't remember where this opinion comes from.

I would have expected your benchmark to look more like:

use Benchmark qw/cmpthese/;

cmpthese 1,{
read_proc => <<'EOFCODE',
    open(my $words, '<', "words.txt") or die("Wordlist unavailable.\n");
    my $hitcounter=0;
    my @hitwords;
    my $counter=0;
    my @words = <$words>;
    close($words);
    foreach my $word (@words){
        chomp $word;
        if ($word =~ m/[aeiouyAEIOUY]{4,}/){
            push(@hitwords,$word);
            $hitcounter++;
        }
        $counter++;
    }
EOFCODE

for_proc => <<'EOFCODE',
    open(my $words, '<', "words.txt") or die("Wordlist unavailable.\n");
    my $hitcounter=0;
    my @hitwords;
    my $counter=0;
    foreach my $word (<$words>){
        chomp $word;
        if ($word =~ m/[aeiouyAEIOUY]{4,}/){
            push(@hitwords,$word);
            $hitcounter++;
        }
        $counter++;
    }
    close($words);
EOFCODE
while_proc => <<'EOFCODE',
    open(my $words, '<', "words.txt") or die("Wordlist unavailable.\n");
    my $hitcounter=0;
    my @hitwords;
    my $counter=0;
    while(<$words>){
        chomp;
        if (m/[aeiouyAEIOUY]{4,}/){
            push(@hitwords,$_);
            $hitcounter++;
        }
        $counter++;
    }
    close($words);
EOFCODE
};

--- demerphq
my friends call me, usually because I'm late....

Re: Re: Benchmarking File Retrevial by jpfarmer (Pilgrim) on Dec 15, 2002 at 23:14 UTC
I'm getting my Benchmark syntax from Programming Perl. Here's the example in the book:

use Benchmark qw/countit cmpthese/;
sub run($) { countit(5, @_) }
for $size (2, 200, 20_000) {
    $s = "." x $len;
    print "\nDATASIZE = $size\n";
    cmpthese {
        chop2   => run q{ $t = $s; chop $t; chop $t; },
        subs    => run q{ ($t = $s) =~ s/..\Z//s; },
        lsubstr => run q{ $t = $s; substr($t, -2) = ''; },
        rsubstr => run q{ $t = substr($s, 0, length($s)-2); },
    };
}

Reinitializing @hitwords is a good idea. I'll try it. Also, I thought the only difference between `while($word = <WORDS>){` and `while(<WORDS>){` was that in the latter, the line is stored in $_. I haven't been able to find any documentation to the contrary, although I'd believe there might be a difference.
Re: Re: Re: Benchmarking File Retrevial by chromatic (Archbishop) on Dec 15, 2002 at 23:51 UTC
Use B::Deparse. I think it may be documented in perlop, but the second construct adds the `defined` operator.

$ perl -MO=Deparse
while (<STDIN>) {
    print;
}

produces:

while (defined($_ = <STDIN>)) {
    print $_;
}
- syntax OK
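For anyone wondering why the `defined` matters, here is a small sketch. It assumes a Perl with in-memory filehandles (5.8 or later) and uses made-up data whose last line is a bare "0". As far as I know, perlop describes the named-variable form `while ($word = <FH>)` as getting the same implicit `defined` test, so the first two loops should read every line, while a hand-written plain truth test stops early.

```perl
use strict;
use warnings;

# Made-up data: the final line is "0" with no trailing newline,
# which is a false value under a plain truth test.
my $data = "alpha\nbeta\n0";

# Named-variable form: the assignment is implicitly wrapped in defined().
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $assigned = 0;
while (my $word = <$fh>) { $assigned++ }
close($fh);

# Implicit $_ form, for comparison.
open($fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $implicit = 0;
while (<$fh>) { $implicit++ }
close($fh);

# Hand-rolled loop that tests truth instead of definedness.
open($fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $truthy = 0;
while (1) {
    my $word = <$fh>;
    last unless $word;    # "0" is false, so the loop ends one line early
    $truthy++;
}
close($fh);

print "assigned=$assigned implicit=$implicit truthy=$truthy\n";
```

So for the benchmark in this thread the two `while` forms should differ only in which variable receives the line.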
Re: Re: Re: Benchmarking File Retrevial by demerphq (Chancellor) on Dec 16, 2002 at 00:06 UTC
Well, unless there's a version issue going on here, I would write that benchmark like this:

use Benchmark 'cmpthese';
for $size (2, 200, 20_000) {
    $s = "." x $len;
    print "\nDATASIZE = $size\n";
    cmpthese -5,{
        chop2   => '$t = $s; chop $t; chop $t;',
        subs    => '($t = $s) =~ s/..\Z//s;',
        lsubstr => '$t = $s; substr($t, -2) = "";',
        rsubstr => '$t = substr($s, 0, length($s)-2);',
    };
}

There are a few more ways, but this is a direct but less verbose copy of what you posted. The -5 argument indicates that the benchmarking for each item should run for at least 5 CPU seconds (it may take longer). If you use a positive argument, it does that many runs of the given code instead. It will warn if you don't use enough iterations for it to get a "reasonable" sample. Please consult the Benchmark documentation, as I suspect the interface has moved on since the edition of Programming Perl that you are using. I say this because your code does work, but it looks like Benchmark has been updated to do that idiom automatically.

cheers

--- demerphq
my friends call me, usually because I'm late....
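The two counting modes can be seen in isolation with `timeit` and `countit`, which, as far as I can tell from the Benchmark source, are what the positive and negative count arguments end up using. This is only a sketch: the data size of 200 and the 1 second budget are arbitrary choices, not values from the thread.

```perl
use strict;
use warnings;
use Benchmark qw(timeit countit timestr);

# Sample data and code under test; the size is an arbitrary choice.
my $s    = "." x 200;
my $code = sub { my $t = $s; chop $t; chop $t; };

# Fixed count: run the code exactly 1000 times.
my $fixed = timeit(1000, $code);

# Time budget: keep running until at least 1 CPU second has been used.
my $timed = countit(1, $code);

printf "fixed: %d iterations\n", $fixed->iters;
printf "timed: %d iterations in %s\n", $timed->iters, timestr($timed);
```

The iteration count in the second line will vary by machine, which is the point of the time-based mode: it scales the number of runs to whatever fits in the budget.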
Re: Re: Benchmarking File Retrevial by jpfarmer (Pilgrim) on Dec 16, 2002 at 08:14 UTC
Using the code from Re: Benchmarking File Retrevial, I found that if both read_proc and for_proc are used together, for_proc will run and then read_proc will hang. If I comment either one out, the program runs properly. This behavior is under ActivePerl. I tried it under UNIX Perl out of curiosity, and it works properly. Both are the same version of Perl and the same version of Benchmark. Perhaps we've discovered a bug in ActivePerl? Is there some other explanation for this behavior?