yoganaut has asked for the wisdom of the Perl Monks concerning the following question:

I've noticed a huge (10x) performance difference between two algorithms, and it's not obvious (to me) why the difference is so great. I'm trying to parse each line of a file that has already been read into an array. Three different kinds of line are interesting; they are in the @patterns array. I have an outer loop that iterates over the source lines and an inner loop that iterates over the patterns, and I store the interesting lines in another array. With a particular data set, this code takes around 30 seconds to execute.
my @patterns = ("SDC: HUT time changed: 0",
                "BATT: Cap=0\\(watt-min\\) HUT=0\\(min\\) 0\\(hrs\\) state=GOOD->BAD",
                "BATT: Log battery system condition GOOD->BAD");
my @events;
foreach my $line (@$srcRef) {
    foreach my $pattern (@patterns) {
        if ($line =~ /$pattern/) {
            push(@events, $line);
            last;
        }
    }
}
Unrolling the inner loop like so results in a 3-second execution time and appears to produce the same results. Why the 10x hit with the nested loops? What am I missing? Thanks.
foreach my $line (@$srcRef) {
    if ($line =~ /SDC: HUT time changed: 0/) {
        push(@events, $line);
    }
    elsif ($line =~ /BATT: Cap=0\(watt-min\) HUT=0\(min\) 0\(hrs\) state=GOOD->BAD/) {
        push(@events, $line);
    }
    elsif ($line =~ /BATT: Log battery system condition GOOD->BAD/) {
        push(@events, $line);
    }
}
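To reproduce the gap, the two versions can be timed side by side with the core Benchmark module. This is a sketch, not the original test: the real data set isn't shown, so the input lines (and the one-in-ten ratio of interesting lines) are made up, and the sub names nested and unrolled are hypothetical. Absolute timings will depend on your perl and your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical synthetic input: every tenth line is an interesting one.
my @lines;
for my $i (1 .. 3000) {
    push @lines, $i % 10 ? "INFO: routine message number $i"
                         : "BATT: Log battery system condition GOOD->BAD";
}
my $srcRef = \@lines;

# The three patterns from the question, kept as regex source strings.
my @patterns = ("SDC: HUT time changed: 0",
                "BATT: Cap=0\\(watt-min\\) HUT=0\\(min\\) 0\\(hrs\\) state=GOOD->BAD",
                "BATT: Log battery system condition GOOD->BAD");

# Nested loops: /$pattern/ is rebuilt from a string, and Perl recompiles it
# whenever the string differs from the previous pass (here, almost every pass).
sub nested {
    my @events;
    foreach my $line (@$srcRef) {
        foreach my $pattern (@patterns) {
            if ($line =~ /$pattern/) { push @events, $line; last; }
        }
    }
    return \@events;
}

# Unrolled: each regex is a constant, compiled once along with the script.
sub unrolled {
    my @events;
    foreach my $line (@$srcRef) {
        if    ($line =~ /SDC: HUT time changed: 0/) { push @events, $line }
        elsif ($line =~ /BATT: Cap=0\(watt-min\) HUT=0\(min\) 0\(hrs\) state=GOOD->BAD/) { push @events, $line }
        elsif ($line =~ /BATT: Log battery system condition GOOD->BAD/) { push @events, $line }
    }
    return \@events;
}

# Make sure both versions select the same lines before comparing speed.
die "results differ\n" unless "@{ nested() }" eq "@{ unrolled() }";

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, { nested => \&nested, unrolled => \&unrolled });
```

cmpthese prints a table of rates and relative percentages; the ratio between the two rows is the number to compare with the 10x seen on the real data.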

Re: performance explanation needed
by Zaxo (Archbishop) on May 04, 2005 at 23:48 UTC

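    The need to recompile the three patterns each time through the outer loop is probably what's taking the extra time. You could compile them once with qr: @patterns = map qr/\Q$_\E/, @patterns; (note that \Q...\E escapes the text for you, so @patterns should then hold the plain text without your backslashes, or the parentheses get escaped twice). Or, since the patterns seem to be constant strings, test lines of interest with index, again using the plain, unescaped text: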

    foreach my $pattern (@patterns) {
        if (-1 != index $line, $pattern) {
            push(@events, $line);
            last;
        }
    }
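    Both of these options can be put into a small runnable script. It is a sketch that assumes the patterns are stored as the literal text to look for (no regex escapes); the input lines and the names @literals, @compiled, @events_qr and @events_index are hypothetical.

```perl
use strict;
use warnings;

# Assumption: plain literal text, so "(" really means a parenthesis.
my @literals = ("SDC: HUT time changed: 0",
                "BATT: Cap=0(watt-min) HUT=0(min) 0(hrs) state=GOOD->BAD",
                "BATT: Log battery system condition GOOD->BAD");

# Hypothetical sample input standing in for @$srcRef.
my $srcRef = [
    "boot ok",
    "BATT: Cap=0(watt-min) HUT=0(min) 0(hrs) state=GOOD->BAD",
    "noise",
    "SDC: HUT time changed: 0",
];

# Option 1: compile each pattern once; \Q...\E escapes the metacharacters.
my @compiled = map { qr/\Q$_\E/ } @literals;
my @events_qr;
foreach my $line (@$srcRef) {
    foreach my $re (@compiled) {
        if ($line =~ $re) { push @events_qr, $line; last; }
    }
}

# Option 2: plain substring search with index, no regex engine involved.
my @events_index;
foreach my $line (@$srcRef) {
    foreach my $text (@literals) {
        if (index($line, $text) != -1) { push @events_index, $line; last; }
    }
}
```

    Both loops pick out the same two lines here. index is usually the cheaper of the two when the patterns contain no real regex features.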
    I think the fastest way is to avoid the inner loop and gain the speed of hash lookup like this:
    chomp @$srcRef;
    my %pattern;
    @pattern{@patterns} = ();
    my @events = grep {exists $pattern{$_}} @$srcRef;
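    That solution only works if each line in @$srcRef consists of exactly one of the patterns and nothing else, apart from the line ending that chomp removes. As with index, the hash keys must be the plain text rather than regex-escaped strings. Note also that chomp modifies the caller's array in place.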
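    A self-contained version of the hash lookup, showing the exact-match restriction. The input lines are hypothetical, and the patterns are assumed to be plain text.

```perl
use strict;
use warnings;

my @patterns = ("SDC: HUT time changed: 0",
                "BATT: Cap=0(watt-min) HUT=0(min) 0(hrs) state=GOOD->BAD",
                "BATT: Log battery system condition GOOD->BAD");

# Hypothetical input, with line endings as if read from a file.
my @lines = ("noise\n",
             "SDC: HUT time changed: 0\n",
             "SDC: HUT time changed: 0 (again)\n",
             "BATT: Log battery system condition GOOD->BAD\n");
my $srcRef = \@lines;

chomp @$srcRef;            # strips the newlines in place
my %pattern;
@pattern{@patterns} = ();  # hash slice: the patterns become keys, values are undef
my @events = grep { exists $pattern{$_} } @$srcRef;

# @events holds only the two exact matches; the "(again)" line is skipped
# because it is not identical to any key.
```

    Each line costs one hash lookup no matter how many patterns there are, which is where the speed comes from; the price is that partial matches are never found.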

    After Compline,
    Zaxo

Re: performance explanation needed
by Roy Johnson (Monsignor) on May 04, 2005 at 23:39 UTC
    In the latter case, Perl doesn't have to interpolate a variable into the pattern, and recompile the result, on every pass through the loop. Try making your patterns qr// objects instead of strings in the former example, and see if it makes a difference.
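    Applied to the former example, that suggestion might look like the sketch below: the patterns keep their regex escapes but are written as qr// objects, so each one is compiled once. The sample input lines are hypothetical.

```perl
use strict;
use warnings;

# The patterns from the question as qr// objects (single backslashes now,
# since these are regex literals rather than double-quoted strings).
my @patterns = (qr/SDC: HUT time changed: 0/,
                qr/BATT: Cap=0\(watt-min\) HUT=0\(min\) 0\(hrs\) state=GOOD->BAD/,
                qr/BATT: Log battery system condition GOOD->BAD/);

# Hypothetical sample input standing in for @$srcRef.
my $srcRef = [
    "noise",
    "SDC: HUT time changed: 0",
    "BATT: Cap=0(watt-min) HUT=0(min) 0(hrs) state=GOOD->BAD",
    "more noise",
];

my @events;
foreach my $line (@$srcRef) {
    foreach my $pattern (@patterns) {
        # Matching against the qr// object directly reuses its compiled form.
        if ($line =~ $pattern) { push @events, $line; last; }
    }
}
```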

    Caution: Contents may have been coded under pressure.