I definitely think benchmarking is the key answer here.
I think that, no matter what, this is an implementation-specific problem. I always wrote Perl for
programmer speed and paid less attention to execution speed, until I started working on problems
with datasets ranging from hundreds of megabytes to a few gigabytes in size.
I love Perl, but for data this big, and the bit of processing required, I would initially have gone
with either C or C++. But I work in a place where almost everyone knows Perl and few know C/C++,
so Perl optimization has become a big issue.
I've learned a lot about how slight code changes can increase efficiency, especially when
certain tasks need to be done many times over. I've seen major speed increases just by
benchmarking and trying a different solution, while keeping the same algorithm.
Take, for example:
my @a = ();
if ( $foo =~ /^(\d+)\s+(\w+)\s*$/ ) {
@a = ($1, $2);
}
vs.
my @a = split /\s+/, $foo;
Guess what? On my system, option #1 runs about 90% faster.
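If you want to try this comparison yourself, here's a minimal sketch using the core Benchmark module (the sample line in $foo is just an assumption; substitute your own data). cmpthese runs each sub for a couple of CPU seconds and prints a rate comparison:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input line -- swap in a line from your real dataset.
my $foo = "12345   widget";

cmpthese( -2, {
    # Option #1: anchored regex with captures
    regex => sub {
        my @a = ();
        if ( $foo =~ /^(\d+)\s+(\w+)\s*$/ ) {
            @a = ( $1, $2 );
        }
    },
    # Option #2: split on whitespace
    split => sub {
        my @a = split /\s+/, $foo;
    },
} );
```

The negative first argument to cmpthese means "run for at least that many CPU seconds" rather than a fixed iteration count, which usually gives steadier numbers. As always, relative speeds depend on your Perl build and your data, so benchmark on the real thing.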
-felonious
--