jarich has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to optimise some code from hell, which of course I didn't write.

The basic structure of the code I'm working on is this:

while (<>) {
    .....
    if (/...../) {
        ...
    }
    elsif (/...../) {
        ...
    }
    ..... # and a bunch more regexps
    increment_counters(...);
    .....
}

sub increment_counters {
    ....
    if (/...../) {
        .....
        next; # note that this nexts the above loop
    }
    if (/..../) {
        ....
        next;
    }
    ......
}

My questions are the following. None of the regular expressions here are precompiled. Is there a way to precompile them without creating more variables, e.g.:

if( qr/..../ )
or rather something like that, as that won't work. Alternately, if I precompile the expressions as a variable:
my $date = qr/...../;
if (m/$date/) {
    .....
}
will the precompilation disappear once the variable goes out of scope? I'm assuming it will, but wouldn't that ruin the point of precompiling? Is my only option to dump these regexps into a hash, or make them global, etc.?

I want to optimise this code because I'm processing files with roughly 9 million lines in them (and taking 3-4 days to do each one). We've done a lot of optimisation already.

Perhaps I should inline the increment_counters function. Any ideas on whether that would help? I'm spending 48.7% of my time in this function.

jarich


Re: Precompiled Reg Exps
by perlplexer (Hermit) on Apr 26, 2002 at 00:38 UTC
    You are correct in saying that precompilation will go to waste if you simply do this
    #...
    my $re = qr/blah/;
    if (/$re/) {
        # ...
    }
    You need to precompile every single one of them before the main loop. If it makes sense (if you have too many if-else conditions), then, like you said, make a dispatch table.
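
    A minimal sketch of that idea, assuming made-up patterns and handlers (the real ones are elided in the question): the qr// objects are built once, before the loop, and stored in an ordered dispatch table. The @lines array is only a stand-in for reading from <>.

```perl
use strict;
use warnings;

# Hypothetical patterns and handlers; each qr// is compiled once, here.
my %count;
my @dispatch = (
    [ qr/^\d{4}-\d{2}-\d{2}/ => sub { $count{date}++  } ],
    [ qr/\bERROR\b/          => sub { $count{error}++ } ],
);

# Stand-in input; the original code reads lines from <>.
my @lines = ( "2002-04-26 started", "ERROR: disk full", "nothing to see" );

LINE: for my $line (@lines) {
    for my $entry (@dispatch) {
        my ( $re, $handler ) = @$entry;
        if ( $line =~ $re ) {
            $handler->($line);
            next LINE;    # first match wins, like an if/elsif chain
        }
    }
    $count{other}++;      # no pattern matched
}

printf "%s=%d\n", $_, $count{$_} for sort keys %count;
```

    An array of pairs is used rather than a hash so that the patterns are tried in a fixed order, which an if/elsif chain implicitly guarantees.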
    Also, instead of trying to optimize your regexes, see if you can get rid of them (or at least some). In many cases people use regexes for everything, even when a simple 'eq' will suffice; e.g., if($bar =~ /^foo$/){} instead of if($bar eq 'foo'){}.
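
    A small sketch of the two forms side by side, using an illustrative $bar. One caveat worth checking before swapping them: the two tests are not strictly equivalent, because $ in a regex also matches just before a trailing newline.

```perl
use strict;
use warnings;

my $bar = "foo";

# Regex version: an anchored match against a fixed string.
my $by_regex = ( $bar =~ /^foo$/ ) ? 1 : 0;

# String comparison: no regex engine involved.
my $by_eq = ( $bar eq 'foo' ) ? 1 : 0;

# The difference: $ matches before a trailing newline, eq does not ignore it.
my $with_newline = "foo\n";
my $regex_nl = ( $with_newline =~ /^foo$/ ) ? 1 : 0;
my $eq_nl    = ( $with_newline eq 'foo' )   ? 1 : 0;

print "regex=$by_regex eq=$by_eq regex_nl=$regex_nl eq_nl=$eq_nl\n";
```

    If the input lines have not been chomped, the eq version needs a chomp first (or a comparison against "foo\n") to behave the same way.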

    --perlplexer
Re: Precompiled Reg Exps
by Fletch (Bishop) on Apr 26, 2002 at 02:26 UTC

    Could you give a bit more details about just what those /..../ bits really are?

    • Are they constant? (i.e. /abcd/, not containing any interpolated variables; if so then they'll be compiled once and you don't have anything to worry about (just covering that base because of the vagueness))
    • Do they depend on something determined at runtime, but that doesn't vary from line to line? If so, then compile them once with a $res = prepare_regexen() that populates a hash; then use if( m/$res->{CONDITION_ONE}/ ) { ... }
    • Do they vary from line to line? In this case you're probably screwed, but you might be able to make gains by factoring out common subexpressions and precompiling those.
    • Consider whether you really need regular expressions, or if you can replace some with index instead.
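
    The second and fourth points could be sketched roughly as follows. The body of prepare_regexen, its prefix/keyword options, the CONDITION_TWO key, and the sample lines are all assumptions made up for illustration; only the function name and CONDITION_ONE come from the text above.

```perl
use strict;
use warnings;

# Hypothetical body: build the patterns once from run-time settings.
sub prepare_regexen {
    my (%opt) = @_;
    return {
        CONDITION_ONE => qr/^\Q$opt{prefix}\E\d+/,
        CONDITION_TWO => qr/\b\Q$opt{keyword}\E\b/i,
    };
}

# Compiled once, before any line is read.
my $res = prepare_regexen( prefix => 'job', keyword => 'failed' );

# Stand-in input; the original code reads lines from <>.
my @lines = ( "job42 queued", "backup FAILED on host", "status=ok" );
my %hits  = ( one => 0, two => 0, literal => 0 );

# The last branch looks for a fixed substring, so index() is enough
# and the regex engine is not involved at all.
for (@lines) {
    if    ( m/$res->{CONDITION_ONE}/ )    { $hits{one}++ }
    elsif ( m/$res->{CONDITION_TWO}/ )    { $hits{two}++ }
    elsif ( index( $_, 'status=' ) >= 0 ) { $hits{literal}++ }
}

print "$_=$hits{$_}\n" for sort keys %hits;
```

    Whether index actually beats a simple constant regex on the real data is something only a benchmark on those 9-million-line files can settle.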

Re: Precompiled Reg Exps
by pdcawley (Hermit) on Apr 26, 2002 at 06:20 UTC
    What are you optimizing for?
    Speed: Fix the algorithm.
    Readability: Personally, I always recommend optimizing for readability first. Usually readable code is the kind of code that produces useful profiler stats. Which in turn helps you make things faster.
    In this case you've really not said much. What do the data look like? How often do those regular expressions change? (and if they don't change you're barking up the wrong tree.) Are the 'logging' regexes the 'same' as those used in the main loop? If not, how similar are they?

    Without facts, trying to optimize something is akin to answering the question "Can a piece of string be shortened?". The only possible answer is "Probably, yes." Optimization is, at least in part, the process of using the facts of a particular case to make the code 'fit' it better. Without good measurements, all you get is off-the-peg code that fits where it touches.