I'm trying to optimise some code from hell, which of course I didn't write.

The basic structure of the code I'm working on is this:

    while (<>) {
        .....
        if(/...../) {
            ...
        }
        elsif(/...../) {
            ...
        }
        ..... # and a bunch more regexps
        increment_counters(...);
        .....
    }

    sub increment_counters {
        ....
        if(/...../) {
            .....
            next; # note that this nexts the above loop
        }
        if(/..../) {
            ....
            next;
        }
        ......
    }

My questions are the following. None of the regular expressions here are precompiled. Is there a way to precompile them without creating more variables, e.g.:

if( qr/..../ )
or rather something like that, as that won't work. Alternately, if I precompile the expressions as a variable:
    my $date = qr/...../;
    if(m/$date/) {
        .....
    }

will the precompilation disappear once the variable goes out of scope? I'm assuming it will, but wouldn't that ruin the point of precompiling? Is my only option to dump these regexps into a hash, make them global, etc.?
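
To make the hash option concrete, here is a rough sketch of what I have in mind. The patterns, the sample lines, and the names `%re`, `%count` and `classify` are all made up for illustration and are not the real code; the only point is that the `qr//` objects are built once at file scope, so they stay alive for the whole run instead of going out of scope on every iteration.

```perl
use strict;
use warnings;

# Hypothetical patterns, compiled once at file scope so they
# outlive every loop iteration and every call to classify().
my %re = (
    date  => qr/^\d{4}-\d{2}-\d{2}\b/,
    error => qr/\bERROR\b/,
);

my %count;

# Hypothetical stand-in for the real matching logic: returns the
# name of the first pattern that matches, or 'other'.
sub classify {
    my ($line) = @_;
    return 'date'  if $line =~ $re{date};
    return 'error' if $line =~ $re{error};
    return 'other';
}

# Made-up sample input in place of while (<>).
for my $line ('2001-03-14 started', 'ERROR: disk full', 'hello') {
    $count{ classify($line) }++;
}

print join(', ', map { "$_=$count{$_}" } sort keys %count), "\n";
```

Whether this is actually any faster than the literal patterns is exactly what I'm unsure about.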

I want to optimise this code because I'm processing files with roughly 9 million lines in them (and taking 3-4 days to do each one). We've done a lot of optimisation already.

Perhaps I should inline the increment_counters function. Any ideas on whether that would help? I'm spending 48.7% of my time in this function.

jarich


