dargosch has asked for the wisdom of the Perl Monks concerning the following question:

Greetings!



I'm searching the Net (not the .NET) for information about how to optimize my Perl code. So far I've only found one article, "Optimizing Your Perl" by Robert Spier on perl.com, which mostly covers basic complexity theory.

What I'm looking for, however, is something like "use the /o modifier in pattern matching".
Any ideas? My project runs about 20 pattern matches on each of around 10,000 formatted text files. The gathered information is stored in about 240,000 objects (hashes). Right now, collecting this information takes 8 seconds.
This is what I would like to reduce.

Re: Optimizing Perl?
by broquaint (Abbot) on Apr 19, 2002 at 15:25 UTC
    use Disclaimer qw(optimization is probably a waste of your time);
    If you're looking at the same string over and over again you could use the study() function to optimise the regex searches.
    Make sure you're using the right tool for the job, i.e. use index() instead of a regex if you're just checking for substrings within strings.
    Try to avoid using saw-ampersand ($&) as it can slow down the execution of your code.
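
    As a minimal sketch of the index() suggestion above: the sample lines and the "name:" marker are hypothetical, since the original post does not show the file format. index() returns -1 when the substring is absent, so a fixed-string test needs no regex at all.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real input format is not shown in the thread.
my @lines = (
    "name: alpha",
    "size: 42",
    "name: beta",
);

# Fixed substring test: index() returns -1 when the substring is absent.
my @with_index = grep { index($_, "name:") != -1 } @lines;

# Equivalent regex test, shown for comparison only.
my @with_regex = grep { /name:/ } @lines;

print scalar(@with_index), " lines matched via index()\n";
print scalar(@with_regex), " lines matched via regex\n";
```

    Whether index() is actually faster for a given pattern is worth measuring rather than assuming.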
    HTH

    broquaint

      As I recall, $& no longer causes the slowdown it used to, and now only its brother and sister have that problem.

      Cheers,
      Erik
Re: Optimizing Perl?
by strat (Canon) on Apr 19, 2002 at 15:46 UTC
    Well, there is not much I can contribute... because in my eyes, optimization is best done by using better algorithms and data structures...

    Before optimizing, it helps quite a bit to find out what to optimize. The module Devel::DProf might help you find out which subs are called how often and how much time each one takes. Then you can decide whether there's a better algorithm, whether it's worth implementing the sub in C or the like, or even whether to use external programs that perform better for a certain problem. Or maybe it is possible to run tasks in parallel (perhaps on a multi-CPU machine)?

    $reverseString = reverse($string); instead of: $reverseString = join("", reverse split(//, $string));
    Try to use good sorting algorithms, e.g. the Schwartzian Transform (or Guttman Rosler Transform) or the Orcish Maneuver.
    Try not to use ties.
    Try caching data.
    Try eq instead of =~ /^...$/
    Try not to use $&, $` and $'
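
    A rough sketch of two of the sorting idioms named in the list above. The sort_key() function and the word list are hypothetical stand-ins for an expensive key computation; both idioms aim to compute each key only once instead of on every comparison.

```perl
use strict;
use warnings;

# Hypothetical key function: stands in for any costly computation.
sub sort_key {
    my ($s) = @_;
    return length $s;
}

my @words = qw(pear fig banana kiwi);

# Schwartzian Transform: pair each item with its key, sort on the key,
# then unwrap the original items.
my @st_sorted =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }
    map  { [ sort_key($_), $_ ] } @words;

# Orcish Maneuver: cache keys in a hash with ||= inside the comparator.
# Note that a false key (0 or "") is recomputed on every comparison.
my %cache;
my @om_sorted = sort {
    ( $cache{$a} ||= sort_key($a) ) <=> ( $cache{$b} ||= sort_key($b) )
        or $a cmp $b
} @words;

print "@st_sorted\n";
print "@om_sorted\n";
```

    Both variants order the words by length and break ties alphabetically, so the two printed lines should be identical.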

    My basic feelings about optimization are: buy a faster machine, because it often is cheaper than optimisation :-)

    Best regards,
    perl -le "s==*F=e=>y~\*martinF~stronat~=>s~[^\w]~~g=>chop,print"

Re: Optimizing Perl?
by perlplexer (Hermit) on Apr 19, 2002 at 15:37 UTC
    Precompile your patterns. Example:

        my $re = qr/this RegEx will be used 10,000 times/;
        open FH, "<file.dat" or die "Error : $!\n";
        while (<FH>){
            process() if /$re/;
        }
        close FH;

    You can refer to perldoc perlretut for additional information.
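
    With around 20 patterns per file, the same idea might be extended by compiling all of them once before the loop. This is only a sketch: the field names, patterns, and input lines are hypothetical, as the thread does not show the real ones.

```perl
use strict;
use warnings;

# Hypothetical field patterns, compiled once with qr// before the loop.
my %patterns = (
    name => qr/^name:\s*(\S+)/,
    size => qr/^size:\s*(\d+)/,
);

# Hypothetical input; in the real program these lines would come from files.
my @lines = ( "name: alpha", "size: 42", "comment: ignored" );

my %record;
for my $line (@lines) {
    for my $field ( keys %patterns ) {
        if ( $line =~ $patterns{$field} ) {
            $record{$field} = $1;
            last;    # at most one field per line in this sketch
        }
    }
}

print "$_=$record{$_}\n" for sort keys %record;
```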

    --perlplexer
Re: Optimizing Perl?
by samtregar (Abbot) on Apr 19, 2002 at 18:46 UTC
    Run, don't walk, to your nearest bookstore and buy a copy of "Mastering Regular Expressions" from O'Reilly press. It completely covers regex optimization. It's also probably the best technical book I've ever read. The depth of coverage is simply amazing.

    -sam

Re: Optimizing Perl?
by talexb (Chancellor) on Apr 19, 2002 at 15:25 UTC
    8 seconds for that much work sounds pretty good to me. However...

    The /o option tells Perl to compile a regular expression only once -- it promises that the regexp is invariant (er, that any variables interpolated into it won't change during the life of the script). That saves some time. Have you benchmarked your script with and without that option?
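
    One way to answer that question is the core Benchmark module. The sketch below is built on assumptions: the data, the pattern, and the iteration count are all made up for illustration, and the measured difference depends on the Perl version and may turn out to be small.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data and pattern; substitute real lines and real patterns.
my @lines   = map { "record $_ size: " . ( $_ * 3 ) } 1 .. 2000;
my $pattern = 'size:\s*\d+7$';

# Interpolated pattern without /o.
sub count_plain {
    my $n = 0;
    for (@lines) { $n++ if /$pattern/ }
    return $n;
}

# Same pattern with /o: interpolated and compiled only on first use.
sub count_once {
    my $n = 0;
    for (@lines) { $n++ if /$pattern/o }
    return $n;
}

# Run each variant 300 times and print a comparison table.
cmpthese( 300, { plain => \&count_plain, with_o => \&count_once } );
```

    Keep in mind that /o freezes the pattern at first use, so later changes to $pattern are silently ignored by that match.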

    If Perl still isn't fast enough, I suppose rewriting your script in either C or assembler is your only hope (assuming that you are already running on the hairiest machine you can get).

    --t. alex

    "Nyahhh (munch, munch) What's up, Doc?" --Bugs Bunny

    Update: Thanks to crazyinsomniac for the reminder that the /o option is buggy for some versions of Perl 5.6.

Re: Optimizing Perl?
by cjf (Parson) on Apr 19, 2002 at 16:54 UTC