in reply to Benchmarking Tests

Benchmarks are relative. 'Relative to what?', you ask. Well, relative to the machine you are on, the data you are working with, and the script you are running.

Remember the Perl adage: there's more than one way to do it. That is the way you have to look at your scripts. What are the different subs doing? Can you rewrite them using some other method? Only by rewriting sections and benchmarking again can you get an idea of what works better. The bottom line is that unless you want to start reading the perl source code to see how it implements various operations (I sure don't, but some people out there do), figuring out what works better or worse on your system with a given dataset is going to be trial and error.
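As a sketch of that trial-and-error loop, the core Benchmark module (shipped with Perl) can time two rival implementations side by side. The two subs here are invented for illustration; any pair of equivalent routines from your own script would do:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Two hypothetical ways to build the same list of squares.
sub squares_push {
    my @out;
    push @out, $_ * $_ for 1 .. 1000;
    return @out;
}

sub squares_map {
    return map { $_ * $_ } 1 .. 1000;
}

# Run each sub a fixed number of times and print a comparison table.
cmpthese(5000, {
    push_loop => \&squares_push,
    map_list  => \&squares_map,
});
```

cmpthese prints a table of rates plus the percentage difference between the entries, which is usually more telling than raw timings. Remember that the winner on this box, with this data, may lose elsewhere.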

Areas of your scripts you should study hard to see if there is a more efficient way are:

  • Regexes, which are notoriously easy to make overly complicated. Read Mastering Regular Expressions by Friedl to get a handle on the regex engine.
  • Loops that iterate over large chunks of data. What are they doing to the data? Can you unroll the loop or do some of the data transforms more efficiently?
  • Check out Effective Perl Programming by Hall (with Schwartz :) for tons of whiz-bang stuff about efficiency in Perl.

    Essentially, you want to look at the code that does the hard, tedious work on the most data; these are the sections that may be ripe for streamlining.
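To make the regex and loop points concrete, here is one hedged example (the data and pattern are made up): precompiling a pattern with qr// outside a loop, rather than interpolating a string on every match, may benchmark faster since the match overhead moves out of the hot path:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented sample data: 500 log-ish lines, half matching.
my @lines = map { "record $_: status=" . ($_ % 2 ? "ok" : "fail") } 1 .. 500;
my $pat   = 'status=ok';

# String pattern interpolated inside the loop on every match.
sub match_interp {
    my $n = 0;
    for my $line (@lines) {
        $n++ if $line =~ /$pat/;
    }
    return $n;
}

# Pattern precompiled once with qr// and reused.
my $re = qr/status=ok/;
sub match_compiled {
    my $n = 0;
    for my $line (@lines) {
        $n++ if $line =~ $re;
    }
    return $n;
}

cmpthese(2000, {
    interpolated => \&match_interp,
    precompiled  => \&match_compiled,
});
```

Whether the difference matters on your machine and your data is exactly what the benchmark is there to tell you; don't assume, measure.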