The results mostly show that you are concentrating on the wrong things.

Even "69971/s" vs. "130236/s", although conceptually "twice as fast", is unlikely to have more than a trivial impact on the time required to run a useful script. Operations that run that quickly rarely add up to much of the run time of a useful script (in particular because Benchmark.pm "subtracts overhead", which makes such measurements rather distant from practical reality).

So, you've done quite a bit of work to determine that, even if you write a useful script that still mostly consists of trimming whitespace from tons of strings of about that length, which method you choose won't have much impact.

If you re-run the benchmark with much, much longer strings, then you'll likely get results that have more practical meaning (as the "overhead" is relatively less significant so the misguided "subtraction" process is unlikely to drastically change the results).
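A minimal sketch of such a re-run, assuming two illustrative ways of collapsing runs of spaces (they are stand-ins, not necessarily the variants from the original benchmark) and made-up test strings:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: one short string and one much longer one,
# both containing runs of extra spaces.
my $short = "a  b   c    d";
my $long  = join( ' ' x 3, ('word') x 50_000 );

# Two illustrative ways to collapse runs of spaces (assumed for this sketch).
sub collapse_plus { my $s = shift; $s =~ s/ +/ /g;    return $s }
sub collapse_two  { my $s = shift; $s =~ s/ {2,}/ /g; return $s }

for my $case ( [ short => $short ], [ long => $long ] ) {
    my ( $name, $str ) = @$case;
    print "--- $name string (", length($str), " chars) ---\n";

    # A negative count means "run each sub for at least that many CPU seconds".
    cmpthese( -1, {
        plus   => sub { collapse_plus($str) },
        two_up => sub { collapse_two($str) },
    } );
}
```

The per-call overhead is the same for both string sizes, so the long-string table is the one more likely to reflect the regexes themselves.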

This makes sense, as the usual way that you notice that a regex is excessively inefficient is because you ran it against particularly long strings, not because you ran it against a huge number of very short strings. Again, that is because the overhead tends to make the inefficiency have a much smaller impact on the total run time. But it is also because one of the biggest ways to make a regex very inefficient is "bad backtracking", which scales linearly with the number of strings but can scale O(n**2) or much worse with the string length.
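To get a feel for how cost can grow with string length, here is a rough sketch. It assumes a commonly cited problem case: /\s+$/ applied to a string with a long run of spaces that is not at the end. The helper name and the lengths are made up, and whether you actually see quadratic growth depends on your perl version, since the regex engine may optimize this case.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical problem input for /\s+$/: a long run of spaces followed by a
# non-space character, so the run can match \s+ but "$" never succeeds.
sub time_trailing_trim {
    my ($n) = @_;
    my $str = ( ' ' x $n ) . 'x';

    my $start = time;
    ( my $copy = $str ) =~ s/\s+$//;    # nothing to trim; the work is in failing
    my $elapsed = time - $start;

    return ( $elapsed, $copy );
}

# Doubling the length: roughly doubling times suggest linear behavior,
# roughly quadrupling times suggest O(n**2).
for my $n ( 1_000, 2_000, 4_000, 8_000 ) {
    my ($elapsed) = time_trailing_trim($n);
    printf "%6d spaces: %.6f s\n", $n, $elapsed;
}
```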

But if you aren't going to be running these regexes against really long strings, then none of that actually matters.

If you are actually trying to make some script run faster, then, instead of picking random small operations and trying dozens of variations hoping to find "the fastest", you should profile the code to see what (if anything) is actually taking up any significant part of the run time. And/or look at the larger structure, where optimizations actually have a chance of having a significant impact on the total run time.

If you are just hoping to find "the best way" with Benchmark.pm results being a significant part of your grading process, then I also think you are paying too much attention to the wrong things. A construct that is 5% faster to write or understand or get correct is likely to have a much bigger impact than whether the computer can run it 20% faster, IME, especially since such constructs so rarely add up to more than a small part of the total run time.

Also, lots of times there is no one "fastest" approach. For example, it is pretty common for method X to be faster than method Y when a string has tons of extra whitespace while method Y is faster when the string has no extra whitespace.

So, if I were bored, I'd have tried a variation that does nothing when nothing needs to be done, like: s/ {2,}/ /g. I'm sure you can come up with at least half a dozen variations on that to add to your quest. :)
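A small sketch of what "does nothing when nothing needs to be done" means here, comparing it against s/ +/ /g (an assumed baseline for illustration, with made-up sample strings). It counts substitutions rather than timing them:

```perl
use strict;
use warnings;

# s///g returns the number of substitutions made (false when there were none).
sub subs_plus { my $s = shift; my $n = ( $s =~ s/ +/ /g )    || 0; return ( $n, $s ) }
sub subs_two  { my $s = shift; my $n = ( $s =~ s/ {2,}/ /g ) || 0; return ( $n, $s ) }

my $clean = "one two three";       # no extra whitespace
my $messy = "one  two   three";    # two runs of extra spaces

for my $str ( $clean, $messy ) {
    my ( $n_plus, $out_plus ) = subs_plus($str);
    my ( $n_two,  $out_two )  = subs_two($str);
    print "'$str': / +/ made $n_plus substitutions, / {2,}/ made $n_two\n";
}
```

On the clean string, / +/ still replaces each single space with a single space, while / {2,}/ never matches, which is the kind of difference that can flip a benchmark depending on the input.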

- tye        


In reply to Re: Question about regex performance (too small) by tye
in thread Question about regex performance by ted.byers
