I was quite pleasantly surprised by its speed. I notice, however, that the code runs about three times more slowly under 5.30 than under 5.8. I assume this is due to the many modifications made to the regex engine over the years to accommodate Unicode. Any comment on this would be of interest.

3x slower seems an awful lot. Are both perls built with the same options?
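
One quick way to check is `perl -V`, which prints a perl's full build configuration. Run it under both versions and diff the output. The sketch below narrows that down to a few settings that can affect speed: threading support, the optimizer flags, and `ccflags`, where a `-DDEBUGGING` build would show up. The choice of keys is an assumption, not an exhaustive list.

```perl
use strict;
use warnings;
use Config;

# Print a few build settings that can affect regex speed. Run this under
# each perl and diff the two outputs. The key list is only a suggestion.
my @keys = qw(version archname cc optimize ccflags usethreads useithreads
              usemultiplicity uselongdouble use64bitall);
for my $key (@keys) {
    my $value = defined $Config{$key} ? $Config{$key} : '(undef)';
    print "$key=$value\n";
}
```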

Unicode support could be part of it: it would be worth redoing the timings under 5.30 with either /a or /aa as a flag on the regexp. However, for these patterns (ASCII input and pattern, no use of \w-style classes) I'd expect the Unicode-related cost difference between versions to be low, and the benefit of the /a flags to be zero.
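
A minimal sketch of such a timing run, using `cmpthese` from the core Benchmark module. The word list and the pattern are hypothetical stand-ins for the real ones from the thread. The /a and /aa modifiers need perl 5.14 or later, so this only runs on the 5.30 side.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 10,000 random 8-letter lowercase ASCII words.
my @letters = ('a' .. 'z');
my @words;
for (1 .. 10_000) {
    push @words, join '', map { $letters[rand @letters] } 1 .. 8;
}

# Hypothetical ASCII-only pattern with no \w-style classes: the first
# three characters are distinct and the fourth repeats the second.
my $pat = '^(.)(?!\1)(.)(?!\1|\2).\2';

# The same pattern compiled without and with /a and /aa (perl 5.14+).
my %re = (
    plain => qr/$pat/,
    a     => qr/$pat/a,
    aa    => qr/$pat/aa,
);

# Count how many words a compiled pattern matches.
sub count_matches {
    my ($re) = @_;
    return scalar grep { $_ =~ $re } @words;
}

# One timing closure per variant; run each for at least 2 CPU seconds
# and print a comparison table.
my %subs;
for my $name (keys %re) {
    my $re = $re{$name};
    $subs{$name} = sub { count_matches($re) };
}
cmpthese(-2, \%subs);
```

With ASCII-only input all three variants match exactly the same words, so the table compares like with like.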

There have also been numerous small changes in recent years to fix bugs (mostly found by fuzzers) that had the potential to be security holes. Such changes almost always make things a tiny bit slower, and as more and more of them accumulate, the costs add up. So it might also be informative to run the regexp tests from 5.30 under 5.8 (which might need some adaptation) to see a sample of the bugs that 5.8 still has and 5.30 has fixed.

If you're doing timings, I'd also be interested in how my code from 11146164 compares: I'd expect it to win a lot by avoiding the embedded code block, and to give back a fraction of that by doing more backtracking. (You should add at least the /s flag to my qr{} for a fair comparison.)
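
For illustration, here is a hypothetical sketch of a pattern built that way. It is not the code from 11146164. It assumes the task is matching a word against a letter template, where a repeated template letter must match the same character and distinct template letters must match distinct characters; the helper name `template_re` is made up. The pattern uses only captures, backreferences and negative lookaheads, so a failed constraint is handled by ordinary backtracking rather than by a `(?{ })` block.

```perl
use strict;
use warnings;

# Hypothetical helper: build an anchored regex from a letter template.
sub template_re {
    my ($template) = @_;
    my %group;          # template letter => capture group number
    my $ngroups = 0;
    my $pattern = '';
    for my $letter (split //, $template) {
        if (exists $group{$letter}) {
            # Seen before: must repeat the character captured for it.
            $pattern .= '\\' . $group{$letter};
        }
        else {
            # New letter: must differ from every character captured so far.
            $pattern .= '(?!' . join('|', map { '\\' . $_ } 1 .. $ngroups) . ')'
                if $ngroups;
            $pattern .= '(.)';
            $group{$letter} = ++$ngroups;
        }
    }
    # \A and \z supply the implicit anchors; /s lets '.' match "\n" too.
    return qr{\A$pattern\z}s;
}

my $re = template_re('abca');
for my $word (qw(xyzx xyxx xyzy)) {
    print "$word: ", ($word =~ $re ? 'match' : 'no match'), "\n";
}
# xyzx: match
# xyxx: no match
# xyzy: no match
```

With implicit anchors, an empty template yields `qr{\A\z}s`, which matches only the empty string.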

# (what is proper match behavior of empty template?)

Since we're assuming implicit anchors, I think it should just match the empty string. That wasn't a case I attempted to handle, though, and I doubt the OP cares about it.


In reply to Re^5: Nonrepeating characters in an RE (performance) by hv
in thread Nonrepeating characters in an RE by BernieC
