"Benchmarked variations include some of those used by kcott"

I'm assuming you're referring to cg_ncg with (?: ... ) and cg_atomic with (?> ... ).

Prior to posting yesterday, and purely out of curiosity, I ran /atg(.+?)(?:taa|tag|tga)/ and /atg(.+?)(?>taa|tag|tga)/ through Regexp::Debugger, looking at the matching process step by step. From memory, ?: took 64 steps (in total) to complete the match while ?> took 66 steps. That probably accounts for the roughly 3% difference between cg_atomic and cg_ncg (66/64 = 1.03125).

Again from memory, the two extra steps occurred after failing to match taa|tag|tga after either the 'a' or 't' of 'atg'. For the ?: case, the steps were something like: "(?:" start non-capture group; "taa" no match; "|" next alt; ...; "tga" no match. For the ?> case: "(?>" start non-backtracking group; ... as for ?: ...; (then the additional step) ")" end non-backtracking group.
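For anyone who wants a quick, non-interactive comparison, here's a minimal sketch using only core modules. The sample string in $seq is made up for illustration (it isn't the data from this thread), and the cg_ncg and cg_atomic labels are just borrowed from the benchmark names above. It checks that both patterns capture the same text and then times them with Benchmark; it doesn't reproduce the Regexp::Debugger step counts.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample sequence; NOT the data used elsewhere in this thread.
my $seq = 'ccatgaaacgttagcc';

# The two variations discussed above: non-capturing vs. atomic grouping.
my %re = (
    cg_ncg    => qr/atg(.+?)(?:taa|tag|tga)/,
    cg_atomic => qr/atg(.+?)(?>taa|tag|tga)/,
);

# Both patterns should capture the same text between 'atg' and the stop codon.
my %capture;
for my $name (sort keys %re) {
    ($capture{$name}) = $seq =~ $re{$name};
    print "$name captured: $capture{$name}\n";
}

# Build one timing sub per pattern and compare them (about 1 CPU second each).
my %tests;
for my $name (keys %re) {
    my $rx = $re{$name};
    $tests{$name} = sub { my $matched = $seq =~ $rx; return $matched };
}
cmpthese(-1, \%tests);
```

On a string this short, the timings will be noisy, so treat any small percentage difference with caution and run it against your real data before drawing conclusions.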

Obviously, you can check that yourself if you're so inclined. I wasn't inclined to repeat the process. :-)

[I haven't analysed your benchmarking further.]

-- Ken


In reply to Re^4: Simple regex question. Grouping with a negative lookahead assertion. by kcott
in thread Simple regex question. Grouping with a negative lookahead assertion. by Anonymous Monk
