(3) will speed up if you replace "(?:.|\n)" with something simpler such as "[\s\S]" or, better still, add the "s" option to your regex and just use ".".
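
A quick way to convince yourself that the three spellings are interchangeable. This is only a sketch: the sample string and the variable names are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample text with an embedded newline.
my $text = "<b>first line\nsecond line</b>";

# Three ways to say "any character, newline included".
my ($alt)   = $text =~ m{<b>((?:.|\n)*?)</b>};   # alternation, as in (3)
my ($class) = $text =~ m{<b>([\s\S]*?)</b>};     # character class
my ($dot)   = $text =~ m{<b>(.*?)</b>}s;         # "." with the /s option

# Without /s, "." will not cross the newline, so this one fails to match.
my $no_s = $text =~ m{<b>(.*?)</b>} ? 1 : 0;

print "all three agree\n" if $alt eq $class && $class eq $dot;
print "without /s: ", ($no_s ? "matched" : "no match"), "\n";
```

The /s version is the one to prefer when you control the regex flags, since "." then compiles to a single "match anything" opcode rather than an alternation.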

(1) is slower than (2) because (1) has to dispatch a lot more regex opcodes. That is, (2) can just hang out in the "[^<]*" opcode while it gobbles quite a few characters, whereas (1) has to leave "[^<]" after every single character: it exits the alternation/parens, moves to the "*", then comes back in through the alternation/parens to get back to the "[^<]".

You are correct (to my understanding) about the disadvantage of (3). But (3) has an advantage in that it is simpler than (1) and (2).

According to ZZamboni's benchmarks, (3)'s disadvantage only slightly outweighs its advantage, and is slight compared to (1)'s disadvantage. But I suspect this is all rather dependent on the input used in the benchmarks. In particular, the length of the text to be matched and the frequency of <span> tag pairs within it will affect the relative performance characteristics.
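
If you want to see how much the input matters, a sketch along these lines lets you vary it. Note the assumptions: the three patterns are my stand-ins for the general shapes of (1), (2), and (3), not the exact regexes from the thread, and $body_len and $pairs are hypothetical knobs for the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed stand-ins for the three styles under discussion; the real
# patterns from the thread may differ.
my %re = (
    alt_star => qr{<span>((?:[^<]|<(?!/span>))*)</span>},       # like (1)
    unrolled => qr{<span>([^<]*(?:<(?!/span>)[^<]*)*)</span>},  # like (2)
    dot_star => qr{<span>(.*?)</span>}s,                         # like (3)
);

# Hypothetical input: tune these two numbers to resemble your own data.
my ($body_len, $pairs) = (200, 50);
my $text = join "\n",
    map { "<span>" . ("x" x $body_len) . "</span>" } 1 .. $pairs;

# Sanity check first: every pattern should find the same number of matches.
my %count;
for my $name (sort keys %re) {
    my $r = $re{$name};
    my $n = () = $text =~ /$r/g;
    $count{$name} = $n;
}

# Run each pattern for at least one CPU second and print a comparison table.
cmpthese(-1, {
    map { my $r = $re{$_}; ($_ => sub { my $n = () = $text =~ /$r/g }) }
        keys %re
});
```

Try it with short bodies and many pairs, then with long bodies and few pairs; I would not trust any ranking that has only been measured on one shape of input.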

As for other alternatives, I see no reason to not simplify things greatly to:

s#<div class="blockqte">#<pre>#g; s#</div>#</pre>#g;
and not force the regex engine to match the intervening text at all.
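
Spelled out as a runnable sketch (the sample HTML is hypothetical; only the two substitutions come from the line above):

```perl
use strict;
use warnings;

# Hypothetical input; the class name "blockqte" is the one used above.
my $html = <<'END';
<p>intro</p>
<div class="blockqte">quoted
text</div>
<div class="blockqte">more</div>
END

# Replace the opening and closing tags independently, never matching
# the text between them.
(my $out = $html) =~ s#<div class="blockqte">#<pre>#g;
$out =~ s#</div>#</pre>#g;   # assumption: every </div> closes a blockqte div

print $out;
```

The catch is the assumption in that comment: if the document contains other kinds of div, the second substitution will rewrite their closing tags too, and you are back to needing to match the pairs.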

Finally, many regex libraries are built around deterministic finite state automata, which means that different regexes run at much closer to the same speed (though more complex ones take longer to compile), but memory consumption can grow out of bounds. So Java's regex performance characteristics could be drastically different than Perl's.

        - tye (but my friends call me "Tye")

In reply to (tye)Re: Help on using alternation grouping star versus dot star. by tye
in thread Help on using alternation grouping star versus dot star. by Anonymous Monk
