http://qs1969.pair.com?node_id=1008113


in reply to alternation in regexes: to use or to avoid?

Perhaps the following quote from the Camel Book will shed some light on this question:

Short-circuit alternation is often faster than the corresponding regex. So:

print if /one-hump/ || /two/;

is likely to be faster than:

print if /one-hump|two/;

at least for certain values of one-hump and two. This is because the optimizer likes to hoist certain simple matching operations up into higher parts of the syntax tree and do very fast matching with a Boyer-Moore algorithm. A complicated pattern tends to defeat this.
— Tom Christiansen, brian d foy & Larry Wall with Jon Orwant, Programming Perl (4th Edition, 2012), p. 692.
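
Since the quote only promises a speedup "for certain values", it is probably best to measure on your own data. Here is a minimal sketch using the core Benchmark module; the sample lines and the sub names are made up for illustration, and the outcome will depend on your perl version and input:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: mostly non-matching lines plus two matches.
my @lines = map { "line $_ with nothing of interest" } 1 .. 1000;
push @lines, 'a camel with one-hump', 'a camel with two';

# Two simple regexes joined by short-circuit ||.
sub count_short_circuit {
    my $n = 0;
    for (@lines) {
        $n++ if /one-hump/ || /two/;
    }
    return $n;
}

# One regex containing an alternation.
sub count_alternation {
    my $n = 0;
    for (@lines) {
        $n++ if /one-hump|two/;
    }
    return $n;
}

# Both forms must agree before the timings mean anything.
die "results differ" unless count_short_circuit() == count_alternation();

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    short_circuit => \&count_short_circuit,
    alternation   => \&count_alternation,
});
```

As far as I know, perls from 5.10 onward compile an alternation of plain literal strings into a trie, so the balance on a recent perl may differ from what the book describes.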

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: alternation in regexes: to use or to avoid?
by dk (Chaplain) on Dec 10, 2012 at 15:14 UTC
    Not really, because it says:

    A complicated pattern tends to defeat this.

    and I'm seeing exactly the opposite. I wish Tom would comment on that :) But thank you for the quote; it helps me understand why I think the observed behavior is bad.

      Perhaps read "complicated" as "non-trivial", e.g., having alternations.

        Please read the benchmark numbers.

        Alternation is MUCH faster than looping over trivial regexes, except when you use captures inside the alternations.
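
For anyone who wants to check this claim on their own perl, here is a rough sketch of the three variants being discussed: looping over trivial regexes, a single alternation, and the same alternation with a capture group around each branch. The word list, the sample lines, and the sub names are invented for illustration and are not the benchmark from the original thread:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical word list and input lines.
my @words = qw(alpha bravo charlie delta echo foxtrot);
my @lines = map { "sample line number $_" } 1 .. 1000;
push @lines, 'alpha here', 'see foxtrot', 'an echo';

# One trivial regex per word.
my @regexes = map { qr/\Q$_\E/ } @words;

# A single alternation of all words, without and with captures.
my $plain    = join '|', map { quotemeta($_) } @words;
my $captured = join '|', map { sprintf '(%s)', quotemeta($_) } @words;
my $alt      = qr/$plain/;
my $alt_capt = qr/$captured/;

# Try each trivial regex in turn; stop at the first that matches.
sub count_loop {
    my $n = 0;
    LINE: for my $line (@lines) {
        for my $re (@regexes) {
            if ($line =~ $re) {
                $n++;
                next LINE;
            }
        }
    }
    return $n;
}

# One match per line against the plain alternation.
sub count_alt {
    my $n = 0;
    for my $line (@lines) {
        $n++ if $line =~ $alt;
    }
    return $n;
}

# One match per line against the alternation with capture groups.
sub count_alt_capt {
    my $n = 0;
    for my $line (@lines) {
        $n++ if $line =~ $alt_capt;
    }
    return $n;
}

cmpthese(-1, {
    loop          => \&count_loop,
    alternation   => \&count_alt,
    alt_with_capt => \&count_alt_capt,
});
```

All three subs count the same matching lines, so any difference in the table is down to how the regex engine handles each form.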