Re: alternation in regexes: to use or to avoid?

by Athanasius (Archbishop)
on Dec 10, 2012 at 15:07 UTC (#1008113=note)

in reply to alternation in regexes: to use or to avoid?

Perhaps the following quote from the Camel Book will shed some light on this question:

Short-circuit alternation is often faster than the corresponding regex. So:

print if /one-hump/ || /two/;

is likely to be faster than:

print if /one-hump|two/;

at least for certain values of one-hump and two. This is because the optimizer likes to hoist certain simple matching operations up into higher parts of the syntax tree and do very fast matching with a Boyer-Moore algorithm. A complicated pattern tends to defeat this.
— Tom Christiansen, brian d foy & Larry Wall with Jon Orwant, Programming Perl (4th Edition, 2012), p. 692.
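A rough way to check the quoted claim on your own perl is a small script using the core Benchmark module. This is only a sketch: the test lines and the sub names (count_short_circuit, count_alternation) are made up for illustration, and the relative speeds will depend on the perl version, the patterns, and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: mostly non-matching lines plus two hits.
my @lines = (
    ('the quick brown fox jumps over the lazy dog') x 1000,
    'a one-hump camel is a dromedary',
    'a camel with two humps is a bactrian',
);

# Count matching lines using two simple matches joined by ||.
sub count_short_circuit {
    my $n = 0;
    for (@lines) { $n++ if /one-hump/ || /two/ }
    return $n;
}

# Count matching lines using a single regex with alternation.
sub count_alternation {
    my $n = 0;
    for (@lines) { $n++ if /one-hump|two/ }
    return $n;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    short_circuit => \&count_short_circuit,
    alternation   => \&count_alternation,
});
```

Both subs count the same lines, so any difference in the table comes from how the match is written, not from what is matched.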

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Replies are listed 'Best First'.
Re^2: alternation in regexes: to use or to avoid?
by dk (Chaplain) on Dec 10, 2012 at 15:14 UTC
    Not really, because it says:

    A complicated pattern tends to defeat this.

    and I'm seeing exactly the opposite. I wish Tom would comment on that :) But thank you for the quote; it helps me understand why I think the observed behavior is bad.

      Perhaps read "complicated" as "non-trivial", e.g. having alternations.

        Please read the benchmark numbers.

        Alternation is MUCH faster than looping over trivial regexes, except when you use captures inside the alternations.
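The three cases named above (a loop over trivial regexes, one alternation, and one alternation with captures) could be compared with something like the following. This is not the benchmark referred to in the thread, just a minimal sketch; the keyword list, the input lines, and the names count_loop and count_with are assumptions made for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical keywords and input lines, chosen only for illustration.
my @words = qw(alpha beta gamma delta epsilon);
my @lines = (
    ('nothing to see on this line at all') x 1000,
    'here is a delta',
    'and an alpha',
);

# One trivial regex per keyword, to be tried in a loop.
my @singles = map { qr/\Q$_\E/ } @words;

# A single alternation without captures: alpha|beta|...
my $plain_src = join '|', map { quotemeta($_) } @words;
my $plain     = qr/$plain_src/;

# The same alternation with a capture group around each branch.
my $capture_src = join '|', map { '(' . quotemeta($_) . ')' } @words;
my $capturing   = qr/$capture_src/;

# Count lines matched by at least one of the single-keyword regexes.
sub count_loop {
    my $n = 0;
    LINE: for my $line (@lines) {
        for my $re (@singles) {
            if ($line =~ $re) { $n++; next LINE }
        }
    }
    return $n;
}

# Count lines matched by one precompiled regex.
sub count_with {
    my ($re) = @_;
    my $n = 0;
    for my $line (@lines) { $n++ if $line =~ $re }
    return $n;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    loop      => \&count_loop,
    plain_alt => sub { count_with($plain) },
    capt_alt  => sub { count_with($capturing) },
});
```

All three variants count the same lines here, so the table only reflects matching cost. Treat the numbers as specific to your perl build and data rather than as a general rule.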
