http://qs1969.pair.com?node_id=1008122


in reply to alternation in regexes: to use or to avoid?

It's not clear to me what you are trying to achieve with your regex.

The simple alternation that looks like this

aol\s*\w+|aachen\s*\w+|aaliyah\s*\w+|.....

runs quickly; it's only the version with lots of capture groups that is slow, i.e.

(aol)\s*\w+|(aachen)\s*\w+|(aaliyah)\s*\w+|....
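The comparison above can be reproduced with a minimal benchmark sketch. The word list and subject string are hypothetical stand-ins, since the real list is elided. The sketch times the plain alternation against the same alternation with a capture group around each word, using a string in which none of the words occur. Timings will vary by perl version and machine, so no result is implied. One possible factor, an assumption worth checking by comparing the compiled programs under `use re 'debug'`, is that the leading capture groups stop perl from merging the literal words into a single trie.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical word list standing in for the (elided) real one.
my @words = qw(aol aachen aaliyah abacus abbey abbott abel aberdeen);

# Plain alternation: aol\s*\w+|aachen\s*\w+|...
my $plain = join '|', map { "$_\\s*\\w+" } @words;
$plain = qr/$plain/;

# Same alternation with a capture group around each word: (aol)\s*\w+|...
my $capturing = join '|', map { "($_)\\s*\\w+" } @words;
$capturing = qr/$capturing/;

# A subject string in which none of the words occur.
my $text = 'the quick brown fox jumps over the lazy dog ' x 50;

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    plain     => sub { my $hit = $text =~ $plain },
    capturing => sub { my $hit = $text =~ $capturing },
});
```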

So maybe there's just a better way to get the result you want, if you'd care to explain what that is?

Re^2: alternation in regexes: to use or to avoid?
by balker (Novice) on Dec 10, 2012 at 15:51 UTC

    (I work with dk.)

    Another question could be: why is the one with the capture groups so slow when none of the words match the string?

    And in general, why is alternation with capture so much slower than looping with capture and plain alternation combined?
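The two strategies being compared can be sketched as follows. The keys FOO, BAR, etc. are hypothetical stand-ins for the real patterns. The first strategy is a loop that applies each capturing regex in turn. The second is a single alternation of the same regexes, in which every branch keeps its own capture group. The benchmark subject contains none of the keys, mirroring the non-matching case asked about. No particular timing result is implied.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-ins for the library's regexes, one capture group each.
my @patterns = map { qr/\b\Q$_\E:\s*(\w+)/ } qw(FOO BAR BAZ QUX CORGE GRAULT);

# Strategy 1: loop over the individual regexes, each with its own capture.
sub via_loop {
    my ($text) = @_;
    my @found;
    # List-context //g returns the single capture from every match.
    push @found, $text =~ /$_/g for @patterns;
    return @found;
}

# Strategy 2: one alternation holding all six capture groups. Each match
# returns all six groups, five of them undef, so keep only the defined one.
my $combined = join '|', @patterns;
$combined = qr/$combined/;

sub via_alternation {
    my ($text) = @_;
    return grep { defined } $text =~ /$combined/g;
}

# Time both on a subject in which none of the keys occur.
my $miss = 'nothing to see here ' x 100;
cmpthese(-1, {
    loop        => sub { my @r = via_loop($miss) },
    alternation => sub { my @r = via_alternation($miss) },
});
```

The loop returns captures grouped by pattern, while the alternation returns them in the order they occur in the text. The two lists hold the same values but may differ in order.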

    The purpose of the code is to replace 60 or so similarly structured regexes, in a library used by a couple of legacy applications, with a single regex generated automatically from information in configuration files. We want (potential) performance gains, the ability to vary behaviour across applications, and definite maintainability gains. The strings being replaced all have the structure \bFOO:\s*bar(\d+) or \bBAZ:\s*(\w+), etc.
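Here is a minimal sketch of generating the combined regex from configuration. It assumes a hypothetical `%config` hash that maps each key to its value sub-pattern, where each sub-pattern contains exactly one capture group. The sketch uses Perl's branch reset construct `(?|...)`, available since 5.10. This technique is swapped in here and is not something proposed in the thread. Every branch reuses group numbers 1 and 2, so `$1` is always the key and `$2` the value, however many keys are configured. Whether this changes the timings discussed above is untested here.

```perl
use strict;
use warnings;

# Hypothetical configuration: key => value sub-pattern, each with exactly
# one capture group (mirroring \bFOO:\s*bar(\d+) and \bBAZ:\s*(\w+)).
my %config = (
    FOO => 'bar(\d+)',
    BAZ => '(\w+)',
);

# Build \b(?|(BAZ):\s*(\w+)|(FOO):\s*bar(\d+)). The branch reset (?|...)
# numbers the groups afresh in every branch, so $1 is the key, $2 the value.
my $branches = join '|',
    map { "(\Q$_\E):\\s*$config{$_}" } sort keys %config;
my $combined = qr/\b(?|$branches)/;

# Collect [key, value] pairs in the order they occur in the text.
sub extract {
    my ($text) = @_;
    my @found;
    while ($text =~ /$combined/g) {
        push @found, [ $1, $2 ];
    }
    return @found;
}

for my $pair (extract('FOO: bar42 and BAZ:qux, FOO:bar7')) {
    print "$pair->[0] => $pair->[1]\n";
}
```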

    Suggestions like "Well, don't do that" are likely to go unheard :-)

      OK then, if you want to use a non-optimal solution for operational reasons, go right ahead :-)

Re^2: alternation in regexes: to use or to avoid?
by dk (Chaplain) on Dec 10, 2012 at 16:07 UTC
    To add to balker's response: it's not about what we're trying to achieve, since we know other ways to get where we want to go. It's about a principle I've long nourished (see Anastasius's quote above), which now doesn't hold water. What I'd love to see is an explanation from someone who knows why the regex algorithm exhibits behavior that is CONTRARY to Perl lore.