
Sir, you're the man. Obviously you've understood perfectly what I was trying to achieve: parsing column-delimited HTML data. But I don't understand why your expression is equivalent to mine. I'm a newbie at Perl, and reading perlreref I guessed that the solution might be in ?>, but to be honest, my head hurts when I try to make heads or tails of it. Could you please try to explain, in the simplest possible way, how ?> works?

Re^3: Backtracking hurts: slow regexp
by massa (Hermit) on Dec 19, 2008 at 13:38 UTC
    From the aforementioned perlreref:
    (?>...) Grab what we can, prohibit backtracking
    That's it: it does not allow backtracking into the group once the group has matched. So (?>.*?<\/td>){9} will get exactly 9 instances of (non-greedy) anything followed by </td>. It won't go to the end of the string chasing the longest .* (because it is not greedy), and once an instance has ended at the first </td> it finds, the engine will not go back into it to try a longer .*? when something later in the pattern fails. The match simply fails at that starting position without backtracking (working more or less like a deterministic automaton).
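
    A minimal sketch of the difference, assuming a made-up row with four cells and a count of 3 instead of the 9 used in the thread (the names $row, $plain and $atomic are only for illustration). The plain group is allowed to stretch its last .*? across two cells to reach the end of the string; the atomic group is not, so it fails outright:

```perl
use strict;
use warnings;

# Hypothetical sample data: four cells, while the patterns ask for three.
my $row = '<td>a</td><td>b</td><td>c</td><td>d</td>';

# Plain group: after three short matches the $ fails, so the engine
# backtracks into the third group and lets .*? grow over
# '<td>c</td><td>d' until $ can finally match.
my $plain = ($row =~ /^(?:.*?<\/td>){3}$/) ? 1 : 0;

# Atomic group: each instance is locked in once it has matched, so when
# $ fails there is nothing left to retry and the match fails.
my $atomic = ($row =~ /^(?>.*?<\/td>){3}$/) ? 1 : 0;

print "plain:  $plain\n";
print "atomic: $atomic\n";
```

    On rows that really do have the expected number of cells, both forms find the same match; the atomic form mainly saves the engine from exploring all those stretched alternatives when a row does not fit.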
    []s, HTH, Massa (κς,πμ,πλ)