Re: Backtracking hurts: slow regexp

What hurts in your case are the nested quantifiers, i.e. `(.*<\/td>){9}`. You might use `(?>.*?<\/td>){9}` instead; it's probably much faster, and what it does is closer to what you think it does.
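
A minimal sketch of the difference, assuming a made-up row of twelve `<td>` cells with no newlines. The original input and the rest of the original regex aren't shown in the thread, so the `<td>(.*?)<\/td>` tail that captures the next cell is hypothetical. Both patterns skip nine cells and capture the one after them:

```perl
use strict;
use warnings;

# Hypothetical sample row: twelve cells, no newlines.
my $row = join '', map { "<td>cell$_</td>" } 1 .. 12;

# Greedy, nested quantifiers: each .* first runs to the end of the
# string, then backs off one </td> at a time until the whole pattern fits.
my ($greedy) = $row =~ /^(?:.*<\/td>){9}<td>(.*?)<\/td>/;

# Lazy quantifier inside an atomic group: each repetition stops at the
# nearest </td>, and the engine never goes back to change it.
my ($atomic) = $row =~ /^(?>.*?<\/td>){9}<td>(.*?)<\/td>/;

print "greedy: $greedy\n";   # greedy: cell12
print "atomic: $atomic\n";   # atomic: cell10
```

With the greedy version the nine repetitions are pushed as far right as they can go, so it captures the last cell; the atomic lazy version stops at the nearest `</td>` each time and captures the tenth. On a row that doesn't match at all, the greedy version is also the one that tries every way of splitting the text among the nine repetitions before giving up, which is where the slowness comes from. Real HTML spread over several lines would also need the `/s` modifier, since `.` doesn't match a newline by default.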

That said, use a proper HTML parsing module from CPAN, and extract your desired information from the parse tree.
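
For comparison, a minimal sketch using HTML::TreeBuilder (from the HTML-Tree distribution on CPAN). This is just one of several suitable parsing modules, and the table is invented for illustration. It collects the text of every cell, row by row, and then picks one column:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN, HTML-Tree distribution

# Hypothetical table; in practice this would come from a file or a fetch.
my $html = <<'HTML';
<table>
  <tr><td>alpha</td><td>beta</td><td>gamma</td></tr>
  <tr><td>one</td><td>two</td><td>three</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Walk every row and collect the text of its cells.
my @rows;
for my $tr ($tree->look_down(_tag => 'tr')) {
    push @rows, [ map { $_->as_text } $tr->look_down(_tag => 'td') ];
}
$tree->delete;   # release the parse tree's memory

# Pick the second column (index 1) from each row.
print join(', ', map { $_->[1] } @rows), "\n";   # beta, two
```

Cell boundaries come from the parser rather than from the regex, so there is nothing to backtrack over, and nested or oddly formatted markup doesn't throw off the column count.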

Re^2: Backtracking hurts: slow regexp by faibistes (Novice) on Dec 19, 2008 at 12:57 UTC
Sir, you're the man. Obviously you've understood perfectly what I was trying to achieve: parsing column-delimited HTML data. But I don't understand why your expression is equivalent to mine. I'm a newbie at Perl, and while reading perlreref I guessed that the solution might lie in `?>`, but to be honest, my head hurts when I try to make heads or tails of it. Could you please explain, in the simplest possible way, how `?>` works?
Re^3: Backtracking hurts: slow regexp by massa (Hermit) on Dec 19, 2008 at 13:38 UTC
From the aforementioned perlreref:

`(?>...) Grab what we can, prohibit backtracking`

That's it: it does not allow backtracking. So `(?>.*?<\/td>){9}` will match exactly 9 instances of (non-greedy) anything followed by `</td>`. It won't run to the end of the string chasing the longest possible `.*` (because `.*?` is not greedy): each repetition stops at the nearest `</td>`, and once a repetition has matched, the engine will not come back and try a different length for it. If what follows doesn't fit, the match fails at that position instead of trying every other way of splitting the string among the nine repetitions (working more or less like a deterministic automaton).

[]s, HTH,
Massa (κς,πμ,πλ)
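
A small Perl sketch of what "prohibit backtracking" means in practice (the test strings and variable names are made up for illustration). The first pair shows a plain group giving characters back versus an atomic group keeping them; the second pair shows that a lazy `.*?` inside `(?>...)` still grows until `</td>` matches, but is locked in once the group has matched:

```perl
use strict;
use warnings;

# Plain pattern: if the rest fails, the engine backtracks into a* and
# retries it with fewer characters.
my $plain  = 'aaab' =~ /a*ab/     ? 1 : 0;   # 1: a* gives back one "a"
# Atomic group: whatever a* grabbed is kept; nothing is given back.
my $atomic = 'aaab' =~ /(?>a*)ab/ ? 1 : 0;   # 0: no "a" left for "ab"

# Same idea with a lazy quantifier: inside (?>...) it still grows until
# </td> matches, but once the group is done it is never extended.
my $cells       = 'x</td>y</td>';
my $lazy_plain  = $cells =~ /^(?:.*?<\/td>)$/ ? 1 : 0;   # 1: .*? extends to the 2nd </td>
my $lazy_atomic = $cells =~ /^(?>.*?<\/td>)$/ ? 1 : 0;   # 0: group is locked at "x</td>"

print "$plain $atomic $lazy_plain $lazy_atomic\n";   # 1 0 1 0
```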