What hurts in your case are the nested quantifiers, i.e. (.*<\/td>){9}. You might use (?>.*?<\/td>){9} instead; it's probably much faster, and what it does is closer to what you think it does.
That said, use a proper HTML parsing module from CPAN, and extract your desired information from the parse tree.
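To make the "closer to what you think it does" part concrete, here is a minimal sketch. The sample row and the helper cells_in_match are made up for illustration and are not the original poster's data. On a row with ten cells, the greedy pattern stretches its match over all ten, while the atomic, non-greedy pattern stops after the first nine.

```perl
use strict;
use warnings;

# Hypothetical sample data: one table row with ten cells.
my $ten = '<tr>' . join('', map { "<td>col$_</td>" } 1 .. 10) . '</tr>';

# Original style: greedy .* inside a counted group (nested quantifiers).
my $greedy = qr/(.*<\/td>){9}/;

# Suggested style: non-greedy .*? inside an atomic group, repeated nine times.
my $atomic = qr/(?>.*?<\/td>){9}/;

# Illustrative helper: how many </td> does the overall match contain?
sub cells_in_match {
    my ($re, $text) = @_;
    return 0 unless $text =~ /($re)/;      # $1 is the whole match
    my $matched = $1;
    my $count = () = $matched =~ /<\/td>/g; # count occurrences
    return $count;
}

print "greedy: ", cells_in_match($greedy, $ten), " cells\n";
print "atomic: ", cells_in_match($atomic, $ten), " cells\n";
```

Both patterns find a match here; the difference is in how much text each one ends up covering, and in how much backtracking it takes to get there.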
Sir, you're the man. Obviously you've understood perfectly what I was trying to achieve: parsing column-delimited HTML data. But I don't understand why your expression is equivalent to mine. I'm a newbie at Perl, and reading perlreref I guessed that the solution might be in ?>, but to be honest, my head hurts when I try to make heads or tails of it. Could you please try to explain, in the simplest possible way, how ?> works?
(?>...) Grab what we can, prohibit backtracking
That's it: it does not allow backtracking. So (?>.*?<\/td>){9} will get exactly 9 instances of (non-greedy) anything followed by </td>. It won't run to the end of the string chasing the longest .* (because it is not greedy), and if one of those instances cannot be completed with a </td>, it will fail without backtracking into the earlier ones (working more or less like a deterministic automaton).
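A small sketch of what "prohibit backtracking" means in practice. The test strings are invented for illustration. In each pair, the only difference is the (?>...) wrapper.

```perl
use strict;
use warnings;

# Case 1: a greedy quantifier that has to give something back.
my $text = 'aaab';

# Plain: a+ first takes "aaa", then gives one "a" back so that "ab" can match.
my $plain = $text =~ /a+ab/ ? 'match' : 'no match';

# Atomic: a+ takes all the "a"s it can and may not give any back,
# so the following "a" never finds anything to match.
my $atomic = $text =~ /(?>a+)ab/ ? 'match' : 'no match';

print "plain:  $plain\n";
print "atomic: $atomic\n";

# Case 2: the same idea with a non-greedy .*? and </td>.
my $cells = 'x</td>y</td>z';

# Plain: .*? stops at the first </td>, "z" fails there, so .*? is
# allowed to grow until the second </td>, after which "z" matches.
my $lazy_plain = $cells =~ /^.*?<\/td>z/ ? 'match' : 'no match';

# Atomic: once the group has matched up to the first </td>, it is locked;
# "z" fails and there is nothing left to retry.
my $lazy_atomic = $cells =~ /^(?>.*?<\/td>)z/ ? 'match' : 'no match';

print "lazy plain:  $lazy_plain\n";
print "lazy atomic: $lazy_atomic\n";
```

The second case is the one that matters for the table pattern: each (?>.*?<\/td>) consumes exactly one cell and cannot later be stretched to swallow a neighbouring one.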
[]s, HTH, Massa (κς,πμ,πλ)
Perl internally uses a regexp engine that's based on an NFA. It is possible to change this into a DFA, but that would require you to write a different regexp engine; all perl currently provides is a mechanism to hook in a different regexp engine.
As for optimizing your regexp, I do not know if that's possible. I do not know what the regexp is supposed to do. Sure, I can read the regexp, and I know what it does, but if I change anything about it, it will do something else. Which may still do the task you want it to do, but since you don't tell us, I'm not going to guess.
Tell us: what data are you applying it against, and what do you want to match?