in reply to Re^5: Regex infinite loop?
in thread Regex infinite loop?

I'm hoping you can help me understand a couple of things. First, I thought that if I used the "?" I got non-greedy searching. Shouldn't that be helping things out?

Also, I'm trying to understand the code that you have suggested. Within the capturing parentheses you have [^"]++. What does this mean exactly? I read it as match anything that's not a double quote. I'm sure that I am wrong. Also, I don't know what the ++ does. I have only ever used one + to indicate "match at least once". What does the double plus, ++, mean? How would I say (in English) what is going on with [^"]++?

Re^7: Regex infinite loop?
by JavaFan (Canon) on Oct 17, 2008 at 15:58 UTC
    First, I thought that if I used the "?" I got non-greedy searching. Shouldn't that be helping things out?
    You are right about the first part. A ? placed after another quantifier makes that quantifier non-greedy. But greedy versus non-greedy only makes a (possible) difference if there is a match. It does not change whether the pattern matches at all. And while there might be a difference in performance when there is a match (it could go either way, but Friedl suggests that using ? is slower in most cases), there will usually not be much of a performance difference if there's no match. Perl will try all possible lengths before giving up, and it hardly matters whether it starts with the longest candidate and works towards the shortest, or starts with the shortest and works up to the longest.
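A minimal sketch of that point, using made-up sample strings (the variable names and text are illustrative, not taken from the original poster's data). When a match exists, greediness decides which match is returned; when no match is possible, both forms fail:

```perl
use strict;
use warnings;

# Hypothetical sample text with two quoted words.
my $text = 'say "one" and "two"';

# A match exists: greediness only decides which one you get.
my ($greedy) = $text =~ /"(.*)"/;     # longest span between quotes
my ($lazy)   = $text =~ /"(.*?)"/;    # shortest span between quotes

print "greedy: $greedy\n";    # one" and "two
print "lazy:   $lazy\n";      # one

# No closing quote: neither form can match, greedy or not.
my $no_close  = 'say "one and two';
my $greedy_ok = $no_close =~ /"(.*)"/  ? 1 : 0;
my $lazy_ok   = $no_close =~ /"(.*?)"/ ? 1 : 0;

print "greedy matched: $greedy_ok, lazy matched: $lazy_ok\n";    # 0 and 0
```

In the failing case the engine has to rule out every candidate length either way, which is why switching to ? alone is unlikely to rescue a pattern that backtracks heavily before failing.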
    Within the capturing parentheses you have [^"]++. What does this mean exactly?
    It means: match as many characters that aren't double quotes as possible, and once you've matched them, do not retry with fewer characters if the regexp engine backtracks to this point. Not "giving back" characters is the meaning of the second +, which makes the quantifier possessive. Possessive quantifiers were introduced in Perl 5.10. The reason I used it here is that in:
    something[^"]++"
    suppose the regexp engine has matched 'something', a run of non-double-quote characters, and then a double quote, but the rest of the pattern fails. The engine then backtracks into the run of non-double-quote characters. It's pointless to retry with one character fewer in that run: the next character has to be a double quote, and the character that was just given back is, by definition, not one.
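A short sketch of the possessive behaviour described above, assuming Perl 5.10 or later. The sample strings and variable names are invented for illustration. The first pair of tests shows what "not giving back" means in isolation; the second pair shows that for the quoted-string case the possessive form finds the same match as the plain one:

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers need Perl 5.10 or later

# a+ can give back one 'a' so the trailing a matches; a++ never gives back.
my $plain      = 'aaa' =~ /a+a/  ? 1 : 0;
my $possessive = 'aaa' =~ /a++a/ ? 1 : 0;
print "a+a matched: $plain, a++a matched: $possessive\n";

# Hypothetical input line. The run [^"]+ stops only at a double quote,
# so giving characters back could never help the following " to match.
my $line = 'name="jets" city="New York"';
my ($with_plus)     = $line =~ /name="([^"]+)" city/;
my ($with_plusplus) = $line =~ /name="([^"]++)" city/;
print "plain: $with_plus, possessive: $with_plusplus\n";

# When the rest of the pattern cannot match, both forms fail. The
# possessive form simply skips the retries with shorter runs.
my $fails = $line =~ /name="([^"]++)" town/ ? 1 : 0;
print "matched with wrong suffix: $fails\n";
```

The saving is invisible on a short string like this one. It shows up when a pattern with several such runs fails on long input, because each skipped retry would otherwise multiply with the others.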

      Thank you. This helps my understanding tremendously.

      Wow, I made the change that you suggested and it is super fast. The jets and colts now complete in about 1-2 seconds rather than 1-2 hours. COOL! Thank you!