
> I prefer a correct solution that contains some repetition to an incorrect solution any time. There are ways to make .*? work correctly, but ...
I agree that an incorrect solution doesn't make sense, but is it incorrect? In .*?Z, the .*? matches the shortest possible sequence of characters up to, but not including, the letter Z. Could you give an example where this would fail?
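A minimal sketch of that claim in isolation (the helper name `shortest_before_z` and the test string are illustrative choices, not from the thread): with nothing after the Z that could force backtracking, the lazy quantifier stops at the first Z it reaches.

```perl
use strict;
use warnings;

# Hypothetical helper: return what (\s.*?) captures before the first Z,
# or undef if the string does not start with "A" followed by whitespace.
sub shortest_before_z {
    my ($s) = @_;
    return $s =~ /^A(\s.*?)Z/ ? $1 : undef;
}

my $captured = shortest_before_z("A BCZD ZA");
print "captured: [$captured]\n";    # prints "captured: [ BC]"
```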

-- 
Ronald Fischer <ynnor@mm.st>

Re^7: This regexp made simpler
by moritz (Cardinal) on Apr 26, 2010 at 11:36 UTC
    I haven't really thought about whether the solutions with .*? are actually incorrect, but most of them will almost certainly go wrong if you later extend the regex with something that can force backtracking into the preceding construct.

    Example:

    $ perl -wE 'say "yes" if "A BCZD ZA" =~ /^A(\s.*?)?ZA/'
    yes
    Here I added an A to the end, which forces backtracking when there's no A after the first Z. That backtracking, in turn, allows a match your rules forbid: the part matched by (\s.*?) now contains a Z.
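To make the difference concrete, here is a sketch comparing the lazy pattern above with a variant that spells out "no Z" using a negative lookahead, the same `(?:(?!...).)*` idiom used later in this thread with END. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $input = "A BCZD ZA";

# Lazy version: the trailing A forces the engine to backtrack and let
# .*? swallow the first Z, so the string matches even though a Z
# appears inside the captured part.
my $lazy_matches = $input =~ /^A(\s.*?)?ZA/ ? 1 : 0;

# Lookahead version: (?:(?!Z).)* can never consume a Z, so backtracking
# has no forbidden alternative to fall back on and the match fails.
my $tempered_matches = $input =~ /^A(\s(?:(?!Z).)*)?ZA/ ? 1 : 0;

print "lazy: $lazy_matches, lookahead: $tempered_matches\n";
# prints "lazy: 1, lookahead: 0"
```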

    (Update: This is a general problem when translating "may not occur in between" into "minimal match": the two are only equivalent under certain very specific conditions. You can "rescue" such a solution by putting it inside a (?>...) atomic (non-backtracking) group, but I still recommend against it.)
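A sketch of the (?>...) rescue, assuming the goal is still "no Z between the leading A and the final ZA". The pattern is an illustration, not code from the thread.

```perl
use strict;
use warnings;

# The optional part *and* the Z sit inside the atomic group (?>...):
# once the group has matched up to the first reachable Z, the engine
# may not backtrack into it to try a longer .*? match.
my $atomic = qr/^A(?>(\s.*?)?Z)A/;

for my $s ("A BCZD ZA", "A BCZA", "AZA") {
    printf "%-10s %s\n", $s, ($s =~ $atomic ? "match" : "no match");
}
```

The Z has to be inside the group: with /^A(?>(\s.*?)?)ZA/ the lazy .*? would be frozen at its first (empty) attempt, so even a valid string like "A BCZA" would fail.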

    So maybe your example wasn't actually wrong (and I apologize for calling it that without any proof), but it's surely not very maintainable, because a very simple, innocent change can break it.

      I see the problem. Thanks for pointing this out!

      -- 
      Ronald Fischer <ynnor@mm.st>

      To satisfy my curiosity, I wrote the backtracking-control version you mentioned, and I was surprised by how simple it is:
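
      while (<DATA>) {
          print;    # the line before the substitution
          # (*COMMIT): once END has matched, a failure at $ fails the
          # whole match instead of letting .*? backtrack past that END.
          s[ ^ (START) ( | \s.*? ) (END) (*COMMIT) $ ]
           [ $1 . $2 . 'insert' . $3 ]ex;
          print;    # the line after (unchanged if there was no match)
      }
      __DATA__
      STARTEND
      STARTENDEND
      START SOMETHING END
      STARTSOMETHINGEND
      START END
      START ENDEND
      STARTSTARTENDEND
      STARTSTART ENDEND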

      For comparison, here's the other form:

      s[ ^ (START) ( | \s (?:(?!END).)* ) (END) $ ] [ $1 . $2 . 'insert' . $3 ]ex;
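One way to check that the two forms agree is to run both over the same inputs. This sketch wraps each substitution in a hypothetical helper sub (`with_commit`, `with_lookahead`) and puts the test lines from the __DATA__ section above into a plain array, so the script needs no __DATA__ section. It assumes Perl 5.10 or later for (*COMMIT).

```perl
use strict;
use warnings;

# Backtracking-control form: (*COMMIT) forbids retrying after END.
sub with_commit {
    my ($line) = @_;
    $line =~ s[ ^ (START) ( | \s.*? ) (END) (*COMMIT) $ ]
              [ $1 . $2 . 'insert' . $3 ]ex;
    return $line;
}

# Negative-lookahead form: the middle part can never contain END.
sub with_lookahead {
    my ($line) = @_;
    $line =~ s[ ^ (START) ( | \s (?:(?!END).)* ) (END) $ ]
              [ $1 . $2 . 'insert' . $3 ]ex;
    return $line;
}

my @inputs = (
    'STARTEND',            'STARTENDEND',
    'START SOMETHING END', 'STARTSOMETHINGEND',
    'START END',           'START ENDEND',
    'STARTSTARTENDEND',    'STARTSTART ENDEND',
);

for my $in (@inputs) {
    my ($c, $l) = (with_commit($in), with_lookahead($in));
    printf "%-20s %-26s %s\n", $in, $c, ($c eq $l ? 'same' : "DIFFERS: $l");
}
```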