in reply to Re^4: This regexp made simpler
in thread This regexp made simpler

Huh.

I prefer a correct solution that contains some repetition to an incorrect solution any time. There are ways to make .*? work correctly, but they include a certain amount of backtracking control, which makes them harder to maintain.

This is especially true if Z happens to be not just a letter, but a more complicated pattern.

If that's the case, you should use interpolation anyway, and [^Z]+ is to be replaced by (?s:(?!$Z).)+
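
As a rough sketch of that replacement (the terminator `END`, the surrounding `^A ... \z` anchors, and the helper name `middle_of` are made up for illustration and are not from the original question), interpolating a multi-character `$Z` could look like this:

```perl
use strict;
use warnings;

# Assumption: a multi-character terminator standing in for the single letter Z.
my $Z = qr/END/;

# (?s:(?!$Z).)+ consumes one character at a time, but only at positions
# where $Z does not start, so the captured run can never contain $Z.
my $re = qr/^A((?s:(?!$Z).)+)$Z\z/;

# Hypothetical helper: returns the text between A and the terminator,
# or undef if the string does not match.
sub middle_of {
    my ($str) = @_;
    return $str =~ $re ? $1 : undef;
}

for my $str ("A foo END", "A foo END bar END", "AEND") {
    my $middle = middle_of($str);
    print defined $middle
        ? "'$str': matched, middle is '$middle'\n"
        : "'$str': no match\n";
}
```

Only the first string should match: the second has an earlier END that the lookahead refuses to step over, and the third leaves nothing for the `+` quantifier to consume.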

In general it does make a difference if Z is actually a single character or something else, and if it's something else that should be mentioned in the original question anyway.

Update: added missing quantifier

Perl 6 - links to (nearly) everything that is Perl 6.

Re^6: This regexp made simpler
by rovf (Priest) on Apr 26, 2010 at 11:16 UTC
    I prefer a correct solution that contains some repetition to an incorrect solution any time. There are ways to make .*? work correctly, but ...
    I agree that an incorrect solution doesn't make sense, but is it incorrect? .*?Z would match the shortest possible sequence of characters up to, but not including, the letter Z. Could you give an example where this would fail?

    -- 
    Ronald Fischer <ynnor@mm.st>
      I haven't really thought about whether the solutions with .*? are actually incorrect, but most of them will almost certainly go wrong if you later extend the regex with something that might force backtracking into the preceding construct.

      Example:

      $ perl -wE 'say "yes" if "A BCZD ZA" =~ /^A(\s.*?)?ZA/'
      yes
      Here I added an A to the end, which causes backtracking when there's no A after the first Z. Which in turn allows a match that was forbidden by your rules.

      (Update: This is a general problem when translating "may not occur in between" to "minimal match": the two are only equivalent under certain very specific conditions. You can "rescue" such a solution by putting it in a (?>...) non-backtracking group, but I still recommend against it.)
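
      A small sketch to compare the three approaches on the "A BCZD ZA" string: the minimal match from the example, a negated character class, and the minimal match wrapped in (?>...). The hash keys and the helper name `matches` are made up for illustration.

```perl
use strict;
use warnings;

my %pattern = (
    # Minimal match: may backtrack past the first Z to find "ZA" later on.
    lazy    => qr/^A(\s.*?)?ZA/,
    # Negated class: can never consume a Z, so the first Z is the only candidate.
    negated => qr/^A(\s[^Z]*)?ZA/,
    # Atomic group: once the first Z is found, the choice is never revisited.
    atomic  => qr/^A(?>(\s.*?)?Z)A/,
);

# Hypothetical helper: true if the named pattern matches the string.
sub matches {
    my ($name, $str) = @_;
    return $str =~ $pattern{$name} ? 1 : 0;
}

for my $str ("A BCZD ZA", "A BCZA") {
    for my $name (sort keys %pattern) {
        printf "%-8s %-12s %s\n", $name, "'$str'",
            matches($name, $str) ? "match" : "no match";
    }
}
```

      Only the lazy form should accept "A BCZD ZA". All three should accept "A BCZA", where the first Z really is followed by A.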

      So maybe your example wasn't actually wrong (and I apologize for having called it so without any proof), but it's surely not very maintainable, because a very simple, innocent change can break it.

        I see the problem. Thanks for pointing this out!

        -- 
        Ronald Fischer <ynnor@mm.st>
        To satisfy my curiosity I wrote the backtracking control version you mentioned and I was surprised how simple it is:
        while (<DATA>) {
            print;
            s[
                ^
                (START)
                ( | \s.*? )
                (END)
                (*COMMIT)
                $
            ]
            [ $1 . $2 . 'insert' . $3 ]ex;
            print;
        }
        __DATA__
        STARTEND
        STARTENDEND
        START SOMETHING END
        STARTSOMETHINGEND
        START END
        START ENDEND
        STARTSTARTENDEND
        STARTSTART ENDEND

        For comparison here's the other form:

        s[ ^ (START) ( | \s (?:(?!END).)* ) (END) $ ] [ $1 . $2 . 'insert' . $3 ]ex;