
I was looking for an old node of mine, and I came across this node again. I was bored, so I decided to work it out. First, the flow:

http://jryan.perlmonk.org/images/uloop.gif

In the diagram, a green node means "that character" and a red node means "any character except that one". Green lines mean "yes" and red lines mean "no". So we can translate the diagram directly into code:

m[
    < %              # node 0 -> node 1 ->
    (                # node 0 -> node 1 -> node 2 (?) ->
        %            # node 0 -> node 1 -> node 2 (yes) ->
        (            # node 0 -> node 1 -> node 2 (yes) -> node 4 (?) ->
            [^>]     # node 0 -> node 1 -> node 2 (yes) -> node 4 (no) -
                     #                                                    +>
          |
            [^%]     # node 0 -> node 1 -> node 2 (yes) -> node 4 (yes) +->
                     # node 3 (?) ->
        )
      |
        [^%]         # node 0 -> node 1 -> node 2 (no) -> node 3 (?) ->
    )*               #
    (%)+             # node 5 (?)
    >                # node 5 (no) -> node 6
]x

And Perl lets us condense that into:

m[
    < %
    (?: [^%]+ | % [^%>] )*
    %*      # I left it as %* so the "insides" can be easily
            # grouped & captured
    % >
]x

So, your quick hack of a fix turns out to be the proper solution after all :)
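Since the diagram is really a small state machine, the same kind of recognizer can also be run by hand without a regex. This is a minimal sketch, assuming the goal is the first, shortest <% ... %> span; scan_tag and its state names (OUT, LT, IN, PCT) are hypothetical and do not correspond to the diagram's node numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: find the first <% ... %> span with an explicit
# state machine. Returns the span, or undef if no tag is closed.
sub scan_tag {
    my ($str) = @_;
    my ($state, $start) = ('OUT', undef);
    for my $i (0 .. length($str) - 1) {
        my $c = substr($str, $i, 1);
        if ($state eq 'OUT') {              # looking for '<'
            ($state, $start) = ('LT', $i) if $c eq '<';
        }
        elsif ($state eq 'LT') {            # saw '<', need '%'
            if    ($c eq '%') { $state = 'IN' }
            elsif ($c eq '<') { $start = $i }   # restart at the newer '<'
            else              { $state = 'OUT' }
        }
        elsif ($state eq 'IN') {            # inside the tag
            $state = 'PCT' if $c eq '%';
        }
        else {                              # PCT: inside, just saw one or more '%'
            return substr($str, $start, $i - $start + 1) if $c eq '>';
            $state = 'IN' unless $c eq '%';
        }
    }
    return;                                 # never saw a closing %>
}

# The % that opens <% cannot double as the % of %>, so '<%>' is unclosed.
for my $input ('<% %%> %>', '<% %% %>', '<%>') {
    my $m = scan_tag($input);
    print "'${input}' => ", (defined $m ? "'${m}'" : 'no match'), "\n";
}
```

Each character is looked at once, so there is no backtracking to worry about; the trade-off is that it is more code to maintain than a single pattern.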

Re^4: A NOT in regular expressions (thanks)
by tye (Sage) on Oct 23, 2003 at 18:00 UTC

    The first one fails by not stopping soon enough for "<% %%> %>". The second fails by not matching "<% %% %>".

    Thanks for the thoughts.

                    - tye
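A small harness to reproduce both cases. The two patterns are the ones from the parent node with the node comments stripped out; first_match is a hypothetical helper that returns the text of the leftmost match, or undef.

```perl
use strict;
use warnings;

# The two patterns from the parent node, with the node comments removed.
my $flow      = qr{ < % ( % ( [^>] | [^%] ) | [^%] )* (%)+ > }x;
my $condensed = qr{ < % (?: [^%]+ | % [^%>] )* %* % > }x;

# Hypothetical helper: return the text of the leftmost match, or undef.
sub first_match {
    my ($re, $str) = @_;
    return $str =~ /($re)/ ? $1 : undef;
}

for my $input ('<% %%> %>', '<% %% %>') {
    for my $pair ([flow => $flow], [condensed => $condensed]) {
        my ($name, $re) = @$pair;
        my $m = first_match($re, $input);
        print "${name} on '${input}': ", (defined $m ? "'${m}'" : 'no match'), "\n";
    }
}
```

Run as-is, the flow-derived pattern matches all of '<% %%> %>' (it runs past the first %>, because %[^>] followed by the [^%] alternative lets a % and a > be swallowed as content), while the condensed pattern reports no match for '<% %% %>' (a %% inside the tag is neither [^%]+ nor %[^%>]).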

      Cribbing from Mastering Regular Expressions' section on removing C-style comments ...

      qr{ <% [^%]* %+ ( [^>%] [^%]* %+ )* > }x;

      appears to handle every example in this thread correctly, and it doesn't take forever on a failed match.


      Remember, when you stare long into the abyss, you could have been home eating ice cream.
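A quick check of that pattern against the strings from this thread, plus a line with two tags to show that it stops at the first %>. It appears to have the same shape as the book's C-comment regex (roughly /\*[^*]*\*+(?:[^/*][^*]*\*+)*/) with /* and */ swapped for <% and %>; all_tags is a hypothetical helper for collecting every match.

```perl
use strict;
use warnings;

# The pattern from the reply above: <% and %> stand in for /* and */.
my $tag = qr{ <% [^%]* %+ ( [^>%] [^%]* %+ )* > }x;

# Hypothetical helper: collect every non-overlapping <% ... %> span.
# The outer capture is used because $tag has its own inner group.
sub all_tags {
    my ($str) = @_;
    my @found;
    while ($str =~ /($tag)/g) {
        push @found, $1;
    }
    return @found;
}

for my $input ('<% %%> %>', '<% %% %>', '<% a %> b <% c %>') {
    print "'${input}' => ", join(', ', map { "'${_}'" } all_tags($input)), "\n";
}
```

The loop body ([^>%] [^%]* %+) can only start at a character that is neither % nor >, so there is exactly one way to split the inside of a tag, which is what keeps a failed match from backtracking through every possible split.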