note
merlyn
Aha! One of the classic mistakes was made in this code:
<blockquote>
<code>
$myvar =~ /" # First quote
( # Capture text to $1
(?: # Non-backreferencing parentheses
[^?"] # Anything that's not a question mark or quote
| # or
\?[^"] # A question mark not followed by a quote (to allow embedded question marks)
)* # Zero or more
) # End capture
\?"/x; # Followed by a question mark and quote
</code>
</blockquote>
Try this with
<code>
$myvar = q{ abc"def??"ghi?"jkl };
</code>
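And you'll see that it captures <code>ghi</code>, not the intended <code>def?</code>.
The problem is that
the "question mark NOT followed by a quote" alternative can sometimes eat up the question
mark that you need to begin your closing delimiter. Here it swallows <code>??</code> as a
pair, leaving only a bare quote, so the attempt from the first quote fails and the
engine starts over at the second quote.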
<p>
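The proper way to tackle this is to "inch along" one character at a time, checking before each step that the closing delimiter doesn't start right here: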
<code>
$myvar = q{ abc"def??"ghi?"jkl };
print "matched <$1>\n" if
  $myvar =~ /"          # Opening quote
    (                   # Capture the text (group 1)
      (?:               # Non-capturing group
        (?!\?")         # Closing ?" must not start here
        .               # OK to inch along one character
      )*                # Zero or more times
    )                   # End capture
    \?"/sx;             # Closing question mark and quote
</code>
which properly prints:
<code>
matched <def?>
</code>
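<p>
To see the difference side by side, here is a minimal sketch that runs both patterns over a few inputs. The names <code>$broken</code>, <code>$inch</code>, and <code>first_capture</code> are made up for illustration, and the extra test strings are my own assumptions about interesting cases, not anything from the original question.

```perl
use strict;
use warnings;

# Hypothetical names for illustration only.
# Alternation version: the \?[^"] branch can swallow the closing '?'.
my $broken = qr/" ( (?: [^?"] | \?[^"] )* ) \?"/x;

# Inch-along version: consume a character only if ?" does not start here.
my $inch = qr/" ( (?: (?!\?") . )* ) \?"/sx;

# Return the first capture, or undef when the pattern does not match.
sub first_capture {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

for my $text (q{ abc"def??"ghi?"jkl }, q{ "a?b?" }, q{ "x??"y }) {
    for my $pair ([broken => $broken], [inch => $inch]) {
        my ($name, $re) = @$pair;
        my $got = first_capture($re, $text);
        printf "%-6s %-22s => %s\n", $name, $text,
            defined $got ? "<$got>" : "(no match)";
    }
}
```

The two patterns agree whenever the text before the closing delimiter ends in an even number of question marks (counting zero); they part ways when the count is odd, because the alternation pairs the closing question mark with the one before it.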
<p>
I was tackling this kind of thing a lot back in the early, pre-Ilya-RE days, when
people kept asking "how do I match a C comment?" I got pretty good
at breaking just about any regex that claimed to match a comment, by undoing
whatever assumptions it made.
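<p>
For what it's worth, the same inch-along idea carries over to the C comment case. This is only a sketch under a big stated assumption: the input contains no string literals or <code>//</code> comments that happen to include <code>/*</code> or <code>*/</code>, which is exactly the sort of assumption that can be undone. The name <code>$c_comment</code> and the sample source text are hypothetical.

```perl
use strict;
use warnings;

# Sketch only: after the opening /*, inch along until */ starts here.
# Assumes no string literals or // comments containing comment delimiters.
my $c_comment = qr{ /\* ( (?: (?!\*/) . )* ) \*/ }sx;

my $src = "int x; /* first ** comment */ int y; /* second */";

# In list context with the g flag, the match returns every captured body.
my @bodies = $src =~ /$c_comment/g;
print "<$_>\n" for @bodies;
```

The lookahead is tested before every single character, so stray stars inside the comment body are harmless.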
<p>-- <a href="http://www.stonehenge.com/merlyn/">Randal L. Schwartz, Perl hacker</a></p>