Someone asked privately if the % in [^%>] was required.
That is a good question, so I decided to answer it in public.
Without that %, we get:
    m[
        <%              # Opening delimiter.
        (?:             # Match stuff that isn't a closing delim:
            [^%]+       # Things that can't start one.
        |   %+[^>]      # Might start one but isn't one.
        )*              # As many non-closing-delims as you like.
        %>              # Closing delimiter.
    ]x
which will match as follows:
    "<% %%> %>"
    "<%"  matches <%
    " "   matches [^%]+ so (?: ... )* has matched once
    "%%"  matches %+
    ">"   fails on [^>] so we back-track
    "%"   now matches %+
    "%"   matches [^>] so (?: ... )* has matched twice
    "> "  matches [^%]+ so (?: ... )* has matched 3 times
    "%>"  matches %> so regex finishes
so we've matched the whole string when we should have only matched the
first part, "<% %%>".
By leaving the % out of [^%>], we've allowed the regex to
back-track and match the first character of our delimiter (%) as the tail
end of %+[^>].
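The trace above can be checked with a short script. This is only a sketch: the names $no_percent and first_match() are made up for illustration, and the pattern is the one without the % inside the second character class.

```perl
use strict;
use warnings;

# The pattern discussed above, with [^>] instead of [^%>].
my $no_percent = qr[
    <%              # Opening delimiter.
    (?:             # Match stuff that isn't a closing delim:
        [^%]+       # Things that can't start one.
    |   %+[^>]      # Might start one but isn't one.
    )*              # As many non-closing-delims as you like.
    %>              # Closing delimiter.
]x;

# Hypothetical helper: returns the text matched by $re, or undef.
sub first_match {
    my( $re, $str ) = @_;
    return $str =~ /($re)/ ? $1 : undef;
}

my $got = first_match( $no_percent, "<% %%> %>" );
print defined $got ? "matched: '$got'\n" : "no match\n";
# prints: matched: '<% %%> %>'
```

The match runs all the way to the second %>, rather than stopping at "<% %%>".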
But I now realize that my regex is also broken because it will never match:
"<% %%>"
at all. I'm tempted to fix it with:
    m[
        <%              # Opening delimiter.
        (?:             # Match stuff that isn't a closing delim:
            [^%]+       # Things that can't start one.
        |   %+[^%>]     # Might start one but isn't one.
        )*              # As many non-closing-delims as you like.
        %*              # PUNT!
        %>              # Closing delimiter.
    ]x
but that seems wrong. Think...
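One way to poke at this is to run both versions against a few strings. The sketch below is a spot check under assumptions, not a proof: $strict is the [^%>] regex without the extra %*, $punt is the version with it, and first_match() and the test strings are invented for illustration.

```perl
use strict;
use warnings;

# The [^%>] version, without and with the extra %* before the closing delimiter.
my $strict = qr[ <% (?: [^%]+ | %+[^%>] )*    %> ]x;
my $punt   = qr[ <% (?: [^%]+ | %+[^%>] )* %* %> ]x;

# Hypothetical helper: returns the text matched by $re, or undef.
sub first_match {
    my( $re, $str ) = @_;
    return $str =~ /($re)/ ? $1 : undef;
}

for my $str ( "<% %%>", "<% %%> %>", "<% a % b %>" ) {
    for my $pair ( [ strict => $strict ], [ punt => $punt ] ) {
        my( $name, $re ) = @$pair;
        my $got = first_match( $re, $str );
        printf "%-6s %-14s => %s\n",
            $name, "'$str'", defined $got ? "'$got'" : "no match";
    }
}
```

On these inputs, the version without %* fails on both strings that contain "%%>", while the version with %* stops at the first %> each time. That says nothing about inputs not tried here.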
Bah, I'm hours late for bed already. Serves me right for "showing off" by
"unrolling the loop" when I've seen *so many* *really good* regex slingers
get this wrong more than once. (:
Unlike the last time I saw this happen, these nodes will *not* be updated to hide the mistakes I've made. (That last time, the updates were flying really fast and I was extremely frustrated by not being able to learn from the repeated mistakes.)
- tye