
Your comment was what I needed to hear: changing the non-greedy match from /".*?"/ to /"[^"]*?"/ appears to work correctly. The negated character class was the trick.

I'm still a bit confused about why there is such a difference in what is matched, but I'll think about it some more.

Thanks for pointing me in the right direction.

Re^3: initializing internal regex variables?
by ikegami (Patriarch) on Nov 17, 2009 at 20:22 UTC
    I think you expect

        perl -e'
           $_ = qq{...\n}
              . qq{<a href="foo">foo</a>\n}
              . qq{<a href="bar">bar</a>\n}
              . qq{...\n};
           s!(<a href=")(.*?)(">bar</a>)!$1\[$2]$3!s;
           print;
        '

    to output

        ...
        <a href="foo">foo</a>
        <a href="[bar]">bar</a>
        ...

    but that expectation is wrong. It actually outputs

        ...
        <a href="[foo">foo</a>
        <a href="bar]">bar</a>
        ...
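
    To see where the brackets come from, it can help to look at what the middle group actually captured. The following is only a small sketch built on the same input; the variable names ($text, $captured, $start) are made up for illustration.

```perl
use strict;
use warnings;

# Same four-line input as in the one-liner.
my $text = qq{...\n}
         . qq{<a href="foo">foo</a>\n}
         . qq{<a href="bar">bar</a>\n}
         . qq{...\n};

my ($captured, $start);
if ($text =~ m!(<a href=")(.*?)(">bar</a>)!s) {
    $captured = $2;      # what the non-greedy .*? ended up matching
    $start    = $-[0];   # offset at which the overall match begins
}

print "match starts at offset $start\n";
print "captured ", length($captured), " characters:\n";
print "$captured\n";
```

    The match begins at the first link, and the captured text runs from foo through to bar in the second link, which is why the opening bracket lands in one link and the closing bracket in the other.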

    The pattern says to do the following:

    • Match the start of the string,
      • followed by as few characters as possible (implicit leading /.*?/),
        • followed by the string '<a href="',
          • followed by as few characters as possible,
            • followed by the string '">bar</a>'.

    Keeping in mind that "as few characters as possible" starts at zero characters and grows only when the rest of the pattern fails to match, let's check if the string matches:

    • Starting at the beginning of the string,
      • Do 0 characters follow? Yes, so try to match the next atom.
        • Does the string '<a href="' follow? No, so backtrack.
      • Does 1 character follow? Yes, so try to match the next atom.
        • Does the string '<a href="' follow? No, so backtrack.
      • Do 2 characters follow? Yes, so try to match the next atom.
        • Does the string '<a href="' follow? No, so backtrack.
      • ...
      • Do 4 characters follow? Yes, so try to match the next atom.
        • Does the string '<a href="' follow? Yes, so try to match the next atom.
          • Do 0 characters follow? Yes, so try to match the next atom.
            • Does the string '">bar</a>' follow? No, so backtrack.
          • Does 1 character follow? Yes, so try to match the next atom.
            • Does the string '">bar</a>' follow? No, so backtrack.
          • Do 2 characters follow? Yes, so try to match the next atom.
            • Does the string '">bar</a>' follow? No, so backtrack.
          • ...
          • Do 25 characters follow? Yes, so try to match the next atom.
            • Does the string '">bar</a>' follow? Yes, so try to match the next atom.
              • We have a match!
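
    The walk-through above also suggests why the negated character class helps: if the middle part is not allowed to contain a double quote, the attempt that starts at the first link has nowhere to grow and fails, so the engine moves on to the second link. A minimal sketch comparing the two substitutions on the same input follows; $text, $lazy and $fixed are illustrative names, and a greedy [^"]* is used here, although [^"]*? should behave the same for this input.

```perl
use strict;
use warnings;

my $text = qq{...\n}
         . qq{<a href="foo">foo</a>\n}
         . qq{<a href="bar">bar</a>\n}
         . qq{...\n};

# Original pattern: with /s, .*? may run across quotes and newlines,
# so the match that starts at the first link can reach the second one.
(my $lazy = $text) =~ s!(<a href=")(.*?)(">bar</a>)!$1\[$2]$3!s;

# Negated character class: the middle part may not contain a double
# quote, so an attempt starting at the first link fails and the engine
# retries further along the string.
(my $fixed = $text) =~ s!(<a href=")([^"]*)(">bar</a>)!$1\[$2]$3!;

print "lazy dot:\n$lazy\n";
print "negated class:\n$fixed";
```

    With the negated class, only the href value of the second link ends up wrapped in brackets.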