in reply to initializing internal regex variables?

it appears that disjoint combinations (not adjacent) table elements are colored instead.
Well, there's nothing in the regex that requires the rows to be adjacent. /".+?"/ will happily match billions and billions of rows if required.

Re^2: initializing internal regex variables?
by Anonymous Monk on Nov 17, 2009 at 19:44 UTC

    Your comment appears to be what I needed to hear as changing the non-greedy match from /".*?"/ to /"[^"]*?"/ appears to work correctly. The negated character class was the trick.

    I'm still a bit confused about why there is such a difference in what is matched, but I'll think about it some more.

    Thanks for pointing me in the right direction.
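To make the difference between the two patterns concrete, here is a minimal sketch. The sample string and variable names are invented for illustration and are not taken from the original table; the trailing ` y` stands in for whatever the real regex requires after the closing quote.

```perl
use strict;
use warnings;

# Hypothetical sample data: two quoted fields, only the second is followed by " y".
my $text = q{"a" x "b" y};

# .*? is allowed to cross quote characters when that is the only way
# for the rest of the pattern to match.
my ($dot) = $text =~ /"(.*?)" y/;

# [^"]*? can never cross a quote, so the match has to start at the second field.
my ($class) = $text =~ /"([^"]*?)" y/;

print "dot:   $dot\n";      # a" x "b
print "class: $class\n";    # b
```

Both patterns are non-greedy; the only thing that changes is which characters the quantified atom is permitted to consume.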

      I think you expect
      perl -e'
         $_ = qq{...\n}
            .qq{<a href="foo">foo</a>\n}
            .qq{<a href="bar">bar</a>\n}
            .qq{...\n};
         s!(<a href=")(.*?)(">bar</a>)!$1\[$2]$3!s;
         print;
      '
      to output
         ...
         <a href="foo">foo</a>
         <a href="[bar]">bar</a>
         ...
      but that's wrong. It outputs
         ...
         <a href="[foo">foo</a>
         <a href="bar]">bar</a>
         ...

      The pattern says:

      • Match the start of the string,
        • followed by as few characters as possible (implicit leading /.*?/),
          • followed by the string '<a href="',
            • followed by as few characters as possible,
              • followed by the string '">bar</a>'.

      Keeping in mind that "as few characters as possible" is zero characters, let's check if the string matches:

      • Starting at the beginning of the string,
        • Do 0 characters follow? Yes, so try to match the next atom.
          • Does the string '<a href="' follow? No, so backtrack.
        • Does 1 character follow? Yes, so try to match the next atom.
          • Does the string '<a href="' follow? No, so backtrack.
        • Do 2 characters follow? Yes, so try to match the next atom.
          • Does the string '<a href="' follow? No, so backtrack.
        • ...
        • Do 4 characters follow? Yes, so try to match the next atom.
          • Does the string '<a href="' follow? Yes, so try to match the next atom.
            • Do 0 characters follow? Yes, so try to match the next atom.
              • Does the string '">bar</a>' follow? No, so backtrack.
            • Does 1 character follow? Yes, so try to match the next atom.
              • Does the string '">bar</a>' follow? No, so backtrack.
            • Do 2 characters follow? Yes, so try to match the next atom.
              • Does the string '">bar</a>' follow? No, so backtrack.
            • ...
            • Do 25 characters follow? Yes, so try to match the next atom.
              • Does the string '">bar</a>' follow? Yes, so try to match the next atom.
                • We have a match!
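The numbers in the walk-through above (4 characters skipped, 25 characters consumed) can be checked with a short script. This is only a sketch: it reuses the input from the one-liner, the variable names are mine, and the last substitution swaps in the negated character class mentioned earlier in the thread as one possible fix.

```perl
use strict;
use warnings;

# Same input as the one-liner in this post.
my $html = qq{...\n}
         . qq{<a href="foo">foo</a>\n}
         . qq{<a href="bar">bar</a>\n}
         . qq{...\n};

my ($start, $len);
if ($html =~ m{(<a href=")(.*?)(">bar</a>)}s) {
    $start = $-[0];        # offset where the overall match begins
    $len   = length $2;    # characters consumed by the non-greedy .*?
}
print "match starts at offset $start\n";    # 4
print "(.*?) captured $len characters\n";   # 25

# Assumed fix: [^"]* cannot run past the closing quote of the first href,
# so the match is forced to begin at the second anchor.
(my $fixed = $html) =~ s!(<a href=")([^"]*)(">bar</a>)!$1\[$2]$3!;
print $fixed;
```

With the character class, the first `<a href="` fails outright instead of stretching, and the regex engine moves on to the next starting position.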
Re^2: initializing internal regex variables?
by Anonymous Monk on Nov 17, 2009 at 19:09 UTC
    Okay, maybe I'm reading more into your response than I should, but here are two questions:
    • is there any difference between /".+?"/ and /".*?"/? Yes, + matches one or more of the previous pattern, and * matches zero or more of the previous pattern, but given that all strings seen in the table are more than one character in length, is there any difference since I am specifying that the pattern is non-greedy?
    • is not the regular expression originally quoted non-greedy?
    Thanks for any insight shared.
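On the first question, a small sketch of where /".+?"/ and /".*?"/ diverge. The input is hypothetical: it contains an empty quoted field, which the poster says does not occur in the real table, so for that data the two patterns would behave alike.

```perl
use strict;
use warnings;

# Hypothetical input with an empty quoted field followed by a normal one.
my $row = q{"" "z"};

my ($star) = $row =~ /(".*?")/;   # zero characters between the quotes is fine
my ($plus) = $row =~ /(".+?")/;   # needs at least one character, so it runs on

print "star: $star\n";   # ""
print "plus: $plus\n";   # "" "
```

With `+`, the second quote is consumed as the mandatory one character, and the match only ends at the opening quote of the next field.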
      is not the regular expression originally quoted non-greedy?
      It is. But what do you expect non-greedy to be? Some people think that non-greedy means "match as short a string as possible", without anything else. But there is just one such string, and that's the empty string.

      Non-greedy does not mean, "don't match where you would match otherwise". If a pattern matches with greedy (sub) matches, it will match with non-greedy sub matches. And if a pattern doesn't match with non-greedy sub matches, it will not match with greedy sub matches.

      All greedy/non-greedy will do is change $&; it will not change whether or not a pattern matches.
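A minimal sketch of these points, using made-up strings. Capture groups around the whole pattern stand in for $& here; the variable names are only for illustration.

```perl
use strict;
use warnings;

my $s = q{"one" and "two"};   # hypothetical sample string

my ($lazy)   = $s =~ /(".*?")/;   # stops at the first closing quote that works
my ($greedy) = $s =~ /(".*")/;    # runs to the last closing quote

# With nothing required after it, a lone .*? is satisfied by zero characters.
my ($empty) = "abc" =~ /(.*?)/;

# Where one form fails, the other fails too.
my $no_quotes  = 'no quotes here';
my $lazy_hit   = $no_quotes =~ /".*?"/ ? 1 : 0;
my $greedy_hit = $no_quotes =~ /".*"/  ? 1 : 0;

print "lazy:   $lazy\n";                    # "one"
print "greedy: $greedy\n";                  # "one" and "two"
print "empty:  [$empty]\n";                 # []
print "hits:   $lazy_hit $greedy_hit\n";    # 0 0
```

Both forms succeed on the first string and both fail on the second; only the matched text differs.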