slocate has asked for the wisdom of the Perl Monks concerning the following question:

On exercise 8.2 from Learning Perl (3rd edition) there is this regex:   /"([^"]*)"/ This is supposed to match a simple double-quoted string like "hello", but not "hello"you" (nor "hello\"you"). The only problem is that it is matching the latter two. I tried it like this:   /"([^a]*)"/ and sure enough, it wouldn't match any word that contained an a (e.g. "wilma"), so what am I missing? PS: I should add that this one-liner does work:   perl -wle '$a = "hello\"you";if ($a =~ /"([^"]*)"/){ print "yes"}else{print "no"}' It just doesn't work in the script, which adds to the weirdness of it...

Edit by dws for code tags and clearer title

Re: Problem with regex from Learning Perl (3rd edition)
by seattlejohn (Deacon) on Mar 08, 2002 at 04:59 UTC
    I don't have the book in front of me right now, but I think the problem is either in the way the question is written or your interpretation of it.

    It's not quite correct to state that /"([^"]*)"/ should not match "hello"you". In fact, it should match -- specifically, it should match the "hello" part. By default, regexes are not anchored, and the match operator will return true if the pattern matches any part of the bound string. If you want to anchor the regex -- in other words, if you want it to match the string in its entirety -- you need to explicitly specify the ^ and $ anchors.

    For example, the regex /^"([^"]*)"$/ will behave the way you seem to be expecting, because it means essentially "match something enclosed by quotes, with nothing else before or after it."
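    To see the difference between the two patterns side by side, here is a minimal sketch. The sub names unanchored and anchored and the sample strings are just for illustration; they are not from the book.

```perl
use strict;
use warnings;

# Sample strings written with q{} so the double quotes are part of the data.
my @samples = (q{"hello"}, q{"hello"you"}, q{hello"you});

# Each sub returns the captured text on a match, or undef on failure.
sub unanchored { my ($s) = @_; return $s =~ /"([^"]*)"/   ? $1 : undef }
sub anchored   { my ($s) = @_; return $s =~ /^"([^"]*)"$/ ? $1 : undef }

for my $s (@samples) {
    my $un = unanchored($s);
    my $an = anchored($s);
    printf "%-14s unanchored: %-10s anchored: %s\n",
        $s,
        defined $un ? "<$un>" : 'no match',
        defined $an ? "<$an>" : 'no match';
}
```

    Only the anchored version rejects "hello"you"; the unanchored one finds the "hello" part and stops there.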

    Your one-liner behaves the way it does because you're not creating exactly the string you think you are. This statement:
    $a = "hello\"you"

    actually creates a string whose contents are hello"you with no quotes before and after -- because the quotes are used to delimit the string literal in the first place. If you tried this:

    $a = "\"hello\"you\""

    or (more readable) this:

    $a = q{"hello"you"}

    you'd get the result you expect.

    (If you're not familiar with the q{xxx} syntax, it effectively means "single-quote xxx". qq{xxx} does the same with double-quotes. You can use a wide variety of delimiters where I've used curly brackets -- this is a great way to avoid having to escape characters inside string literals.)
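    If it helps, here is a small sketch that prints what each of those three literals actually contains. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $escaped_inner = "hello\"you";       # contents: hello"you   (one quote)
my $escaped_all   = "\"hello\"you\"";   # contents: "hello"you" (three quotes)
my $with_q        = q{"hello"you"};     # same contents as $escaped_all

for my $s ($escaped_inner, $escaped_all, $with_q) {
    my $quotes = ($s =~ tr/"//);        # tr counts the quotes without changing $s
    printf "%-12s quotes=%d  %s\n", $s, $quotes,
        $s =~ /"([^"]*)"/ ? "matches, captured <$1>" : "no match";
}
```

    The first string has only one double quote in it, so a pattern that needs two of them cannot match, which is why the one-liner prints "no".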

      The thing is that "hello"you" should not match because it has a " in the middle, which is what [^"] is taking care of. Forget, for now, about the surrounding double-quotes. I know I'm right, because if you use, let's say, /^"([^x]*)"$/ then "extra" won't match, but "estra" will. :-/

        Hopefully I can break this down into a slightly better explanation.


        /"([^"]*)"/
      • 1. [^"] is a negated character class, which means 'any character except (")'.
      • 2. [^"]* Your regex wants 'zero or more (*)' of that negated character class.
      • 3. /"([^"]*) Then your regex wants 1 double quote at the beginning of the match (not the line).
      • 4. /"([^"]*)"/ Then your regex wants 1 double quote at the end of the match.
      • OK, now let's break down what happens with your string "hello"you"

      • " Your first character. Your regex is happy: it matches number 3.
      • hello" Your regex clicks along happily down the next 5 non-double quotes until it sees ". Oops, a " cannot satisfy rule number 1, so the character class stops here.
      • hello The class has consumed hello and stops just before the ". Your regex is happy again: it matches number 2.
      • "hello" Your regex moves forward one click and takes the " to match number 4.
      • Your regex is happy and matches all 4 of your rules, without ever looking at you".
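        The walk-through above can be checked with the @- and @+ arrays, which hold the start and end offsets of the last successful match. This is only a sketch; the variable names are mine.

```perl
use strict;
use warnings;

my $string = q{"hello"you"};
my ($start, $end, $captured);

if ($string =~ /"([^"]*)"/) {
    # $-[0] and $+[0] are the offsets where the whole match starts and ends;
    # $1 is what the parentheses captured.
    ($start, $end, $captured) = ($-[0], $+[0], $1);

    print "whole match: ", substr($string, $start, $end - $start), "\n";
    print "offsets:     $start to $end\n";
    print "captured:    $captured\n";
    print "left over:   ", substr($string, $end), "\n";
}
```

        The match covers offsets 0 to 7, that is "hello" with its quotes, and the trailing you" is simply never consumed.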

        If you apply this to your other regex /^"([^x]*)"$/ you can see why your examples work

        HTH

        grep
        grep> cd /pub
        grep> more beer
        The thing is that "hello"you" should not match because it has a " in the middle, which is what [^"] is taking care of.

        I misread the question. If you're trying to reject strings with embedded quotes, what do you do with this one?

        "What happens," he asked, "with a string like this?"
        Do you reject this, or match "What happens," and "with a string like this?" ?

        You've bitten off a problem that gets harder the closer you look at it.
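        To make the two readings concrete, here is a rough sketch of what each pattern does with that sentence. Using /g in list context to collect every quoted piece is my own choice for the illustration, not something the exercise asks for.

```perl
use strict;
use warnings;

my $text = q{"What happens," he asked, "with a string like this?"};

# Reading 1: pull out every quoted piece. In list context, /g returns
# the capture from each successive match.
my @quoted = $text =~ /"([^"]*)"/g;
print "piece: $_\n" for @quoted;

# Reading 2: insist that the whole string is one quoted piece.
my $whole = $text =~ /^"([^"]*)"$/ ? 1 : 0;
print "whole string is a single quoted piece: ", ($whole ? "yes" : "no"), "\n";
```

        The first reading finds two pieces; the second rejects the string outright. Which one you want depends on what "a simple double-quoted string" is supposed to mean.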

        grep gives a nice step-by-step explanation. It might also help if you take a moment to consider how this:
        /"([^"]*)"/
        differs from this:
        /^"([^"]*)"$/
Re: Problem with regex from Learning Perl (3rd edition)
by dws (Chancellor) on Mar 08, 2002 at 04:12 UTC
    If you want to match the "hello" part of "hello"you", then you need to cause the regex to not be greedy. Try this   /"([^"]*?)"/
      The thing is that the regex (/"([^"]*?)"/) is *not* supposed to match for instance "this"string", as it is, quotes and all. On the other hand, "string" should always match... This is the regex in action...
      #!/usr/bin/perl -w
      while (<>) {
          chomp;
          if (/"([^"]*)"/) {
              print "Matched: |<$&>|\n";
          } else {
              print "No Match.\n";
          }
      }
        It IS supposed to match.

        q/"this"string"/ =~ m{
            "           # X
            (
                [^"]*?  # XXXX
            )
            "           # X
        }x

        Remember why "foo" =~ /o/? It's because there has to be an o in there, but its location in the string has not been fixed in the regex. Anchoring avoids this problem. ^ matches the null string at the beginning of a line or string; $ matches the null string at the end of a line (right before \n) or string.

        "foo" !~ /^o$/;
        q/"this"string"/ !~ /^"([^"]*?)"$/;
        #                    X          X

        ++ vs lbh qrpbqrq guvf hfvat n ge va Crey :)
        Nabgure bar vs lbh qvq fb jvgubhg ernqvat n znahny svefg.
        -- vs lbh hfrq OFQ pnrfne ;)
            - Whreq