in reply to Problem with regex from Learning Perl (3rd edition)

I don't have the book in front of me right now, but I think the problem is either in the way the question is written or your interpretation of it.

It's not quite correct to state that /"([^"]*)"/ should not match "hello"you". In fact, it should match -- specifically, it should match the "hello" part. By default, regexes are not anchored, and the match operator returns true if the pattern matches any part of the bound string. If you want to anchor the regex -- in other words, if you want it to match the string in its entirety -- you need to explicitly specify the ^ and $ anchors.
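To make the "matches any part of the string" point concrete, here is a minimal sketch. The three test strings are my own examples, not from the book:

```perl
use strict;
use warnings;

# Hypothetical test strings; q{} keeps the quote characters literal.
my @strings = ( q{"hello"you"}, q{say "hi" now}, q{no quotes here} );

my %captured;
for my $s (@strings) {
    # Unanchored: the pattern is free to match anywhere inside $s.
    if ( $s =~ /"([^"]*)"/ ) {
        $captured{$s} = $1;
        print "$s => matched, captured '$1'\n";
    }
    else {
        print "$s => no match\n";
    }
}
```

The first two strings match because each contains a quoted run somewhere; the third has no quotes at all, so it fails.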

For example, the regex /^"([^"]*)"$/ will behave the way you seem to be expecting, because it essentially means "match something enclosed by quotes, with nothing else before or after it".
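A quick sketch of the anchored version, again with candidate strings I made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical candidates: only the first is quotes-around-everything.
my @candidates = ( q{"hello"}, q{"hello"you"}, q{say "hello"} );

my %anchored_result;
for my $s (@candidates) {
    # ^ and $ force the quoted part to span the entire string.
    $anchored_result{$s} = ( $s =~ /^"([^"]*)"$/ ) ? 1 : 0;
    print "$s => ", ( $anchored_result{$s} ? "match" : "no match" ), "\n";
}
```

One caveat: $ also matches just before a trailing newline. If that matters for your input, \z is the stricter end-of-string anchor.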

Your one-liner behaves the way it does because you're not creating exactly the string you think you are. This statement:
$a = "hello\"you"

actually creates a string whose contents are hello"you with no quotes before and after -- because the quotes are used to delimit the string literal in the first place. If you tried this:

$a = "\"hello\"you\""

or (more readable) this:

$a = q{"hello"you"}

you'd get the result you expect.
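If you want to check this for yourself, something along these lines should do it (the variable names are mine, picked for the demo):

```perl
use strict;
use warnings;

my $bare    = "hello\"you";        # contents: hello"you   (one quote)
my $escaped = "\"hello\"you\"";    # contents: "hello"you" (three quotes)
my $quoted  = q{"hello"you"};      # same contents as $escaped

# Print each string as Perl actually sees it, with its length.
printf "%-12s length %d\n", $_, length($_) for $bare, $escaped, $quoted;

# The pattern needs two quotes, so it cannot match $bare.
my $bare_matches   = ( $bare   =~ /"([^"]*)"/ ) ? 1 : 0;
my $quoted_matches = ( $quoted =~ /"([^"]*)"/ ) ? 1 : 0;
print "bare: $bare_matches, quoted: $quoted_matches\n";
```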

(If you're not familiar with the q{xxx} syntax, it effectively means "single-quote xxx". qq{xxx} does the same with double-quotes. You can use a wide variety of delimiters where I've used curly brackets -- this is a great way to avoid having to escape characters inside string literals.)
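A small sketch of the difference between q and qq, and of a couple of alternative delimiters. The $name variable is hypothetical and exists only to show interpolation:

```perl
use strict;
use warnings;

my $name = 'you';                  # hypothetical variable for the demo

my $single = q{"hello"$name"};     # no interpolation: a literal $name
my $double = qq{"hello"$name"};    # interpolates $name: "hello"you"
my $parens = q("hello"you");       # parentheses as delimiters
my $bangs  = q!"hello"you"!;       # almost any punctuation works

print "$_\n" for $single, $double, $parens, $bangs;
```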

Re: Re: Problem with regex from Learning Perl (3rd edition)
by slocate (Novice) on Mar 08, 2002 at 05:22 UTC
    The thing is that "hello"you" should not match because it has a " in the middle, which is what [^"] is taking care of. Forget, for now, about the surrounding double-quotes. I know I'm right, because if you use, let's say, /^"([^x]*)"$/ then "extra" won't match, but "estra" will. :-/

      Hopefully I can break this down with a slightly better explanation.


      /"([^"]*)"/
    • 1. [^"] is a negated character class, which means 'any character except (")'.
    • 2. [^"]* Your regex wants 'zero or more (*)' of that negated character class.
    • 3. /"([^"]*) Then your regex wants 1 double quote at the beginning of the match (not of the line).
    • 4. /"([^"]*)"/ Then your regex wants 1 double quote at the end of the match.

      OK, now let's break down what happens with your string "hello"you"

    • " Your first character. Your regex is happy: it matches number 3.
    • hello" Your regex clicks along happily down the next 5 non-double-quotes till it sees ". Oops, this breaks rule number 1.
    • hello Your regex moves back one click. It is happy again: it matches number 2.
    • "hello" Your regex moves forward one click to match number 4.
    • Your regex is happy and matches all 4 of your rules.
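If you'd like to see where the match starts and stops, Perl's @- and @+ arrays hold the offsets of the last successful match. A minimal sketch, with variable names of my own choosing:

```perl
use strict;
use warnings;

my $text = q{"hello"you"};
my ( $start, $end, $captured );

if ( $text =~ /"([^"]*)"/ ) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    ( $start, $end ) = ( $-[0], $+[0] );
    $captured = $1;

    print "match covers offsets $start to $end: ",
        substr( $text, $start, $end - $start ), "\n";
    print "captured: $captured\n";
    print "left over: ", substr( $text, $end ), "\n";
}
```

The "left over" part is whatever the regex never needed to look at in order to succeed.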

      If you apply this to your other regex /^"([^x]*)"$/ you can see why your examples work
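For what it's worth, here is a rough sketch of that anchored [^x] regex run against the two strings from the question:

```perl
use strict;
use warnings;

my %result;
for my $s ( q{"extra"}, q{"estra"} ) {
    # Anchored: everything between the outer quotes must be a non-x.
    $result{$s} = ( $s =~ /^"([^x]*)"$/ ) ? 'match' : 'no match';
    print "$s => $result{$s}\n";
}
```

With ^ and $ in place the regex has to account for the whole string, so an x anywhere between the quotes is fatal.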

      HTH

      grep
      grep> cd /pub
      grep> more beer
        I knew that...:p
      The thing is that "hello"you" should not match because it has a " in the middle, which is what [^"] is taking care of.

      I misread the question. If you're trying to reject strings with embedded quotes, consider a string like this:

      "What happens," he asked, "with a string like this?"
      Do you reject this, or match "What happens," and "with a string like this?" ?

      You've bitten off a problem that gets harder the closer you look at it.
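To illustrate the two options on that sentence, here is a sketch. Using /g in list context is my own choice for pulling out every quoted piece; it is not something the question asked for:

```perl
use strict;
use warnings;

my $sentence = q{"What happens," he asked, "with a string like this?"};

# With /g in list context, the match returns every captured group.
my @quoted = $sentence =~ /"([^"]*)"/g;
print "found: $_\n" for @quoted;

# The anchored form rejects the sentence, since no single quoted run spans it.
my $anchored = ( $sentence =~ /^"([^"]*)"$/ ) ? 1 : 0;
print "anchored match: $anchored\n";
```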

      grep gives a nice step-by-step explanation. It might also help if you take a moment to consider how this:
      /"([^"]*)"/
      differs from this:
      /^"([^"]*)"$/