Magius_AR has asked for the wisdom of the Perl Monks concerning the following question:

Why in god's name is $2 matching an e in this regex?
$word = "eee"; $word=~/(e)([^\1])e/;
By all forms of sanity, $2 should only match something that is NOT e, so in this example the match should fail and $2 should be undef.

Can anyone tell me why this isn't working? It's had us stumped for days.
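For anyone who wants to reproduce this, here is a minimal sketch, assuming a reasonably modern perl 5 (`$first` and `$second` are just illustrative names). A match in list context returns the captures directly, which makes it easy to see that the match succeeds and that the second group really does hold an e:

```perl
use strict;
use warnings;

my $word = "eee";

# The pattern from the question. In list context a successful match
# returns the captured groups, so ($first, $second) mirror ($1, $2).
my ($first, $second) = $word =~ /(e)([^\1])e/;

if (defined $second) {
    print "matched: \$1='$first' \$2='$second'\n";
} else {
    print "no match\n";
}
```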

Magius_AR

Re: Mind boggling regex
by japhy (Canon) on Oct 26, 2001 at 18:01 UTC
    Character classes are formed at regex compile-time, and \1 isn't a backreference until regex run-time, so your character class is saying "all characters except the character with octal code 001". Try something like:
    $word =~ /(e)(?!\1)(.)e/s;
    which reads "match an e, then, making sure we CAN'T match an e, match ANY character, then match an e." The /s is there so that . matches newline.
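A small sketch checking both halves of that explanation, assuming a perl 5 where `\1` inside a bracketed character class is read as the octal escape `\001` (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Inside a character class \1 is the octal escape for chr(1), so [^\1]
# means "any character except \x01", not "anything but group 1".
my $e_ok   = "e"    =~ /^[^\1]$/ ? 1 : 0;   # 1: 'e' is not \x01
my $ctl_ok = chr(1) =~ /^[^\1]$/ ? 1 : 0;   # 0: \x01 is excluded

# japhy's fix: the negative lookahead (?!\1) sits outside any class,
# so \1 really is a backreference to whatever (e) captured.
my @eye = "eye" =~ /(e)(?!\1)(.)e/s;   # ('e', 'y')
my @eee = "eee" =~ /(e)(?!\1)(.)e/s;   # () because the match fails

print "'e' vs [^\\1]: $e_ok, chr(1) vs [^\\1]: $ctl_ok\n";
print "eye: (@eye)\neee: (@eee)\n";
```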

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      $word =~ /(e)(?!\1)(.)e/s;
      I just tried that regex, and $1, $2, and $3 are all undefined.

      Does the look-ahead somehow make all the saved regex matches disappear or something?

      Magius_AR

        Well, the regex FAILS for the string "eee". Try "eye" instead. And this regex only defines $1 and $2.
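A minimal sketch of the usual defensive idiom, assuming the same pattern: test the match itself before reading $1 and $2. Perl does not reset the capture variables after a failed match, so they can hold stale values from an earlier successful match in the same scope (`%captured` is a hypothetical name used only for this demo):

```perl
use strict;
use warnings;

my %captured;

for my $word ("eee", "eye") {
    # Check the match result itself; never read $1/$2 without it.
    if ($word =~ /(e)(?!\1)(.)e/s) {
        $captured{$word} = "$1,$2";
        print "$word: \$1='$1' \$2='$2'\n";
    } else {
        $captured{$word} = undef;
        print "$word: no match, so \$1 and \$2 are not meaningful\n";
    }
}
```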

Re: Mind boggling regex
by davis (Vicar) on Oct 26, 2001 at 18:08 UTC
    The problem is that you're assuming \1 will contain the contents of the first set of parentheses. It won't.
    Remember, metacharacters and escapes take on a different meaning inside character classes. I believe that in this case the second set of parentheses is trying to match 'any character that doesn't have octal code 001'.
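To illustrate the "different meaning inside a character class" point with characters other than \1, here is a quick sketch (the hash keys are arbitrary labels chosen for the demo):

```perl
use strict;
use warnings;

# The same characters mean different things inside and outside [...].
my %r = (
    dot_outside  => ("x" =~ /^.$/)    ? 1 : 0,   # '.' = any character
    dot_inside   => ("x" =~ /^[.]$/)  ? 1 : 0,   # '.' = literal dot
    star_inside  => ("*" =~ /^[*]$/)  ? 1 : 0,   # '*' = literal star
    caret_inside => ("^" =~ /^[a^]$/) ? 1 : 0,   # '^' not first = literal
);

print "$_ => $r{$_}\n" for sort keys %r;
```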
    I'm just off now to see if I can figure out an elegant solution to this one.
    Update: 4 seconds later, and as ever, others have answered before me. japhy's solution is classier than anything I could attempt.
    /me wanders off to look up lookahead assertions properly.
Re: Mind boggling regex
by earthboundmisfit (Chaplain) on Oct 26, 2001 at 18:05 UTC
    My uninformed guess is that it has to do with lookahead. If you hard-code the 'e', it behaves as it should:
    $word=~/(e)([^e])e/;
    My understanding was that you could not use a reference to the matched result until after the pattern has matched.
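A sketch contrasting the hard-coded version with an ordinary backreference (variable names are illustrative). For what it's worth, \1 outside a character class can be used later in the same pattern; it is only inside [...] that it stops being a backreference:

```perl
use strict;
use warnings;

# Hard-coding the excluded character works, as in the post above.
my $hard_coded = "eee" =~ /(e)([^e])e/ ? 1 : 0;   # 0: no non-e in the middle
my @hard_eye   = "eye" =~ /(e)([^e])e/;           # ('e', 'y')

# A backreference outside a character class is usable later in the
# same pattern: (.)\1 finds the first doubled character.
my ($doubled) = "abccd" =~ /(.)\1/;               # 'c'

print "eee: $hard_coded, eye: (@hard_eye), doubled: $doubled\n";
```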

    I eagerly await the real answer from those who know.

    update: didn't have to wait long for that one. Should have practiced some look behind ;)

    update2: As usual, I've confused my terminology. Studying japhy's answer, it looks like I had the right idea, but what it really comes down to is that ([^\1]) is the wrong way to do a negative lookahead. Page 203 of the Camel Book (3rd edition) offers a good explanation of this.