throop has asked for the wisdom of the Perl Monks concerning the following question:

Brethren

The (?! construct performs a zero-width negative lookahead, but it seems to capture its argument. For example, in the debugger, let's trim a string up to the first occurrence of two consecutive blank lines:

  DB<1> $foo = "a\nb\nc\n\nd\ne\n\n\nf\ng\nh\n\ni\n\n\nj\n"
  DB<2> x $foo =~ /( (:? . | \n (?! \n\n) )+ ) /x
0  'a
b
c

d
e'
1  'e'

The regex match returned two values. The first is what we wanted (the text up to the two consecutive blank lines), but then we pick up an 'e' from the last match of the (?! \n\n). How can we keep the (?! from capturing?

throop

Re: Non-capturing zero-width negative lookahead
by Sidhekin (Priest) on Mar 22, 2007 at 16:31 UTC

    x $foo =~ /( (:? . | \n (?! \n\n) )+ ) /x

    You have a bug: (:? should be (?:. That's what ends up as your second capturing group.
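
    As a minimal sketch of that fix, here is the same match from the question with (?: in place of (:?. The variable name @captures is only illustrative. In list context the match should now return a single value, because the inner group no longer captures.

```perl
use strict;
use warnings;

# Same test string as in the original question.
my $foo = "a\nb\nc\n\nd\ne\n\n\nf\ng\nh\n\ni\n\n\nj\n";

# (?: ... ) groups without capturing, so only the outer
# parentheses contribute a value to the returned list.
my @captures = $foo =~ /( (?: . | \n (?! \n\n) )+ ) /x;

print scalar(@captures), " capture(s)\n";
print $captures[0], "\n";
```

    The single captured value is the text up to, but not including, the newline that precedes the two blank lines.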

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!

Re: Non-capturing zero-width negative lookahead
by varian (Chaplain) on Mar 22, 2007 at 16:43 UTC
    The (?! is noncapturing as is.
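
    One way to check this is to match a pattern whose only parentheses belong to a lookahead and then count the capture groups. The string "foobar" and the variable names below are made up for illustration; $#+ is the documented way to ask how many groups the last successful match had.

```perl
use strict;
use warnings;

# The only parentheses in this pattern belong to the lookahead.
my $ok = "foobar" =~ /foo (?! baz)/x;

# $#+ is the number of capture groups in the last successful match.
my $group_count = $#+;

# The lookahead is zero-width, so the matched text is just "foo".
my $matched = $&;

# With no capturing groups, a successful list-context match returns (1).
my @result = "foobar" =~ /foo (?! baz)/x;

print "groups: $group_count\n";
print "matched: $matched\n";
print "list result: @result\n";
```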

    However, you have included a (:?. The colon is not a metacharacter, so this is treated as a regular capturing group.

    Updated: clarified my statement, Sidhekin is right

      the colon depicts an (unrecognized) regexp extension

      No, it doesn't. The colon is a literally matching atom, and the question mark is a quantifier. With some hopefully clearer /x spacing:

      $foo =~ /(
                (
                  :? .            # 0 or 1 colon + 1 non-newline char
                |                 # or
                  \n (?! \n\n)    # 1 newline not followed by two more newlines
                )+
              ) /x
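
      A small sketch to confirm that reading, using made-up test strings that are not from the thread: ( :? . ) should capture a character with or without a leading colon, and a quantified capturing group should keep only its last iteration, which is where the stray 'e' came from.

```perl
use strict;
use warnings;

# ":?" is an optional literal colon, so both strings match
# and the plain parentheses capture the whole thing.
my %captured;
for my $str (":x", "x") {
    if ($str =~ /^ ( :? . ) $/x) {
        $captured{$str} = $1;
        printf "%s => %s\n", $str, $1;
    }
}

# A repeated capturing group remembers only its final iteration.
my @got = "abc" =~ /( ( :? . )+ )/x;
print join(", ", @got), "\n";
```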
