jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

I'm trying to start with understanding the extended patterns. If I look at an example:
/(?<=\t)\w+/

This matches a word that follows a tab, without including the tab in $& (this is from perlre)
Two things, why not just use:
/\t\w+/
and the second thing, what is $& ?

Thanks in advance
Luca

Re: regexp: ?<=
by davorg (Chancellor) on Feb 16, 2006 at 13:38 UTC
    why not just use:
    /\t\w+/

    For cases where you're using it in a substitution operator (s///). The tab won't get included in the substitution.

    what is $& ?

    See perldoc perlvar for explanations of all of Perl's special variables. $& is the part of your string which matched the regex.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: regexp: ?<=
by davidrw (Prior) on Feb 16, 2006 at 13:43 UTC
    $& is (from perlvar) "The string matched by the last successful pattern match" .. for example s/\d+/foo$&bar/ will turn 1234 into foo1234bar .. Note (see perlre) that using $& can be a significant performance hit ..

    As for using /\t\w+/ instead of the look-behind .. I think the look-behind makes things easier in some cases .. for (a trivial) example, s/(?<=\t)\w+// instead of s/(\t)\w+/$1/ ..
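To make that comparison concrete, here is a minimal sketch, assuming a made-up sample string (the variable names are hypothetical, not from the thread). It runs the look-behind form, the capture form, and the plain pattern side by side.

```perl
use strict;
use warnings;

# Hypothetical sample input: a word, a tab, then two more words.
my $input = "Testing \tthis regex";

# Look-behind: the tab is asserted but not consumed, so only the word is removed.
(my $lookbehind = $input) =~ s/(?<=\t)\w+//;

# Capture: the tab is consumed by the match, so it has to be put back with $1.
(my $capture = $input) =~ s/(\t)\w+/$1/;

# Plain pattern, no capture: the tab is removed together with the word.
(my $plain = $input) =~ s/\t\w+//;

print "look-behind and capture agree\n" if $lookbehind eq $capture;
print "plain pattern dropped the tab\n" if $plain !~ /\t/;
```

The first two substitutions leave the tab in place; only the third one loses it, which is the difference the look-behind is meant to avoid.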
Re: regexp: ?<=
by Roy Johnson (Monsignor) on Feb 16, 2006 at 14:49 UTC
    For some examples of why and how to use lookahead and lookbehind, see the tutorial Using Look-ahead and Look-behind. Full disclosure: I wrote it.

    Caution: Contents may have been coded under pressure.
Re: regexp: ?<=
by ikegami (Patriarch) on Feb 16, 2006 at 15:22 UTC

    Some common usages of (?<=...), (?=...) and (?!...):

    s/(?<=a)b/c/;        # Changes the 'b', but keeps the 'a'.
    s/b(?=a)/c/;         # Changes the 'b', but keeps the 'a'.
    /^(?=.*a)(?=.*b)/;   # Is equivalent to /a/ && /b/.
    /a(?:(?!bcd).)*e/;   # (?:(?!...).)* is to regexp what
                         # [^...] is to characters.
                         # This example reads as:
                         # "Match 'a' followed by something which
                         # doesn't contain 'bcd', followed by 'e'."
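A rough sketch that exercises each of the four idioms above on made-up inputs. The variable names and sample strings are assumptions chosen for illustration only.

```perl
use strict;
use warnings;

# Look-behind: replace a 'b' that is preceded by 'a'; the 'a' is untouched.
(my $behind = "ab") =~ s/(?<=a)b/c/;

# Look-ahead: replace a 'b' that is followed by 'a'; the 'a' is untouched.
(my $ahead = "ba") =~ s/b(?=a)/c/;

# Two look-aheads anchored at the start: both letters must occur, in any order.
my $has_both = ("xbxa" =~ /^(?=.*a)(?=.*b)/) ? 1 : 0;
my $only_a   = ("xxa"  =~ /^(?=.*a)(?=.*b)/) ? 1 : 0;

# Negative look-ahead inside a loop: 'a' ... 'e' with no 'bcd' in between.
my $clean   = ("a--bc--e"  =~ /a(?:(?!bcd).)*e/) ? 1 : 0;
my $blocked = ("a--bcd--e" =~ /a(?:(?!bcd).)*e/) ? 1 : 0;

print "$behind $ahead $has_both $only_a $clean $blocked\n";
```

In both substitutions it is the 'b' that gets replaced. The look-around only checks the neighbouring 'a' without consuming it.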

    $& refers to "everything that was matched". It's what gets substituted in a substitution. Avoid actually using $&. It has side effects which can slow down other regexps in your program. It can easily be emulated using captures.
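One way the capture-based emulation might look, as a minimal sketch: wrap the whole pattern in parentheses and use $1 where $& would have gone. The sample data and variable names are hypothetical.

```perl
use strict;
use warnings;

# Substitution written with $& (the form the thread advises against).
my $with_amp = "1234";
$with_amp =~ s/\d+/foo$&bar/;

# Same substitution with the whole pattern captured and $1 used instead.
my $with_capture = "1234";
$with_capture =~ s/(\d+)/foo${1}bar/;

# For a plain match, a capture in list context returns the matched text directly.
my $text = "Testing \tthis regex";
my ($whole) = $text =~ /(\t\w+)/;

print "$with_amp $with_capture\n";
```

The braces in ${1} are only there to make it obvious where the variable name ends.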

Re: regexp: ?<=
by MCS (Monk) on Feb 16, 2006 at 14:44 UTC

    As others have mentioned, /(?<=\t)\w+/ does not include the tab in $&, which contains the text that the regular expression matched. Perhaps some code might make it a little clearer:

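    #!/usr/bin/perl
    use strict;

    my $string = "Testing \tthis regex";
    print $string, "\n";

    # Look-behind: the tab must precede the word but is not part of the match.
    $string =~ /(?<=\t)\w+/;
    my $string1 = $&;

    # Plain pattern: the tab is part of the match.
    $string =~ /\t\w+/;
    my $string2 = $&;

    # Backslashes are doubled so the patterns print literally.
    print "/(?<=\\t)\\w+/:\n" . $string1 . "\n";
    print "/\\t\\w+/:\n" . $string2 . "\n";

    This is mostly useful for substitutions but hopefully my code makes it a little clearer as to what is going on. The output is as follows (the first and last lines contain the literal tab):

    Testing 	this regex
    /(?<=\t)\w+/:
    this
    /\t\w+/:
    	this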