awohld has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking at the following code, and I can't figure out why print "$1\n"; works. Where is $1 coming from?

    use strict;
    use LWP 5.64;

    my $browser  = LWP::UserAgent->new;
    my $url      = 'http://www.cpan.org/RECENT.html';
    my $response = $browser->get($url);
    die "Can't get $url -- ", $response->status_line
        unless $response->is_success;

    my $html = $response->content;
    while ( $html =~ m/<A HREF=\"(.*?)\"/g ) {
        print "$1\n";
    }
How is $1 getting its value?

This is some example code from the book "Spidering Hacks", and I'm not sure if it's a typo or something I'm missing.

Thanks
Adam

Update: Ahh, I see. Each time m//g matches in the while condition, it stores the text captured by (.*?) in $1. Got it. Thanks.
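A minimal, self-contained sketch (not from the book) of the same loop, run against a hypothetical in-memory HTML string instead of the fetched page so no network access is needed. It contrasts the two contexts of m//g: in the while condition (scalar context) each test finds the next match and sets $1, while in list context m//g returns all the captures at once.

```perl
use strict;
use warnings;

# Hypothetical sample HTML standing in for $response->content
my $html = '<A HREF="a.html">A</A> <A HREF="b.html">B</A>';

# Scalar context: each test of the condition finds the NEXT match
# and sets $1 to that match's capture
my @links;
while ($html =~ m/<A HREF="(.*?)"/g) {
    push @links, $1;
}

# List context: m//g returns every capture in one go
my @all = $html =~ m/<A HREF="(.*?)"/g;

print "@links\n";   # a.html b.html
print "@all\n";     # a.html b.html
```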

Re: How is this printing?
by Sandy (Curate) on Oct 15, 2004 at 20:32 UTC
    When you use a regular expression to match something (e.g. $html =~ m/<A HREF=\"(.*?)\"/g), the parentheses around ".*?" capture the matched text and store it in a special variable, $n, where n is the number of the parenthesized group, counted by opening parentheses from the left.

    OK, let me try an example; it may be clearer:

        $a = "123abc345";
        $a =~ /123(.*?)345/;     # $1 is now equal to "abc"
        $a =~ /(123).*?(345)/;   # $1 is now "123" and $2 is "345"
        $a =~ /(xyz).*/;         # $1 is still "123" because there was
                                 # no match
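A small sketch of how the numbering works when groups are nested, using a hypothetical date string: $n follows the order of the opening parentheses, so an outer group gets a lower number than the groups inside it.

```perl
use strict;
use warnings;

my $date = "2004-10-15";   # hypothetical input
my @groups;

# Groups are numbered by the position of their OPENING parenthesis:
#   $1 = ((\d+)-(\d+))   the outer group
#   $2 = (\d+)           first group inside it
#   $3 = (\d+)           second group inside it
#   $4 = (\d+)           the last group
if ($date =~ /((\d+)-(\d+))-(\d+)/) {
    @groups = ($1, $2, $3, $4);
}
print join(",", @groups), "\n";   # 2004-10,2004,10,15
```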
        Nope, I'm not exactly sure if I understand what you are asking, but here's my best guess...

        Consider the following code and its output:

            foreach ("abc123", "xyz123", "def12", "ghi123") {
                /(.*)123/;
                print "scanning $_: found $1\n";
            }

        Output:

            scanning abc123: found abc
            scanning xyz123: found xyz
            scanning def12: found xyz
            scanning ghi123: found ghi

        Notice that while scanning "def12" the $1 variable was not reset to undef; it kept its value from the previous successful match.

        Was this what you were asking?
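One common way to avoid the stale value shown above, sketched here as a suggestion rather than the poster's code: read $1 only when the match in the current iteration actually succeeded, and record undef otherwise.

```perl
use strict;
use warnings;

my @found;
foreach my $s ("abc123", "xyz123", "def12", "ghi123") {
    # Use $1 only if THIS match succeeded; otherwise store undef,
    # so a capture left over from an earlier iteration can't leak in
    my $hit = $s =~ /(.*)123/ ? $1 : undef;
    push @found, $hit;
    print "scanning $s: found ", (defined $hit ? $hit : "nothing"), "\n";
}
```

With this guard, "def12" reports "found nothing" instead of repeating "xyz".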

Re: How is this printing?
by Fletch (Bishop) on Oct 15, 2004 at 20:26 UTC

    Erm, from the m// right above it. Read perldoc perlop and perldoc perlretut.

    Update: And, not entirely related, but to paraphrase von Neumann: "Those attempting to parse HTML with regular expressions are living in a state of sin." HTML::TreeBuilder, HTML::TokeParser, or the like are much better. The regex will work in this case, but in general it's better to use a real parser.

Re: How is this printing?
by SamCG (Hermit) on Oct 15, 2004 at 21:13 UTC
    Keep in mind that a failed match doesn't reset the match variables. If you do another regex match later, make sure the $1 you read came from that match and not from an earlier one. Example:
        $foo = "bungling";
        $foo =~ /(ngl.)/;        ## $1 is now "ngli"
        $matched = $1;           ## $matched = "ngli"
        $foo =~ /bung(ngl.)/;    ## No match!
        $matched2 = $1;          ## $matched2 is now "ngli"
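A sketch of one way to sidestep this, offered as an alternative idiom rather than SamCG's code: assign the match in list context. A successful match returns its captures and a failed one returns an empty list, so each variable is filled only from its own match.

```perl
use strict;
use warnings;

my $foo = "bungling";

# List-context assignment: captures on success, empty list on failure
my ($matched)  = $foo =~ /(ngl.)/;       # "ngli"
my ($matched2) = $foo =~ /bung(ngl.)/;   # no match, so undef (not "ngli")
```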
    Also, a lot of programmers apply regexes to the "default" variable, $_:
        $_ = "tricky";   ## usually not assigned explicitly; typically set during a for loop
        /tr(.{3})y/;
        print $1;        # $1 = "ick"
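A short sketch of the for-loop case mentioned in the comment above, using hypothetical words: for aliases $_ to each element, and a bare /.../ matches against $_ without naming it.

```perl
use strict;
use warnings;

my @middles;
for (qw(tricky trendy)) {              # for aliases $_ to each word in turn
    push @middles, $1 if /tr(.{3})y/;  # a bare /.../ matches against $_
}
print "@middles\n";                    # ick end
```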
Re: How is this printing?
by ccn (Vicar) on Oct 15, 2004 at 20:27 UTC
Re: How is this printing?
by TedPride (Priest) on Oct 15, 2004 at 21:38 UTC
    Yes, but you don't have to worry about that with a while loop: the loop ends as soon as the match fails, so $1 inside the loop body always comes from the match that just succeeded.
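A minimal sketch of that behavior on a hypothetical string. The loop body runs once per successful match, and the failing test that ends the loop (m//g without the /c modifier) also resets pos() to undef, so a later m//g on the same string starts from the beginning.

```perl
use strict;
use warnings;

my $text = "x1 y22 z333";   # hypothetical input
my @nums;

# The condition is re-tested before each pass; the loop exits on the
# first failed match, so the body only ever sees a fresh $1
while ($text =~ /(\d+)/g) {
    push @nums, $1;
}

# The final, failed m//g reset the search position
my $pos_after = pos($text);
print "@nums\n";    # 1 22 333
```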