How is this printing?

awohld has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking at the following code, and I can't figure out why `print "$1\n";` works. Where is `$1` coming from?

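use strict;
use warnings;
use LWP 5.64;    # loads LWP::UserAgent and requires LWP version 5.64 or later

my $browser = LWP::UserAgent->new;
my $url = 'http://www.cpan.org/RECENT.html';
my $response = $browser->get($url);

die "Can't get $url -- ", $response->status_line
   unless $response->is_success;

my $html = $response->content;
# In the scalar context of the while condition, each pass of m//g
# finds the next link and sets $1 to the text captured by (.*?)
while( $html =~ m/<A HREF="(.*?)"/g ) {
   print "$1\n";
}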

How is $1 getting its value?

This is some example code from the book "Spidering Hacks", and I'm not sure if it's a typo or something I'm missing.

Thanks
Adam

Update: Ahh, I see. The `m//g` in the `while` condition finds one match per pass, and each time it stores the text captured by `(.*?)` in `$1`. Got it. Thanks.
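The two contexts of `m//g` can be compared side by side. This is a minimal sketch that assumes a small hard-coded `$html` string (hypothetical) in place of the downloaded page: in the scalar context of a `while` condition, each evaluation finds the next match and sets `$1`, while in list context the same match returns every capture at once.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $response->content, so no network is needed.
my $html = '<A HREF="a.html">A</A> <A HREF="b.html">B</A> <A HREF="c.html">C</A>';

# Scalar context: each test of the condition finds the next match
# and sets $1 for the body of that iteration.
my @one_at_a_time;
while ( $html =~ m/<A HREF="(.*?)"/g ) {
    push @one_at_a_time, $1;
}

# List context: m//g returns all the captured groups in one go.
my @all = $html =~ m/<A HREF="(.*?)"/g;

print "scalar: @one_at_a_time\n";
print "list:   @all\n";
```

Both approaches collect the same three links; the loop form is handy when each match needs processing as it is found.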


Re: How is this printing? by Sandy (Curate) on Oct 15, 2004 at 20:32 UTC
When you use a regular expression to match something (e.g. `$html =~ m/<A HREF=\"(.*?)\"/g`), the parentheses around `.*?` capture the matched text and store it in a special variable `$n`, where `n` is the number of the parenthesis group (counting opening parentheses from the left). OK, let me try an example; it may be clearer:

    $a = "123abc345";
    $a =~ /123(.*?)345/;      # $1 is now "abc"
    $a =~ /(123).*?(345)/;    # $1 is now "123" and $2 is "345"
    $a =~ /(xyz).*/;          # $1 is still "123" because there was no match
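The numbering rule can be checked directly. In this sketch the date string and pattern are made up for illustration; groups are numbered by the position of their opening parenthesis, so an outer group that encloses other groups gets the lower number.

```perl
use strict;
use warnings;

my $date = "2004-10-15";

# Opening parentheses, left to right: 1 = whole date, 2 = year,
# 3 = month-day pair, 4 = month, 5 = day.
my @groups;
if ( $date =~ /((\d+)-((\d+)-(\d+)))/ ) {
    @groups = ( $1, $2, $3, $4, $5 );
}
print join( " | ", @groups ), "\n";
```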
Re^2: How is this printing? by muba (Priest) on Oct 16, 2004 at 14:47 UTC
IIRC, all the `$n` variables get reset whenever a new regexp is, uh... "done" (hope you understand me). Correct me if I'm wrong.

`"2b"||!"2b";$$_="the question"` Besides that, my code is untested unless stated otherwise. One more: please review the article about regular expressions (do's and don'ts) I'm working on.
Re^3: How is this printing? by Sandy (Curate) on Oct 19, 2004 at 17:04 UTC
Nope. I'm not sure I understand exactly what you are asking, but here's my best guess. Consider the following code:

    foreach ("abc123", "xyz123", "def12", "ghi123") {
        /(.*)123/;
        print "scanning $_: found $1\n";
    }

and its output:

    scanning abc123: found abc
    scanning xyz123: found xyz
    scanning def12: found xyz
    scanning ghi123: found ghi

Notice that while scanning "def12", the `$1` variable was not reset to `undef` but kept its previous value. Was this what you were asking?
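What does limit `$1` is scope rather than later failed matches: perlvar describes the capture variables as dynamically scoped to the enclosing block. This minimal sketch (the strings are arbitrary) shows a failed match leaving `$1` untouched, and a capture made inside an inner block disappearing once that block ends.

```perl
use strict;
use warnings;

my @seen;
if ( "abc123" =~ /(.*)123/ ) {
    push @seen, $1;             # "abc"
    "def12" =~ /(.*)123/;       # fails, so $1 is left untouched
    push @seen, $1;             # still "abc"
    {
        "ghi123" =~ /(.*)123/;  # succeeds inside an inner block
        push @seen, $1;         # "ghi"
    }
    push @seen, $1;             # "abc" again once the inner block is left
}
print "@seen\n";
```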
Re^4: How is this printing? by muba (Priest) on Oct 19, 2004 at 20:18 UTC
Re: How is this printing? by Fletch (Bishop) on Oct 15, 2004 at 20:26 UTC
Erm, from the `m//` right above it. Read `perldoc perlop` and `perldoc perlretut`.

Update: And not entirely related, but to paraphrase von Neumann, "Those attempting to parse HTML with regular expressions are living in a state of sin." HTML::TreeBuilder, HTML::TokeParser or the like are much better. The regex works for this page, but in general it's better to use a real parser.
Re: How is this printing? by SamCG (Hermit) on Oct 15, 2004 at 21:13 UTC
Keep in mind that failed matches don't reset the match variables: if you later do another regex match, make sure the `$1` you read actually comes from that match and not from an earlier one. Example:

    $foo = "bungling";
    $foo =~ /(ngl.)/;        ## $1 is now "ngli"
    $matched = $1;           ## $matched = "ngli"
    $foo =~ /bung(ngl.)/;    ## No match!
    $matched2 = $1;          ## $matched2 is now "ngli" as well

Also, a lot of programmers apply regexes to the "default" variable `$_`:

    $_ = "tricky";   ## usually not assigned explicitly; typically it comes in during a for loop
    /tr(.{3})y/;
    print $1;        # $1 = "ick"
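Two common ways to avoid picking up a stale `$1` can be sketched with the same strings: test the match before reading the capture, or assign the match in list context, which returns the captures directly and leaves the variable `undef` when the match fails. The variable `$matched3` is a hypothetical name added for this illustration.

```perl
use strict;
use warnings;

my $foo = "bungling";
$foo =~ /(ngl.)/;    # succeeds, so $1 is "ngli"

# 1. Only read $1 when the match actually succeeded.
my $matched2;
if ( $foo =~ /bung(ngl.)/ ) {
    $matched2 = $1;
}

# 2. List assignment: a failed match returns an empty list,
#    so $matched3 stays undef instead of inheriting the old $1.
my ($matched3) = $foo =~ /bung(ngl.)/;

print defined $matched2 ? "matched2: $matched2\n" : "matched2: undef\n";
print defined $matched3 ? "matched3: $matched3\n" : "matched3: undef\n";
```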
Re: How is this printing? by ccn (Vicar) on Oct 15, 2004 at 20:27 UTC
See `perldoc perlvar`, `perldoc perlretut` and `perldoc perlre`.
Re: How is this printing? by TedPride (Priest) on Oct 15, 2004 at 21:38 UTC
Yes, but you don't have to worry about that with a while loop: it ends as soon as the match fails, so the loop body only ever sees `$1` from a successful match.
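The loop can be traced with `pos`, which holds the offset where the next `m//g` search will start. In this sketch the `$html` string is made up; the body runs only after a successful match, and the final failed match ends the loop and resets `pos` to `undef`.

```perl
use strict;
use warnings;

my $html = '<A HREF="x.html"> <A HREF="y.html">';
my ( @captures, @offsets );

while ( $html =~ m/<A HREF="(.*?)"/g ) {
    # Only reached after a successful match, so $1 is always fresh here.
    push @captures, $1;
    push @offsets,  pos($html);    # offset just past the closing quote
}

my $after = pos($html);            # undef: the failed match reset it
print "captures: @captures\n";
print "offsets:  @offsets\n";
print "pos after loop: ", ( defined $after ? $after : "undef" ), "\n";
```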