mech follow_link question

zingbust has asked for the wisdom of the Perl Monks concerning the following question:

Re: mech follow_link question by choroba (Cardinal) on Feb 18, 2012 at 23:36 UTC
By "relevant", you probably mean "relative". How is `$url` set? After following a link, your mech loads the target page. What do you want to `get` then?
Re: mech follow_link question by Anonymous Monk on Feb 19, 2012 at 05:46 UTC
"it prepends the directory on my hard drive that I'm running the perl script from." Don't think so.
Re^2: mech follow_link question by zingbust (Initiate) on Feb 19, 2012 at 15:23 UTC
Sorry, I didn't follow the formatting rules here. I'll try again with the proper formatting for these posts.

```perl
$m = WWW::Mechanize->new();
$m->get($url);    # $url is some home page
@links = $m->links();
for $link ( @links ) {
    &follow;
}

sub follow {
    if ($m->follow_link( url_regex => qr/contact/i)){
        print "$link->url\n";
    }
}
```

I tried the "follow" subroutine, thinking that if the `if` condition was false, it would return undef. Instead, the first link it tried to follow, which did NOT contain the string "contact", crashed the program with the error message "Link not found at c:\websites\bla_bla\my_perl_script.pl". Why would mech assume the relative link was something off my hard drive instead of from the first fetched page?
Re^3: mech follow_link question by Corion (Patriarch) on Feb 19, 2012 at 15:37 UTC
There are two things at play. First, `Link not found at c:\websites\bla_bla\my_perl_script.pl` is just Perl's error message, which tells you the file and line where the error was raised. The path is the location of your script, not the target of the link. You left off the line number, but it is likely the line of the `->follow_link` call in the subroutine `follow()`.

Second, WWW::Mechanize behaves like a browser. If you issue `->follow_link` for one link, all other links you may have collected will likely no longer be valid, as they are not on the new page. Consider dumping the `->content` of each page. Maybe you want to go `->back` after visiting each page in turn?

As a last point, your style of using the `&follow;` syntax mixed with global variables is discomforting. I would rewrite that snippet as:

```perl
for my $link ( @links ) {
    follow( $link );
}

sub follow {
    my ($link) = @_;    # take the link as an argument instead of a global
    warn "Following contact link";
    if ($m->follow_link( url_regex => qr/contact/i )) {
        print $link->url . "\n";
    }
}
```

As for your thoughts about how WWW::Mechanize works and what its methods return, please read the WWW::Mechanize documentation. Most failures are fatal, to make it easier for you to spot when your assumptions deviate from the reality of the website you're automating.
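To illustrate the point about fatal errors: with `autocheck` enabled (the default when WWW::Mechanize is used directly rather than subclassed), `follow_link` dies with "Link not found" when no link on the current page matches, whereas `find_all_links` simply returns an empty list. The following is only a sketch of one way to visit every matching link and come back afterwards. The helper name `visit_matching_links` is hypothetical and not part of WWW::Mechanize, and it assumes that what is wanted is to fetch each "contact" link from the start page in turn.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper (not part of WWW::Mechanize): fetch $start_url,
# visit every link whose href matches $pattern, and go back to the
# start page after each visit. Returns the absolute URLs visited.
sub visit_matching_links {
    my ($mech, $start_url, $pattern) = @_;

    $mech->get($start_url);

    # Unlike follow_link, find_all_links does not die when nothing
    # matches; it just returns an empty list.
    my @links = $mech->find_all_links( url_regex => $pattern );

    my @visited;
    for my $link (@links) {
        my $abs = $link->url_abs->as_string;   # resolved against the page's base URL
        $mech->get($abs);                      # dies on failure while autocheck is on
        push @visited, $abs;
        # ... inspect $mech->content here ...
        $mech->back;                           # return to the start page
    }
    return @visited;
}

1;
```

It could be called as `visit_matching_links( WWW::Mechanize->new, $url, qr/contact/i )`, with `$url` being the home page from the question. Collecting the links first and calling `->back` after each visit keeps the collected links valid, because every `get` starts from the page they were found on.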
Re^4: mech follow_link question by zingbust (Initiate) on Feb 29, 2012 at 17:51 UTC
Re^5: mech follow_link question by Corion (Patriarch) on Feb 29, 2012 at 19:11 UTC