Re: problems using HTML::LinkExtor? anyone have an idea?

by alien_life_form (Pilgrim)
on May 20, 2002 at 14:27 UTC ([id://167840])


in reply to problems using HTML::LinkExtor? anyone have an idea?

Greetings,

Adding the infinitely advisable:

    use strict;
    use warnings;

to your sample yields:
Variable "@imgs" will not stay shared at par.pl line 17 (#1)
Drilling further down:
    C:\TEMP>perl -Mdiagnostics par.pl
    Variable "@imgs" will not stay shared at par.pl line 17 (#1)
        (W closure) An inner (nested) named subroutine is referencing a
        lexical variable defined in an outer subroutine.

        When the inner subroutine is called, it will probably see the value
        of the outer subroutine's variable as it was before and during the
        *first* call to the outer subroutine; in this case, after the first
        call to the outer subroutine is complete, the inner and outer
        subroutines will no longer share a common value for the variable.
        In other words, the variable will no longer be shared.

        Furthermore, if the outer subroutine is anonymous and references a
        lexical variable outside itself, then the outer and inner
        subroutines will never share the given variable.

        This problem can usually be solved by making the inner subroutine
        anonymous, using the sub {} syntax.  When inner anonymous subs that
        reference variables in outer subroutines are called or referenced,
        they are automatically rebound to the current values of such
        variables.
I could not have said it better myself. :)
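To see the effect in isolation, here is a minimal sketch with no network code at all. The sub names (collect_named, add_item, collect_anon) are made up for this illustration and do not come from your script; the only point is the difference between a named inner sub and an anonymous one.

```perl
use strict;
use warnings;

# Hypothetical example: a named sub nested inside another sub.
sub collect_named {
    my @items = ();
    # add_item is compiled once and stays bound to the @items that
    # existed during the FIRST call of collect_named. Perl warns here:
    # Variable "@items" will not stay shared
    sub add_item { push @items, @_ }
    add_item(@_);
    return scalar @items;    # counts what THIS call's @items holds
}

# Same logic, but the inner sub is anonymous, so it is rebound to the
# current @items every time collect_anon runs.
sub collect_anon {
    my @items = ();
    my $add_item = sub { push @items, @_ };
    $add_item->(@_);
    return scalar @items;
}

print "named: ", collect_named('a'), " ", collect_named('b'), "\n";
print "anon:  ", collect_anon('a'),  " ", collect_anon('b'),  "\n";
```

On the perls I know of, the named variant should report 1 for the first call and 0 afterwards, because the later pushes land in the first call's array, while the anonymous variant reports 1 every time.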
Hence the working version:
    use strict;
    use warnings;
    use HTML::LinkExtor;
    use LWP::UserAgent;
    use URI::URL;

    sub parsedocument {
        my ($url) = @_;
        my $ua = LWP::UserAgent->new;
        $ua->env_proxy();

        # Set up a callback that collects image links
        my @imgs = ();
        my $callback = sub {
            my ($tag, %attr) = @_;
            return if $tag ne 'img';    # we only look closer at <img ...>
            push(@imgs, values %attr);
        };
        my $p = HTML::LinkExtor->new($callback);

        # Request document and parse it as it arrives
        my $res = $ua->request(HTTP::Request->new(GET => $url),
                               sub { $p->parse($_[0]) });

        # Expand all image URLs to absolute ones
        my $base = $res->base;
        @imgs = map { $_ = url($_, $base)->abs; } @imgs;

        # Print them out
        print join("<br>", @imgs), "<br>";
    }

    # Process every URL given on the command line
    parsedocument($_) for @ARGV;
Note that the original of your sample (from the documentation of HTML::LinkExtor) works precisely because it runs at the top level of the program, where the named callback sub is not nested inside another sub. When you wrap it in a sub, you get the problem you described, which is well known, for instance, to people using mod_perl and Apache::Registry.
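The mod_perl case can be imitated without Apache. The sketch below is a deliberately simplified stand-in, not Apache::Registry's actual code: it compiles a script body once, wrapped in a named sub, and then calls that sub once per "request". The names $script, show_count and handler are assumptions made for this illustration.

```perl
use strict;
use warnings;

# A hypothetical script body, kept as text. Run as a plain script this
# is harmless: $count is a file-level lexical and show_count is a
# top-level sub.
my $script = <<'END_SCRIPT';
    my $count = shift;
    sub show_count { return $count }
    show_count();
END_SCRIPT

# Simplified imitation of a registry-style loader: wrap the script in a
# named sub and compile it once. show_count is now a nested named sub,
# so the "will not stay shared" warning appears at this point.
eval "sub handler {\n$script\n}; 1" or die $@;

# Each call stands in for one request with a different argument.
print "request 1: ", handler(1), "\n";
print "request 2: ", handler(2), "\n";
```

Both lines should show the value from the first request, since show_count stays bound to the $count of the first call. That is the same symptom as @imgs staying empty after the first call to your wrapper sub.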
Cheers,
alf
You can't have everything: where would you put it?
