Greetings,

Adding the infinitely advisable:

use strict; use warnings;
To your sample yields:
Variable "@imgs" will not stay shared at par.pl line 17 (#1)
Drilling further down:
C:\TEMP>perl -Mdiagnostics par.pl perl -Mdiagnostics par.pl Variable "@imgs" will not stay shared at par.pl line 17 (#1) (W closure) An inner (nested) named subroutine is referencing a lexical variable defined in an outer subroutine. When the inner subroutine is called, it will probably see the valu +e of the outer subroutine's variable as it was before and during the *f +irst* call to the outer subroutine; in this case, after the first call t +o the outer subroutine is complete, the inner and outer subroutines will + no longer share a common value for the variable. In other words, the variable will no longer be shared. Furthermore, if the outer subroutine is anonymous and references a lexical variable outside itself, then the outer and inner subrouti +nes will never share the given variable. This problem can usually be solved by making the inner subroutine anonymous, using the sub {} syntax. When inner anonymous subs tha +t reference variables in outer subroutines are called or referenced, + they are automatically rebound to the current values of such variables.
I could not have said it better myself. :)
Hence the working version:
use strict; use warnings; use HTML::LinkExtor; use LWP::UserAgent; use URI::URL; sub parsedocument { my ($url) = @_; my $ua = LWP::UserAgent->new; $ua->env_proxy(); # Set up a callback that collect image links my @imgs = (); my $callback = sub { my($tag, %attr) = @_; return if $tag ne 'img'; # we only look closer at <img ...> push(@imgs, values %attr); }; my $p = HTML::LinkExtor->new($callback); # Request document and parse it as it arrives my $res = $ua->request(HTTP::Request->new(GET => $url), sub {$p->parse($_[0])}); # Expand all image URLs to absolute ones my $base = $res->base; @imgs = map { $_ = url($_, $base)->abs; } @imgs; # Print them out print join("<br>", @imgs), "<br>"; } map {parsedocument($_) } @ARGV;
Note that your sample's original (from the documentation of HTML::Linkextor) works exactly because it occurs in the program's main. When you wrap it in a sub, you get the problem you described, which is wellknown - for instance - to people trying to use mod_perl and Apache::Registry.
Cheers,
alf
You can't have everything: where would you put it?

In reply to Re: problems using HTML::LinkExtor? anyone have an idea? by alien_life_form
in thread problems using HTML::LinkExtor? anyone have an idea? by costas

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.