OK, here is my code:
while (my $url = shift @urls)
{
    print "URL is $url\n";
    my $request = HTTP::Request->new(GET => $url);
    my $parser = HTML::Parser->new(api_version => 3);
    $parser->handler(start => \&start, 'self,tagname,attr');
    my $response = $browser->request($request);
    if ($response->is_success)
    {
        print $response->content();
        $parser->{base} ||= $response->base;
        $parser->{browser} ||= $browser;
        $parser->parse($response->content);
        $parser->eof();
    }
    else
    {
        print "ERROR: " . $response->status_line . "\n";
    }
}

sub start
{
    my ($parser, $tagname, $attr) = @_;
    if ($tagname eq 'img')
    {
        if ($attr->{src})
        {
            my $img_url = $attr->{src};
            my $remote_name = URI->new_abs($img_url, $parser->{base});
            #my ($local_name) = $img_url =~ m!([^/]+)$!;
            my $local_name = $remote_name->host . $remote_name->path;
            #my $local_name = "/dev/null";
            mkpath(dirname($local_name), 0, 0711);
            print "Getting imagefile: $img_url\n";
            my $response = $parser->{browser}->mirror($remote_name, $local_name);
            print STDERR "YYY-$local_name: ", $response->message, "\n";
        }
    }
}
Here is the output when I run it the second time:
Getting imagefile: images/logo.gif
LWP::UserAgent::mirror: ()
LWP::UserAgent::request: ()
HTTP::Cookies::add_cookie_header: Checking www.google.com for cookies
HTTP::Cookies::add_cookie_header: Checking .google.com for cookies
HTTP::Cookies::add_cookie_header: - checking cookie path=/
HTTP::Cookies::add_cookie_header: - checking cookie PREF=ID=0f9d8bbb3b0ee898:TM=1036535059:LM=1036535059:S=2ea2eKPQlO4uYAN6
HTTP::Cookies::add_cookie_header: it's a match
HTTP::Cookies::add_cookie_header: Checking google.com for cookies
HTTP::Cookies::add_cookie_header: Checking .com for cookies
LWP::UserAgent::send_request: GET http://www.google.com/images/logo.gif
LWP::UserAgent::_need_proxy: Not proxied
LWP::Protocol::http::request: ()
LWP::UserAgent::request: Simple response: Not Modified
YYY-www.google.com/images/logo.gif: 304 Not Modified
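For what it's worth, the 304 on the second run is consistent with how mirror() is documented to behave: when the local file already exists, it sends an If-Modified-Since header, and the server answers 304 Not Modified if the image has not changed, so the local copy is left alone. Note that is_success is false for a 304, so the status has to be checked separately. Below is a minimal sketch of how the result could be told apart in the start handler. The helper name describe_mirror_result is hypothetical (it is not part of LWP), and the responses are built by hand so the sketch runs without network access.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper (not part of LWP): classify the response
# object that LWP::UserAgent's mirror() hands back.
sub describe_mirror_result {
    my ($response) = @_;

    # 304: the local copy is already current, nothing was written.
    return 'unchanged' if $response->code == 304;

    # 2xx: the file was fetched and stored locally.
    return 'downloaded' if $response->is_success;

    # Anything else is treated as a failure in this sketch.
    return 'failed: ' . $response->status_line;
}

# Stand-in responses, assumed for illustration only.
print describe_mirror_result(HTTP::Response->new(304, 'Not Modified')), "\n";
print describe_mirror_result(HTTP::Response->new(200, 'OK')), "\n";
print describe_mirror_result(HTTP::Response->new(404, 'Not Found')), "\n";
```

In the handler above, the same check could be applied to the $response returned by mirror() instead of printing the message unconditionally.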