in reply to Downloading PHP-generated images using the LWP::UserAgent and WWW::Mechanize modules

This is untested code, but I use something similar at home when I scrape images:

#!/usr/bin/perl
use warnings;
use strict;
#
# warning: untested code.
#
use WWW::Mechanize;

my $mechanize = WWW::Mechanize->new(autocheck => 1);

# define the user agent
my $useragent = "Mozilla/5.0";
$mechanize->agent($useragent);

# get the page
$mechanize->get("http://www.site.com/images/");

# get the images (a list of WWW::Mechanize::Image objects)
my @images = $mechanize->images();

# a second object, so saving the images does not replace
# the page currently held in $mechanize
my $mech2 = WWW::Mechanize->new(autocheck => 1);
$mech2->agent($useragent);

my $count = 0;
foreach my $img (@images) {
    # time() alone repeats within one second, so add a counter
    my $filename = time() . "_" . $count++ . ".jpg";
    #
    # Save the image.
    # WWW::Mechanize is a subclass of LWP::UserAgent, so get()
    # accepts LWP's ":content_file" option.
    #
    # please see:
    # http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm
    # and look for $mech->get
    #
    $mech2->get($img->url_abs(), ":content_file" => $filename);
}

Re^2: Downloading PHP-generated images using the LWP::UserAgent and WWW::Mechanize modules
by Anonymous Monk on Aug 27, 2007 at 19:52 UTC
    OK, thanks. $1 showed the correct URL when I printed it. What I forgot to mention is that the file downloaded with Perl is 0 KB. When I open the image (on my hard drive) in my browser, it just shows the path of the file. And what exactly do you mean by "checksum (md5/sha1)"? Thanks, cafaro
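A checksum (MD5 or SHA-1) is a short fingerprint computed from a file's bytes: two files with the same digest almost certainly have identical contents. Comparing the digest of the file Perl saved with that of a copy saved from the browser shows whether the download really matches. A minimal sketch using the core Digest::MD5 and Digest::SHA modules; `file_digests` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Digest::MD5;
use Digest::SHA;

# Return the MD5 and SHA-1 hex digests of a file's raw bytes.
sub file_digests {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # digests must be computed on raw bytes
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    seek $fh, 0, 0 or die "Cannot rewind $path: $!";    # read again for SHA-1
    my $sha1 = Digest::SHA->new(1)->addfile($fh)->hexdigest;
    close $fh;
    return ($md5, $sha1);
}
```

Any 0 KB file, whatever its name, has the MD5 digest d41d8cd98f00b204e9800998ecf8427e, so an empty download is easy to spot this way.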
      You need the full url for $ua->mirror(). If the urls in the HTML source are relative, your regex won't create a full url to mirror.
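If you keep the regex approach, the URI module (which LWP itself depends on) can resolve a relative src against the URL of the page it came from. A minimal sketch; `absolute_url` and the example.com URLs are hypothetical.

```perl
use strict;
use warnings;
use URI;

# Resolve an image src (relative or absolute) against the URL of the
# page it was found on, returning a full URL suitable for mirror().
sub absolute_url {
    my ($src, $page_url) = @_;
    return URI->new_abs($src, $page_url)->as_string;
}
```

With a hypothetical `$page_url` holding the address of the page you scraped, the call becomes `$ua->mirror(absolute_url($1, $page_url), $filename)`. Already absolute URLs pass through unchanged.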

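      You should probably use WWW::Mechanize's find_image() or find_all_images() methods:

      my $n = 0;
      for my $img ($www->find_all_images()) {
          # mirror() takes both a URL and a local file name;
          # image0.jpg, image1.jpg, ... is just an illustrative scheme
          $www->mirror($img->url_abs(), "image" . $n++ . ".jpg");
      }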
      update: you probably also don't need both an LWP::UserAgent and a WWW::Mechanize object, since WWW::Mechanize is a subclass of LWP::UserAgent. In fact, chances are, it will work better with just one WWW::Mechanize object.
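A quick way to confirm the subclass relationship on your own installation: this sketch only constructs an object (no network access) and asks Perl whether it is an LWP::UserAgent and can call the inherited mirror() method. The agent string simply reuses the one from earlier in this thread.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# A single WWW::Mechanize object is also an LWP::UserAgent, so it can
# browse pages and call inherited LWP methods such as mirror().
my $mech = WWW::Mechanize->new(agent => 'Mozilla/5.0', autocheck => 1);

print "isa LWP::UserAgent: ", ($mech->isa('LWP::UserAgent') ? "yes" : "no"), "\n";
print "can mirror: ",         ($mech->can('mirror')         ? "yes" : "no"), "\n";
```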

        It doesn't work for this code:

        #!C:/Perl/bin/perl.exe -w
        use warnings;
        use strict;
        use WWW::Mechanize;
        my $mech = WWW::Mechanize->new(agent => 'Mozilla/5.0');
        $mech->mirror('http://www.click.in/includes/gd.php?scode=NzQwOTc0', 'd:/img.jpg');

        Please tell me how to download this image.