hexbase has asked for the wisdom of the Perl Monks concerning the following question:

I want to save an image (a captcha) with the WWW::Mechanize module. I tried something like:
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->get( $captchaurl, ':content_file' => 'captchar.jpg' );
Since the captcha comes from a PHP script, Mechanize saves an HTML page rather than the image. However, the image can be saved manually in a browser. So how do I save only the image, so that it can be opened as a JPEG later? Thanks
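One way to confirm what actually landed on disk is to look at the file's leading "magic" bytes: JPEG, PNG and GIF files start with fixed signatures, while an HTML page starts with plain text. The following is a minimal sketch using only core Perl; the helper name sniff_image_type is hypothetical, and only those three formats are recognised.

```perl
use strict;
use warnings;

# Hypothetical helper: return 'jpeg', 'png' or 'gif' based on the file's
# first bytes, or undef for anything else (e.g. an HTML page or an empty file).
sub sniff_image_type {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $read = read $fh, my $head, 8;    # 8 bytes cover all three signatures
    close $fh;
    return unless $read;                 # read error or empty file

    return 'jpeg' if $head =~ /\A\xFF\xD8\xFF/;
    return 'png'  if $head =~ /\A\x89PNG\r\n\x1A\n/;
    return 'gif'  if $head =~ /\AGIF8[79]a/;
    return;                              # not a recognised image
}
```

For example, if sniff_image_type('captchar.jpg') returns undef after the get call above, the server sent something other than a JPEG, PNG or GIF image.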

Re: How do I save an image with WWW::Mechanize
by jonnyfolk (Vicar) on Jan 20, 2009 at 04:34 UTC
    Grab the HTML:
    use WWW::Mechanize;
    my $mech = WWW::Mechanize->new();
    $mech->get($add);           # $add: URL of the page that shows the captcha
    my $r = $mech->content;
    Perform a regex on $r to extract the URL of the image, and then use either Image::Magick or GD to save the image on your server.
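    use Image::Magick;
    $r =~ m#regex(.*)#;    # "regex" is a placeholder for a pattern that captures the image URL
    my $imgurl    = $1;
    my $photofile = "photo.jpg";
    my $image     = Image::Magick->new;
    my $x = $image->Read($imgurl);    # $x holds any error message
    $x = $image->Write($photofile);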
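The regex step can be sketched as a small helper that scans the page's <img> tags and returns the first src that mentions "captcha". This is an illustration under assumptions: the name extract_captcha_url and the "captcha" substring are hypothetical, only quoted src attributes are matched, and regexes over HTML are brittle (attribute order, comments, unusual markup).

```perl
use strict;
use warnings;

# Hypothetical helper: return the src of the first <img> whose src contains
# "captcha" (case-insensitive), or undef if there is none. Only single- or
# double-quoted src attributes are recognised.
sub extract_captcha_url {
    my ($html) = @_;
    while ( $html =~ /<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/gi ) {
        my $src = $2;
        return $src if $src =~ /captcha/i;
    }
    return;
}
```

The returned URL may be relative (e.g. /captcha.php?id=42), so it needs resolving against the page URL before fetching, for instance with URI->new_abs($src, $page_url) from the URI module.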
      Or
      my $img = $mech->find_image(
          alt_regex => qr/captcha/i,
          url_regex => qr/Captcha/i,
      );
      if ($img) {
          $mech->get( $img->url, ':content_file' => 'captchar.jpg' );
      }
      # after you get the URL (as jonnyfolk shows above)
      use LWP::UserAgent;
      my $ua  = LWP::UserAgent->new();
      my $res = $ua->mirror( $imageurl, $localfilename );
      die $res->status_line unless $res->is_success or $res->code == 304;   # 304: local copy is up to date

      If you want to do evil, science provides the most powerful weapons to do evil; but equally, if you want to do good, science puts into your hands the most powerful tools to do so.
      - Richard Dawkins
        Why wouldn't you just use the mechanize object?
      perform a regex on $r to extract the url of the image

      That's fragile and error-prone. Much better to use WWW::Mechanize's built-in methods, since they actually parse the returned HTML:

      $mech->get($add);
      my $img_obj = $mech->find_image( url_regex => qr{captcha\.php} )
          or die "No captcha image found\n";
      $mech->get( $img_obj->url, ':content_file' => 'captchar.jpg' );

      ... or somesuch
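A slightly more defensive version of the find_image approach also checks what the server says it returned: a 200 response that carries an HTML page instead of an image would otherwise be written under the .jpg name. This is a sketch, not a drop-in solution: save_captcha is a hypothetical helper, $mech is assumed to be a WWW::Mechanize object, and qr{captcha\.php}i is an assumption about the site's markup.

```perl
use strict;
use warnings;

# Hypothetical helper: load the page, find the captcha <img>, save it to
# $file, and die if the server did not label the response as an image.
# $mech is assumed to be a WWW::Mechanize object; qr{captcha\.php}i is an
# assumption about how the site names its captcha script.
sub save_captcha {
    my ( $mech, $page_url, $file ) = @_;

    $mech->get($page_url);
    my $img = $mech->find_image( url_regex => qr{captcha\.php}i )
        or die "No captcha <img> found on $page_url\n";

    # url_abs() gives the image URL resolved against the page's base URL.
    my $res = $mech->get( $img->url_abs, ':content_file' => $file );
    die 'Captcha request failed: ', $res->status_line, "\n"
        unless $res->is_success;

    # Content-Type as reported by the server, e.g. "image/jpeg" or "text/html".
    my $type = $res->content_type // '';
    die "Expected an image but got '$type'\n" unless $type =~ m{^image/}i;
    return $type;
}
```

Usage would look like my $mech = WWW::Mechanize->new; save_captcha( $mech, $page_url, 'captchar.jpg' ); where $page_url (hypothetical) is the page showing the form. With autocheck on (the default for a plain WWW::Mechanize object), a failing request already dies inside get, so the is_success test mainly matters when autocheck is off.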

      Update: sorry, somehow (!) didn't see Anonymonk's post


      Life is denied by lack of attention,
      whether it be to cleaning windows
      or trying to write a masterpiece...
      -- Nadia Boulanger
Re: How do I save an image with WWW::Mechanize
by jeffa (Bishop) on Jan 20, 2009 at 14:40 UTC

    You know -- the point of someone requiring you to enter a captcha is to prevent you from hitting their site as a robot. Would you mind sharing the name of the site you are trying to run this code on? I would like to see what their Terms of Service agreement says about this.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      a captcha is to prevent you from hitting their site as a robot

      Obviously it does not work ;)

      And you didn't even know bears could type.

Re: How do I save an image with WWW::Mechanize
by missingthepoint (Friar) on Jan 21, 2009 at 01:37 UTC

    My guess is that this happens because you're not sending a Referer(*) header, whereas the browser is. Try:

    $mech->add_header( Referer => '<url of page that embeds captcha>' );
    $mech->get( ...

    (*): not a typo, see RFC 2616
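The Referer idea can be wrapped so that the header is sent only with the captcha request: WWW::Mechanize's add_header applies a header to every following request until delete_header removes it. A minimal sketch; get_with_referer is a hypothetical helper and $mech is assumed to be a WWW::Mechanize object. Whether the site actually checks Referer is, as said above, a guess.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch $img_url into $file while sending a Referer
# header naming the page that embeds the captcha, as a browser would.
# $mech is assumed to be a WWW::Mechanize object (add_header, get, delete_header).
sub get_with_referer {
    my ( $mech, $img_url, $referer, $file ) = @_;

    # add_header() applies to every later request, so set it just for this one...
    $mech->add_header( Referer => $referer );

    my $res;
    my $ok  = eval { $res = $mech->get( $img_url, ':content_file' => $file ); 1 };
    my $err = $@;

    # ...and remove it again, even if get() died (e.g. with autocheck on).
    $mech->delete_header('Referer');
    die $err unless $ok;
    return $res;
}
```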

