Hello, I'm writing an automated Perl script that collects images for a school project by downloading them. These images are generated by PHP. My problem is that when I download an image, Linux (Ubuntu) doesn't recognize it as an image, whatever file extension I give it, but treats it as a plain text file. I've tried downloading other PHP-generated images, and those downloaded and displayed fine. I therefore think the cause of my issue is the rather unusual URL of the image: it consists only of a directory followed by an HTTP GET query string, with no file name in the path. However, when I download these images manually with Firefox, it works. Another odd thing is that the image saved by Firefox has no file extension, yet when I view the properties of that file, its type is shown as JPEG. The image URL is as follows:
http://www.site.com/images/?id=345435
The Perl modules that I use:
WWW::Mechanize; # to browse through the site
LWP::UserAgent; # for downloading the images
The actual Perl code I use for downloading the images:
# load modules
use strict;
use warnings;
use WWW::Mechanize;
use LWP::UserAgent;

# create new sessions
my $mechanize = WWW::Mechanize->new(autocheck => 1); # "autocheck => 1" dies on request errors
my $useragent = LWP::UserAgent->new;

# define the user-agent string and apply it to both objects
my $agent = "Mozilla/5.0";
$mechanize->agent($agent);
$useragent->agent($agent);

# fetch the page and its content
$mechanize->get("http://www.site.com/images/");
my $content = $mechanize->content();

# get the image url by parsing the content with a regular expression
my ($url) = $content =~ /<tr><td><img src="([^"]+)" alt="php-image" \/><\/td><\/tr>/
    or die "No image URL found in the page\n";

# download the image
my $time = time();
my $response = $useragent->mirror($url, "/home/cafaro/images/$time.jpg");
die "Download failed: ", $response->status_line, "\n" unless $response->is_success;

# provide output
print "The image ($time.jpg) has been saved.\n";
I hope I gave enough information. Cheers, cafaro
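
One way to narrow this down is to look at the first bytes of the saved file rather than at its extension, since that is roughly how the desktop decides between "JPEG" and "plain text". The sketch below is only an illustration: `sniff_type` and `sniff_file` are hypothetical helper names, not part of the script above or of any module, and the HTML check is a rough assumption about what a non-image response (an error or login page, for example) would start with.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the file type from the leading bytes of the data.
sub sniff_type {
    my ($bytes) = @_;
    return 'jpeg' if $bytes =~ /\A\xFF\xD8\xFF/;          # JPEG start-of-image marker
    return 'png'  if $bytes =~ /\A\x89PNG\r\n\x1A\n/;     # PNG signature
    return 'gif'  if $bytes =~ /\AGIF8[79]a/;             # GIF87a or GIF89a
    return 'html' if $bytes =~ /\A\s*<(?:!DOCTYPE|html)/i; # rough guess at an HTML page
    return 'unknown';
}

# Hypothetical helper: read the first 16 bytes of a file in binary mode.
sub sniff_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    read $fh, my $head, 16;
    close $fh;
    return sniff_type($head // '');
}

print sniff_type("\xFF\xD8\xFF\xE0\x00\x10JFIF"), "\n";  # prints "jpeg"
print sniff_type("<html><body>Access denied"), "\n";     # prints "html"
```

Calling `sniff_file` on the path passed to `mirror` would show whether the server returned image data or a text page. If it reports something other than an image type, comparing the request sent by the script with the one Firefox sends (cookies, for instance, which the separate LWP::UserAgent object does not share with the WWW::Mechanize session) may be a reasonable next step.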

In reply to Downloading PHP-generated images using the LWP::UserAgent and WWW::Mechanize modules by cafaro
