MintyFresh has asked for the wisdom of the Perl Monks concerning the following question:

I have a somewhat complex (for me, anyway) routine here that SHOULD go out and grab a random picture from a remote site. Here is the code.
%seen = ();
@uniq = grep { ! $seen{$_} ++ } @z;

my $query = q|SELECT weburl,datecode,idnum FROM maindb WHERE idnum IN (|
          . join(',', map { $dbh->quote($_) } @uniq)
          . q|)|;
my($sth) = $dbh->prepare($query);
$sth->execute || die("Could not execute!");
my $array_ref = $sth->fetchall_arrayref();

foreach my $row (@$array_ref) {
    my ( $url, $dc, $idn ) = @$row;
    if ($url){
        $ua = new LWP::UserAgent;
        $ua->agent ('Mozilla/4.0 (compatible; MSIE 5.03; Windows 95)');
        my @imgs = ();
        sub callback {
            my($tag, %attr) = @_;
            return if $tag ne 'img';
            push(@imgs, values %attr);
        }
        $p = HTML::LinkExtor->new(\&callback);
        $res = $ua->request(HTTP::Request->new(GET => $url), sub {$p->parse($_[0])});
        my $base = $res->base;
        @imgs = map { $_ = url($_, $base)->abs; } @imgs;
        @foo = grep /\.jpg/i, @imgs;
        $randompic = $foo[rand @foo];
        $req1 = new HTTP::Request 'GET' => $randompic;
        my $img_response = $ua->request($req1);
        $content = $img_response->content;
        open(TGPMAIN,">$cat_html/featured/$idn\.jpg") || print "Can't open gall REASON: ($!)\n";
        print TGPMAIN "$content";
        close(TGPMAIN);
        $sql = "INSERT INTO photofour VALUES('$idn','$dc','$randompic')";
        $dbh->do($sql);
    }
}
For the first picture, it works great, but if @z contains more than one value, only the first is processed correctly. For instance, @z might have 123 and 456 in it. The script will select the URL from the DB with 123 as its ID. It will then spider that URL and retrieve a list of the .jpg images available there. It will randomly choose one and download it to a local directory, naming it 123.jpg. Once that is done, the ID number, a date code, and the original URL of the pic are inserted into the "photos" table. The trouble would come when trying to spider the next URL (the one with ID 456 in this case). The script creates a blank image with the proper name and into the DB is entered the ID number and date code, the original URL is not present. Can anyone spot why this only works the first time through? I have been at this for several hours, and it is confusing me more and more as time goes by.

Replies are listed 'Best First'.
Re: Photo Grabbing Routine
by dws (Chancellor) on Jun 18, 2002 at 00:40 UTC
    The trouble would come when trying to spider the next URL (the one with ID 456 in this case). The script creates a blank image with the proper name and into the DB is entered the ID number and date code, the original URL is not present.

    As general advice, check errors. You're passing up the opportunity to detect several, including any error that may come back from $ua->request(), as well as any error that might be reported by $dbh->do().
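As a rough illustration of the error checking suggested above, here is a minimal sketch. The helper names `fetch_checked` and `record_pic` are hypothetical; the sketch assumes `$ua` is an already constructed LWP::UserAgent and `$dbh` is a DBI handle without RaiseError set. The table name `photofour` comes from the posted code. Placeholders are used in the INSERT instead of interpolating values into the SQL string, which is a substitution on my part rather than something the original code did.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch $url and return the body, or undef
# (after a warning) when the HTTP request did not succeed.
sub fetch_checked {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    unless ($res->is_success) {
        warn "GET $url failed: ", $res->status_line, "\n";
        return;
    }
    return $res->content;
}

# Hypothetical helper: insert one row and report a failure instead of
# ignoring it. $dbh->do returns undef on error when RaiseError is off.
sub record_pic {
    my ($dbh, $idn, $dc, $pic_url) = @_;
    my $rows = $dbh->do(
        'INSERT INTO photofour VALUES (?, ?, ?)',
        undef, $idn, $dc, $pic_url,
    );
    unless (defined $rows) {
        warn "INSERT for id $idn failed: ", $dbh->errstr, "\n";
        return 0;
    }
    return 1;
}
```

With helpers like these, the loop body can skip a row as soon as a fetch or an insert fails, rather than writing an empty file and a partial DB record.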

    Try going for ID 456 first. This will show whether there's a problem fetching that particular URL. Perhaps the page that URL refers to has no images? From the looks of the code, that would certainly cause some problems.
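To make the "no images" case concrete, here is a small sketch of guarding against an empty list before picking a picture. The names `make_img_collector` and `pick_random_jpg` are hypothetical. The collector builds a fresh anonymous callback and a fresh list for each page; one thing that may be worth checking in the posted code is that a named `sub callback` declared inside the loop binds only to the first `my @imgs` (Perl reports this as "Variable will not stay shared" under `use warnings`), so later iterations can end up with an empty list even when the page does have images.

```perl
use strict;
use warnings;

# Hypothetical helper: return a callback suitable for
# HTML::LinkExtor->new($callback), plus a reference to the list it fills.
# Each call creates a new @imgs, so nothing carries over between pages.
sub make_img_collector {
    my @imgs;
    my $callback = sub {
        my ($tag, %attr) = @_;
        return if $tag ne 'img';
        push @imgs, values %attr;
    };
    return ($callback, \@imgs);
}

# Hypothetical helper: pick a random .jpg URL from the arguments,
# or return undef when there is nothing to pick from.
sub pick_random_jpg {
    my @jpgs = grep { /\.jpg/i } @_;
    return unless @jpgs;
    return $jpgs[rand @jpgs];
}
```

Inside the loop, a `next` when `pick_random_jpg` returns undef would avoid creating a blank file and inserting a row with no picture URL.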

Re: Photo Grabbing Routine
by gumby (Scribe) on Jun 18, 2002 at 15:18 UTC
    An alternative might be to pick a random link returned from a Google image search.

    Update: To clarify:

    ...
    $req1 = POST 'http://images.google.com/images',
        [ q => $somethingrandom, ie => 'UTF8', oe => 'UTF8', hl => 'en' ];
    ...