Hello dear all,


I have a nice script that works as an image scraper. In the first trials and tests all goes well.
Below is a list of URLs that I keep in urls.txt and run the script against. Note that this is only a short list: I need to run against 2500 URLs, so it would be great if the script were more robust and kept running when some URLs are not available or take too long to load. I think the script runs into problems when a URL is unavailable, takes too long to respond, or blocks MozRepl and WWW::Mechanize::Firefox for too long.

Do you think these guesses are the likely cause of the issue? If so, how can we improve the script and make it robust enough that it does not stop too soon?
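
To make the idea concrete, here is a minimal, untested sketch of what I have in mind: each URL is handled inside an eval with a timeout, so that one failing or slow URL is logged and skipped instead of ending the whole run. The names process_urls, $fetch and $timeout are my own, hypothetical choices; $fetch is any code reference, for example sub { $mech->get($_[0]) }. Whether alarm can really interrupt a call that is blocked inside MozRepl is an assumption I have not verified.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: call $fetch->($url) for every URL, but never let a
# single bad URL stop the run. $timeout is given in seconds.
sub process_urls {
    my ($urls, $fetch, $timeout) = @_;
    my (@ok, @failed);

    for my $url (@$urls) {
        my $done = eval {
            # Turn the alarm signal into an exception that eval can catch.
            local $SIG{ALRM} = sub { die "timeout after ${timeout}s\n" };
            alarm $timeout;
            $fetch->($url);
            alarm 0;
            1;
        };
        my $err = $@;    # save the error before anything can overwrite it
        alarm 0;         # make sure no alarm is left pending after a failure

        if ($done) {
            push @ok, $url;
        }
        else {
            $err = 'unknown error' unless length $err;
            chomp $err;
            warn "skipping $url: $err\n";
            push @failed, $url;
        }
    }
    return (\@ok, \@failed);
}
```

The list of failed URLs could then be written to a file and retried in a second pass.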

I would love to hear from you.

Greetings

Here is the list of URLs. Note that this is only a very short excerpt:

http://www.bez-zofingen.ch
http://www.schulesins.ch
http://www.schulen-turgi.ch/pages/bezirksschule/startseite.php
http://www.schinznach-dorf.ch
http://www.schule-seengen.ch
http://www.gilgenberg.ch/schule/bez/2005-06/
http://www.rheinfelden-schulen.ch/bezirksschule/
http://www.bezmuri.ch
http://www.moehlin.ch/schulen/
http://www.schule-mewo.ch
http://www.bez-frick.ch
http://www.bezendingen.ch
http://www.bezbrugg.ch
http://www.schule-bremgarten.ch/content/view/20/37/
http://www.bez-balsthal.ch
http://www.schule-baden.ch
http://bezaarau.educanet2.ch/info/.ws_gen/index.htm
http://www.benedict-basel.ch
http://www.institut-beatenberg.ch/
http://www.schulewilchingen.ch
http://www.ksuo.ch
http://www.international-school.ch
http://www.vsgtaegerwilen.ch/
http://www.vgk.ch/
http://www.vstb.ch




I would be very happy if the script were more robust than it is now.

Of course, it drives a real browser through WWW::Mechanize::Firefox, so it may be somewhat less stable than other screen-scraping solutions. I sometimes get errors like the ones shown below. Note that I also had a closer look at the troubleshooting page http://search.cpan.org/~corion/WWW-Mechanize-Firefox-0.64/lib/WWW/Mechanize/Firefox/Troubleshooting.pod with its hints and workarounds for various bugs and problems.

See the code:

#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

open my $urls, '<', 'urls.txt' or die $!;

while (<$urls>) {
  chomp;
  next unless /^http/i;
  print "$_\n";
  $mech->get($_);
  my $png = $mech->content_as_png;
  my $name = $_;
  $name =~ s#^http://##i;
  $name =~ s#/##g;
  $name =~ s/\s+\z//;
  $name =~ s/\A\s+//;
  $name =~ s/^www\.//;
  $name .= ".png";
  open(my $out, '>', "/home/martin/images/$name") or die $!;
  binmode $out;
  print $out $png;
  close $out;
  sleep 5;
}
Here are the results and, yes, also the errors where it stops ("Datei oder Verzeichnis nicht gefunden" is the German system message for "No such file or directory"):

martin@linux-wyee:~/perl> perl test_10.pl
http://www.bez-zofingen.ch
Datei oder Verzeichnis nicht gefunden at test_10.pl line 24, <$urls> line 3.
martin@linux-wyee:~/perl> perl test_10.pl
http://www.bez-zofingen.ch
http://www.schulesins.ch
http://www.schulen-turgi.ch/pages/bezirksschule/startseite.php
http://www.schinznach-dorf.ch
http://www.schule-seengen.ch
http://www.gilgenberg.ch/schule/bez/2005-06/
http://www.rheinfelden-schulen.ch/bezirksschule/
Not Found at test_10.pl line 15
martin@linux-wyee:~/perl>
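
My guess about the first error: the "No such file or directory" message is raised at line 24, which I take to be the open for the PNG file, so presumably the directory /home/martin/images did not exist on that run. As a minimal sketch under that assumption, the file name handling could be moved into two small helpers that also create the directory when it is missing. The names url_to_filename and output_path are hypothetical, and accepting https as well as http is my own addition.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: roughly the same substitutions as in the script,
# collected in one place. Whitespace is trimmed first.
sub url_to_filename {
    my ($url) = @_;
    my $name = $url;
    $name =~ s/\A\s+//;          # leading whitespace
    $name =~ s/\s+\z//;          # trailing whitespace
    $name =~ s#^https?://##i;    # scheme (https is my addition)
    $name =~ s/^www\.//i;        # leading "www."
    $name =~ s#/##g;             # slashes are not allowed in a file name
    return "$name.png";
}

# Hypothetical helper: create the output directory if it is missing and
# return the full path for the screenshot of $url.
sub output_path {
    my ($dir, $url) = @_;
    make_path($dir) unless -d $dir;
    return File::Spec->catfile($dir, url_to_filename($url));
}
```

In the loop, the hard-coded path in the open call would then become output_path('/home/martin/images', $_).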



What do you suggest? How can we make the script more robust, so that it does not stop so early?

Greetings

In reply to WWW::Mechanize::Firefox runs well: some attempts to make the script a bit more robust by Perlbeginner1
