cheech has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm submitting each date from 1896 to the present to a form and collecting data on the returned pages. My loop inputs the date value, submits the form, scrapes the page, writes the data to file, goes back one page and sleeps for .1 seconds.

The problem is, I always get disconnected from the server eventually. Sometimes it's after 6 months, and sometimes 4 years. I think this may be a problem with the internet configuration here in my school apartment (I made it through 50 years at home last week), though I don't know how to check/remedy that.

How can I modify the block below to restart the loop at the current iteration whenever disconnection occurs?

Thanks!

use WWW::Mechanize;
use Fortran::Format;

my $mech = WWW::Mechanize->new();
$mech->get($url);

foreach my $date (@dates) {
    # Fill in the date field on the first form and submit it
    $mech->form_number(1);
    $mech->field('dtg',"$date");
    $mech->click();

    # Scrape the values of interest from the text of the returned page
    my $page = $mech->content(format=>'text');
    my @data = ($page =~ /:\s\s\s\s(\d\d)/g);
    my @rain = ($page =~ /Rain or Liquid Equivalent\s+:\s+(\S*)/);
    my @snow = ($page =~ /Snow and\/or Ice Pellets\s+:\s+(\S*)/);
    my @depth = ($page =~ /Snow Depth\s+:\s+(\S*)/);
    my @hdd = ($page =~ /Degree-Days\s+:\s+(\S*)/);

    # Format the record and write it to the screen and to the output file
    my $f = Fortran::Format->new("(I2.1,2X)")->write($hdd[0]);
    @hlahdd = ("$date $data[0] $data[1] $data[2] $f");
    print "@hlahdd";
    print FH "@hlahdd";

    # Note: the built-in sleep takes whole seconds, so .1 truncates to 0;
    # Time::HiRes::sleep accepts fractional values
    sleep .1;
    $mech->back();
}

Replies are listed 'Best First'.
Re: WWW::Mechanize agent timing-out from server
by SuicideJunkie (Vicar) on Jul 28, 2009 at 20:48 UTC

    What if you change the for to a while (@dates) and then simply shift the dates out of your array when each one is successfully processed?

    That way, @dates will always contain the remaining work, and if there is a failure, you simply do not shift the array. (Or push the value back on after you shift it, so if there are impossible dates, you won't get stuck on it until after all good dates are done.)
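A minimal sketch of the queue idea above, with no network access. The `process_date` sub is a hypothetical stand-in for the form submission and scraping, the date strings are illustrative only, and the simulated failure on one date is an assumption made so the requeue path is exercised.

```perl
use strict;
use warnings;

# Hypothetical stand-in for "submit the form and scrape the page".
# It dies on the first attempt for one date to simulate a dropped connection.
my %attempts;
sub process_date {
    my ($date) = @_;
    $attempts{$date}++;
    die "server: timeout\n" if $date eq '1896-01-02' && $attempts{$date} == 1;
    return "data for $date";
}

my @dates = ('1896-01-01', '1896-01-02', '1896-01-03');   # illustrative values
my @done;

while (@dates) {
    my $date   = shift @dates;                  # take the next unit of work
    my $result = eval { process_date($date) };  # undef if process_date died
    if (defined $result) {
        push @done, $date;                      # success: the date stays off the queue
    }
    else {
        warn "failed on $date: $@";
        push @dates, $date;                     # failure: requeue behind the remaining dates
    }
}

print "processed: @done\n";
```

Because a failed date goes to the back of the queue, the good dates are finished first. A date that can never succeed would keep this loop running forever, so a real script would probably want a cap on attempts per date.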

      I see what you mean, but this still does not help me automatically restart the program if it gets timed out. I need to make it all the way up to 2009.

        You should not restart the program. Simply continue instead of dying.

        If you need to trap Mechanize dying, then consider the use of eval
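One way the eval suggestion might look inside the loop, as a sketch rather than a drop-in fix. The `fetch_page` sub is a hypothetical stand-in for the Mechanize calls (`form_number`, `field`, `click`, `content`), the `day1`..`day3` labels and the `$max_tries` cap are assumptions, and the failure is simulated. It relies on the real calls dying on error, which is what the fatal "server: timeout" message suggests is happening.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Mechanize calls inside the loop; it dies
# twice for one date to simulate the "server: timeout" error.
my %calls;
sub fetch_page {
    my ($date) = @_;
    $calls{$date}++;
    die "server: timeout\n" if $date eq 'day2' && $calls{$date} <= 2;
    return "page for $date";
}

my $max_tries = 5;   # assumed cap so a permanently bad date cannot loop forever
my %tries;
my @pages;

foreach my $date ('day1', 'day2', 'day3') {
    my $page = eval { fetch_page($date) };   # eval traps the die; $@ holds the message
    if (!defined $page) {
        my $err = $@;
        if (++$tries{$date} < $max_tries) {
            warn "attempt $tries{$date} for $date failed: $err";
            sleep 1;                         # brief pause before trying again
            redo;                            # rerun this iteration with the same $date
        }
        warn "giving up on $date after $max_tries attempts\n";
        next;
    }
    push @pages, $page;
}

print "$_\n" for @pages;
```

The `redo` statement restarts the current iteration without advancing to the next date, so the loop resumes at the point of failure instead of the script having to be run again by hand.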

      I apologize if I'm missing something, but I still don't see how this helps me. I believe what I'm after is exception handling. My Mech agent submits a query to the server and gets disconnected, and I get a fatal error at the command line (server: timeout). I just want to be able to restart at the failed iteration of my loop without having to manually execute the script again. I can't continue or exit the loop; my program dies upon being disconnected.
        sub myprogram { ... }    # your program, which dies on error

        while (1) {
            if ( eval { myprogram(); 1 } ) {
                print "Program finished without dying, quitting\n";
                last;
            }
            else {
                print "uh oh, myprogram() died : $@ \n retrying \n";
            }
        }
Re: WWW::Mechanize agent timing-out from server
by Anonymous Monk on Jul 28, 2009 at 22:02 UTC