in reply to Connection Timeout during form submissions

Hi, did you check the content of dates.txt? Maybe there's a problem with that line, e.g. stray whitespace somewhere? The following alternative approach (cut, pasted, and modified from lwpcook) worked fine for me:
Update: It accesses the service directly, so it doesn't need to fetch and parse the .htm file each time. That should be slightly faster and reduce network traffic (I didn't benchmark it).

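```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua  = LWP::UserAgent->new;
# POST straight to the CGI endpoint instead of fetching and parsing the form page
my $req = HTTP::Request->new(POST => 'http://bub2.met.psu.edu/cgi-win/WXDaily.EXE');
$req->content_type('application/x-www-form-urlencoded');
$req->content('dtg=18961110');    # form field dtg: the date as YYYYMMDD
my $res = $ua->request($req);
print $res->as_string;
```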

However, I'm not sure leeching roughly 41,600 pages is a good idea. Maybe your IP address or user agent is already on their web admin's blacklist? My advice would be to contact the person responsible for this service and kindly ask for the raw data. Universities usually share such information for research purposes. I don't know what they would do if you plan to use this information in a commercial context, though.

Re^2: Connection Timeout during form submissions
by cheech (Beadle) on Jun 20, 2009 at 21:14 UTC
    dates.txt looks fine. No whitespace or incorrect numbers around 18961110.
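If you want to back "looks fine" with a mechanical check, here is a minimal sketch, assuming dates.txt holds one YYYYMMDD date per line (the exact file format is an assumption). The helper name `find_bad_lines` is hypothetical; it reports every line that is not exactly eight digits and makes invisible characters, such as a trailing space or a Windows carriage return, visible as `<hex>` codes.

```perl
use strict;
use warnings;

# Report every line that is not exactly eight digits (YYYYMMDD).
# Invisible characters (spaces, tabs, a Windows "\r") are shown as <hex>.
sub find_bad_lines {
    my @lines = @_;
    my @bad;
    for my $i (0 .. $#lines) {
        (my $line = $lines[$i]) =~ s/\n\z//;    # strip only the trailing newline
        next if $line =~ /\A\d{8}\z/;
        (my $visible = $line) =~ s/([^\x21-\x7e])/sprintf('<%02x>', ord $1)/ge;
        push @bad, sprintf('line %d: "%s"', $i + 1, $visible);
    }
    return @bad;
}

# Usage (assumes dates.txt is in the current directory):
#   open my $fh, '<', 'dates.txt' or die "dates.txt: $!";
#   print "$_\n" for find_bad_lines(<$fh>);
```

A clean file prints nothing; a line like `18961110 ` would be reported as `"18961110<20>"`.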

    And as far as leeching the site for the files goes, this is a university site for the college I attend, and I have been asked by my advising instructor to gather this info. The faculty is aware that such projects are taking place. The real question is: why does the program keep failing at 18961110?

      OK. I'll suggest this again: run your program for a limited span of dates, like August 1, 1921 to December 23, 1922.
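A minimal sketch for generating such a test span as YYYYMMDD strings, assuming that is the format the service expects for every date (it is the format of `dtg=18961110` in the request above). The helpers `is_leap`, `days_in_month` and `date_range` are hypothetical names; they step one calendar day at a time using the Gregorian leap-year rule, which avoids epoch arithmetic on pre-1970 dates.

```perl
use strict;
use warnings;

# Gregorian rule: 1896 and 2000 are leap years, 1900 is not.
sub is_leap {
    my ($y) = @_;
    return 1 if $y % 400 == 0;
    return 0 if $y % 100 == 0;
    return $y % 4 == 0 ? 1 : 0;
}

sub days_in_month {
    my ($y, $m) = @_;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    return $m == 2 && is_leap($y) ? 29 : $days[$m - 1];
}

# All dates from $start to $end inclusive, as YYYYMMDD strings.
sub date_range {
    my ($start, $end) = @_;
    die "bad end date: $end\n" unless $end =~ /\A\d{8}\z/;
    my @ymd = $start =~ /\A(\d{4})(\d{2})(\d{2})\z/
        or die "bad start date: $start\n";
    my ($y, $m, $d) = map { $_ + 0 } @ymd;
    my @dates;
    while (1) {
        my $dtg = sprintf('%04d%02d%02d', $y, $m, $d);
        last if $dtg gt $end;    # 8-digit strings compare correctly as text
        push @dates, $dtg;
        if (++$d > days_in_month($y, $m)) {    # roll over to the next month/year
            $d = 1;
            if (++$m > 12) { $m = 1; $y++ }
        }
    }
    return @dates;
}
```

For example, `date_range('19210801', '19221223')` yields 510 dates, which could be fed to the script in place of dates.txt while testing.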

      I also think you should be "polite" about the number of hits per second on the other website. The previous poster suggested this, and I agree.

      Get your script working on a limited date range. Then expand that date range. Get your data and then "shut up". I would put some "sleep()" calls into the script and just let it run for a day. The data from 1920 isn't going to change. For your school project, the objective shouldn't be "how do I get this data as fast as possible?"; it should just be "how do I get this data?"
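One way to build those "polite" pauses into the script is a loop that sleeps between requests and retries failures with a slightly longer pause each time. This is a minimal sketch: `polite_fetch_all`, its `delay`/`retries` options, and the `($ok, $content_or_error)` return convention of the fetch callback are all hypothetical, chosen so the throttling logic can be tried without touching the network. The commented lines show one assumed way to wire it to LWP::UserAgent.

```perl
use strict;
use warnings;

# Fetch each date through $fetch (a code ref returning ($ok, $content_or_error)),
# sleeping between requests and retrying failures with a growing pause.
sub polite_fetch_all {
    my ($dates, $fetch, %opt) = @_;
    my $delay   = $opt{delay}   // 2;    # seconds between requests (assumed default)
    my $retries = $opt{retries} // 3;    # extra attempts per date (assumed default)
    my (%ok, %failed);
    for my $dtg (@$dates) {
        my ($success, $payload);
        for my $attempt (0 .. $retries) {
            ($success, $payload) = $fetch->($dtg);
            last if $success;
            sleep $delay * ($attempt + 1) if $attempt < $retries;    # back off
        }
        if ($success) { $ok{$dtg}     = $payload }
        else          { $failed{$dtg} = $payload }    # keep the error for later
        sleep $delay;    # be polite to the server
    }
    return (\%ok, \%failed);
}

# Example wiring with LWP (URL from the thread; the 30 s timeout is an assumption):
#   use LWP::UserAgent;
#   my $ua  = LWP::UserAgent->new(timeout => 30);
#   my $url = 'http://bub2.met.psu.edu/cgi-win/WXDaily.EXE';
#   my $fetch = sub {
#       my $res = $ua->post($url, { dtg => $_[0] });
#       return ($res->is_success,
#               $res->is_success ? $res->decoded_content : $res->status_line);
#   };
#   my ($ok, $failed) = polite_fetch_all(\@dates, $fetch, delay => 2);
```

Keeping the fetch as a code ref means the loop can be checked with a fake fetcher and `delay => 0` before it is pointed at the real server, and the `%failed` hash tells you which dates to revisit instead of the whole run dying on one timeout.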

      I also haven't yet seen any "this is what was sent" (the actual stuff) versus "this is what I received" comparison. Nor have I seen any boundary test cases based on what you have heard so far.
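For the "sent versus received" comparison and the boundary cases, a sketch along these lines could help. `boundary_dates` (hypothetical) returns the day before, the day itself and the day after a suspect YYYYMMDD date; `dump_exchanges` (also hypothetical) uses LWP::UserAgent's `add_handler` hooks to print each outgoing request in full and the status line that came back. The 30-second timeout is an assumption, and LWP is loaded only inside `dump_exchanges`.

```perl
use strict;
use warnings;

sub days_in_month {
    my ($y, $m) = @_;
    my $leap = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    my @days = (31, $leap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    return $days[$m - 1];
}

# The day before, the day itself, and the day after a YYYYMMDD date.
sub boundary_dates {
    my ($dtg) = @_;
    my @ymd = $dtg =~ /\A(\d{4})(\d{2})(\d{2})\z/ or die "bad date: $dtg\n";
    my ($y, $m, $d) = map { $_ + 0 } @ymd;

    my ($py, $pm, $pd) = ($y, $m, $d - 1);    # previous day
    if ($pd < 1) {
        if (--$pm < 1) { $pm = 12; $py-- }
        $pd = days_in_month($py, $pm);
    }
    my ($ny, $nm, $nd) = ($y, $m, $d + 1);    # next day
    if ($nd > days_in_month($y, $m)) {
        $nd = 1;
        if (++$nm > 12) { $nm = 1; $ny++ }
    }
    return map { sprintf('%04d%02d%02d', @$_) }
        [$py, $pm, $pd], [$y, $m, $d], [$ny, $nm, $nd];
}

# POST each date and print exactly what went out and what came back.
sub dump_exchanges {
    my ($url, @dates) = @_;
    require LWP::UserAgent;    # loaded only when actually used
    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->add_handler(request_send => sub {
        print "=== SENT ===\n", $_[0]->as_string;
        return;    # undef: let LWP go ahead and send the request
    });
    $ua->add_handler(response_done => sub {
        print "=== RECEIVED ===\n", $_[0]->status_line, "\n";
        return;
    });
    $ua->post($url, { dtg => $_ }) for @dates;
}
```

For example, `dump_exchanges('http://bub2.met.psu.edu/cgi-win/WXDaily.EXE', boundary_dates('18961110'));` would show whether 18961109 and 18961111 go through while 18961110 alone times out, or whether the failure is not tied to that date at all.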