pirkil has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, my script has to fetch some data from the web; the mechanism looks like this:

for my $try (1 .. 5) {
    debug("fetching HTML source: try $try of 5\n") if $debug;

    my $mech = WWW::Mechanize->new(
        autocheck => 0,
        ssl_opts  => {
            verify_hostname => 0,
            SSL_version     => 'TLSv1',
        },
        timeout   => 60,
    );
    $mech->proxy('https', $args_hr->{proxy});

    # Try::Tiny
    try {
        $mech->get( $url );
    }
    catch {
        $err .= $_ if $_;
    };

    my $text = $mech->content;
    $err .= "Can't fetch HTML source from $url!\n" if !$mech->success();

    ...

    sleep 30;    # before next try - if download was not successful
}
I run this script on a server (as a cron job). The URL is always the same. Sometimes I get an error and the variable $text contains: read timeout at /usr/local/share/perl5/Net/HTTP/Methods.pm line 268. Other runs are OK, so I am not sure what the problem is or how to avoid it. Thanks for your help! EDIT: Thank you for all the replies! I haven't seen any further problems so far. The problem was probably not in the Perl script.
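For reference, a minimal, self-contained sketch (the URL is a placeholder and the loop is simplified from the one above) of the kind of timing and status logging that can help tell a connect failure from a read timeout:

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use Try::Tiny;
use Time::HiRes qw(gettimeofday tv_interval);

my $url  = 'https://example.com/';                     # placeholder URL
my $mech = WWW::Mechanize->new( autocheck => 0, timeout => 60 );

for my $try (1 .. 5) {
    my $t0 = [gettimeofday];
    try { $mech->get($url) } catch { warn "exception: $_" };
    my $elapsed = tv_interval($t0);

    if ( $mech->success ) {
        printf "try %d succeeded after %.1fs\n", $try, $elapsed;
        last;
    }

    # On a timeout LWP usually fabricates a 500 response whose content
    # carries the "read timeout at .../Net/HTTP/Methods.pm" message.
    printf "try %d failed after %.1fs: %s\n",
        $try, $elapsed, $mech->res->status_line;
    sleep 30;
}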

Replies are listed 'Best First'.
Re: Occasional Read Timeout with Mech
by Discipulus (Canon) on Dec 17, 2014 at 11:56 UTC
    In parallel you could give my similar project, which uses LWP::UserAgent, a try and compare the results. You can set a different timeout there and make other comparisons. You may also see response times increasing before the timeouts occur.

    You can find it here
    webTimeLoad23.pl --count 1000 --verbosity 0 --protocol https --sleep 60 --timeout 120 --url www.yoururl.org
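    A rough sketch of the same idea (not Discipulus's actual webTimeLoad23.pl; the URL, count and sleep values are placeholders mirroring the options above): repeated LWP::UserAgent requests with a longer timeout, printing the status line and elapsed time of each request so a slowdown before the timeouts becomes visible.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use Time::HiRes qw(gettimeofday tv_interval);

    my $url = 'https://www.yoururl.org/';          # substitute the real URL
    my $ua  = LWP::UserAgent->new(
        timeout  => 120,
        ssl_opts => { verify_hostname => 0 },
    );

    for my $count (1 .. 1000) {
        my $t0  = [gettimeofday];
        my $res = $ua->get($url);
        printf "%4d  %-20s  %.2fs\n",
            $count, $res->status_line, tv_interval($t0);
        sleep 60;
    }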


    HtH
    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      I was also thinking about the timeout & response time increase. I will try it.

        In the millions of typical web page requests I make, the vast majority are completed, from start to finish, in 4 seconds. How large is the file being requested that a 60 second timeout might be too low?

Re: Occasional Read Timeout with Mech
by Anonymous Monk on Dec 17, 2014 at 11:39 UTC

    One possibility is that it has nothing to do with Perl but instead once in a while the connection to the server never gets established for whatever reason. Have you tried doing the same thing your script is doing with a program like wget to see if you have the same problem there? Connections not getting established can actually be somewhat normal depending on where the server is in relation to the client (i.e. are you going over an Internet connection or is it on a local network?). You can also try to capture a failed connection with Wireshark to get a better idea if this problem is at the network level.

      Update: ... never gets established or gets interrupted for whatever reason. (Really, a lot of things can go wrong on a network connection depending on how far away the server is...)
      Actually, I had tried wget before WWW::Mechanize (I couldn't figure out the correct settings for the Mech initialization at first) and it worked. The connection is over the internet. I am afraid the problem is related to the host. I will probably increase the sleep time and the timeout, as suggested by Discipulus. Thx for your answer.
Re: Occasional Read Timeout with Mech
by benwills (Sexton) on Dec 17, 2014 at 23:43 UTC

    I run a lot of HTTP requests to pull web pages. Errors in connecting (for me) occur in two places: first, in the DNS lookup, and second, in the connection to the host. Looking at line 268 of Methods.pm, it *seems* that the connection has already been established and there is an error reading the contents of the response. I say seems, because that's a sub that could be called from anywhere, including a DNS lookup.

    Do you have any other information about what else has happened prior to the error? Does this happen all 5 times of your loop, and that's when there's a problem? Or does it just happen randomly and kill the script? And what percentage of the time does it happen that the script errors out like this?

      The script runs twice a day. This problem occurred on two specific days (all runs were unsuccessful on those days). The day after, the script worked well again. Yes, the connection was established, and the read timeout occurred in all iterations of the loop (in the same place). The script was not killed; I use Try::Tiny for that (and send reports via e-mail). The target file (plain HTML source) is small.

      I know that there were similar problems. I didn't expect this to be the cause, because the read timeout happens only occasionally.

        If the issue is what you linked to, which seems very similar to your situation, I'd try another module that supports SSL. If all you're doing is basically grabbing one HTML page, Mechanize is a bit over-featured for that, and you should have several other modules available to you as options. You could easily test and move to another module without many changes to your code.

        If it's not the issue you linked to, I'd need more information on what's happening to diagnose it any further. But a longer timeout wouldn't actually "fix" it, as the underlying issue would still remain. And for a small HTML page, you should be pulling that in within a second or two. So if you find yourself "solving" it by increasing the timeout, the problem will still be there.

        If you're just making a simple GET request and reading a web page, I'd go with IO::Socket::SSL. In general, I prefer to do things with as simple and basic a set of modules as possible. That way, when you do hit bugs like the one you're having, there's far less code to sift through to find the source, and less code that can get in the way and introduce bugs of its own.
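        As a rough illustration of that approach (the hostname and path are placeholders, and error handling is kept minimal on purpose), a bare-bones HTTPS GET over IO::Socket::SSL with an explicit connect timeout could look like this:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use IO::Socket::SSL;

        my $host = 'www.example.com';                  # placeholder host
        my $path = '/page.html';                       # placeholder path

        my $sock = IO::Socket::SSL->new(
            PeerHost        => $host,
            PeerPort        => 443,
            Timeout         => 60,                     # connect timeout (seconds)
            SSL_verify_mode => SSL_VERIFY_NONE,        # matches verify_hostname => 0
        ) or die "connect failed: $!, $SSL_ERROR\n";

        # HTTP/1.0 keeps the response un-chunked, so it can simply be slurped.
        print $sock "GET $path HTTP/1.0\r\n",
                    "Host: $host\r\n",
                    "Connection: close\r\n\r\n";

        my $response = do { local $/; <$sock> };       # headers + body
        print $response;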

        Give that a shot and report back. If you're still having problems, report back with more details on exactly what you're doing in terms of the URL you're grabbing (or one very similar) and how the rest is set up. I'll then try and recreate it on my end to see if we can unwind the issue.

      "..that's a sub that could be called from anywhere, including a DNS lookup.." - the timeout in IO::Socket objects does not account for DNS lookups. The same is true for LWP::UserAgent and WWW::Mechanize, which just use the timeout from IO::Socket.