in reply to Occasional Read Timeout with Mech

I run a lot of HTTP requests to pull web pages. Connection errors (for me) occur at two stages: (1) resolving the hostname via DNS, and (2) connecting to the host. Looking at line 268 of Methods.pm, it *seems* that the connection has already been established and the error occurs while reading the contents of the response. I say "seems" because that's a sub that could be called from anywhere, including a DNS lookup.

Do you have any other information about what happened prior to the error? Does it happen on all 5 iterations of your loop, and only then cause a problem? Or does it happen randomly and kill the script? And in what percentage of runs does the script error out like this?

Re^2: Occasional Read Timeout with Mech
by pirkil (Beadle) on Dec 19, 2014 at 07:59 UTC

    The script runs twice a day. The problem occurred on two specific days (all runs on those days failed). The day after, the script worked fine again. Yes, the connection was established, and the read timeout occurred in every iteration of the loop (at the same place). The script was not killed; I use Try::Tiny for that (and send reports via e-mail). The target file (plain HTML source) is small.

    I know that there were similar problems reported. I didn't expect this to be the cause, because the read timeout happens only rarely.

      If the issue is what you linked to, which seems very similar to your situation, I'd try another module that supports SSL. If all you're doing is basically grabbing one HTML page, Mechanize is a bit over-featured for that, and you should have several other modules available to you as options. You could easily test and move to another module without many changes to your code.

      If it's not the issue you linked to, I'd need more information on what's happening to diagnose it any further. But a longer timeout wouldn't actually "fix" it; it would only mask the underlying issue. A small HTML page should come back in a second or two, so if you find yourself "solving" it by increasing the timeout, the problem is still there.
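      To see whether a fetch really takes "a second or two", it can help to record the elapsed time of each request instead of just raising the timeout. Below is a minimal sketch using the core modules HTTP::Tiny and Time::HiRes; `fetch_with_timing` is a hypothetical helper name, not part of the original script, and the URL in the usage comment is a placeholder. For https URLs, HTTP::Tiny additionally needs IO::Socket::SSL installed.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use Time::HiRes ();

# Hypothetical helper: fetch a URL and report how long the request took.
sub fetch_with_timing {
    my ($url, $timeout) = @_;
    my $http  = HTTP::Tiny->new(timeout => $timeout);
    my $start = Time::HiRes::time();
    # Network failures (refused connection, timeout, ...) are reported
    # in the response hash (status 599) rather than thrown.
    my $res     = $http->get($url);
    my $elapsed = Time::HiRes::time() - $start;
    return {
        success => $res->{success} ? 1 : 0,
        status  => $res->{status},
        reason  => $res->{reason},
        elapsed => $elapsed,
    };
}

# Example usage (placeholder URL):
# my $r = fetch_with_timing('http://www.example.com/', 10);
# printf "%s %s in %.2fs\n", $r->{status}, $r->{reason}, $r->{elapsed};
```

      Logging `elapsed` on every run would show whether the bad days are slow across the board or only hang at the read stage.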

      If you're just making a simple GET request and reading a web page, I'd go with IO::Socket::SSL. In general, I prefer to use modules that are as simple and basic as possible. That way, when you hit bugs like this one, there's far less code to sift through to find the source, and there's also less code that can introduce bugs of its own.

      Give that a shot and report back. If you're still having problems, report back with more details on exactly what you're doing in terms of the URL you're grabbing (or one very similar) and how the rest is set up. I'll then try and recreate it on my end to see if we can unwind the issue.

        "...If you're just making a simple get request and reading a web page, I'd go with IO::Socket::SSL..." - I would advice against this. Getting a HTTP request correctly and especially parsing the response is more complex than it seems from looking at some examples, at least of you want to do it correctly. Especially chunked mode (length not known up-front), content-encoding (compression) and persistent connections (keep-alive) regularly cause problems. Also, LWP::UserAgent takes care of proxies, cookies etc.
        um, WWW::Mechanize uses IO::Socket::SSL underneath ... going lower level like IO::Socket::SSL isn't very convenient .... timeouts happen
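        To illustrate the chunked-mode point: with a raw socket, the response body may arrive as a series of length-prefixed chunks that have to be reassembled by hand. The following is only a sketch of that one step, assuming the complete body is already in memory; `decode_chunked` is a hypothetical helper, and it deliberately ignores trailer headers, compression and keep-alive, all of which a real client also has to handle.

```perl
use strict;
use warnings;

# Hypothetical helper: decode an HTTP/1.1 "chunked" body held in memory.
# Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a size of 0 ends the body.
sub decode_chunked {
    my ($raw) = @_;
    my $body = '';
    my $pos  = 0;
    while (1) {
        my $eol = index($raw, "\r\n", $pos);
        die "truncated chunk header\n" if $eol < 0;
        my $line = substr($raw, $pos, $eol - $pos);
        my ($hex) = $line =~ /^([0-9A-Fa-f]+)/
            or die "bad chunk size line: $line\n";
        my $size = hex $hex;
        $pos = $eol + 2;                       # move past the size line
        last if $size == 0;                    # final zero-length chunk
        die "truncated chunk data\n" if $pos + $size + 2 > length $raw;
        $body .= substr($raw, $pos, $size);
        $pos  += $size + 2;                    # skip the data and its trailing CRLF
    }
    return $body;
}

# Example usage:
# print decode_chunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"), "\n";
```

        Modules such as LWP::UserAgent do this (and the rest) internally, which is the argument for not going lower level.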
Re^2: Occasional Read Timeout with Mech
by noxxi (Pilgrim) on Dec 20, 2014 at 23:15 UTC
    "..that's a sub that could be called from anywhere, including a DNS lookup.." - the timeout in IO::Socket objects does not account for DNS lookups. The same is true for LWP::UserAgent and WWW::Mechanize, which just use the timeout from IO::Socket.