So, if I understand this correctly, with the default setting, "something" has to happen between the client and the server every 3 minutes or the request will be aborted. That then implies that over 90 minutes, at least 30 "somethings" must have happened on the connection. These pages are not "big": max maybe 2K bytes each, and they are straight HTML, no JS to bloat things. I saw this using my home Windows machine as the client - no throttling going on at my end once a request is initiated. I do have some conscious throttling to reduce the rate of page requests from me - this thing is designed to "be nice" to the target website.
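If I read the LWP::UserAgent docs right, that default looks like this when set explicitly - and it is an inactivity timeout, not an overall cap (the URL and agent string below are just placeholders):

    use strict;
    use warnings;
    use LWP::UserAgent;

    # The 180-second default is reset every time the socket shows activity,
    # so a slow-but-not-dead server can keep one request alive far longer
    # than 3 minutes overall.
    my $ua = LWP::UserAgent->new(
        timeout => 180,                  # seconds of *inactivity*, not total time
        agent   => 'my-scraper/0.1',     # placeholder agent string
    );

    my $resp = $ua->get('http://example.com/some-page.html');   # placeholder URL
    print $resp->is_success
        ? "got " . length($resp->decoded_content) . " bytes\n"
        : "failed: " . $resp->status_line . "\n";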
I don't really understand how many transmissions back and forth (or "over"s) are needed to transfer a page that is a lot smaller than the Perl Monks page I am typing this into. I guess I am pretty much stunned - I was thinking it might take maybe 10 "over"s, which would add 30 minutes and wouldn't be a problem. Obviously my thinking was too primitive!
I am not sure about implementing my own timeout. The only way I know how to do that would be with SIGALRM. There is only one of those, and if LWP is using it, I am worried about conflicts. Suggestions welcome.
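The only pattern I know for that is the classic eval/alarm wrapper, something like the sketch below (the URL and the 300-second limit are made up, and since Windows perl only emulates SIGALRM, the alarm may not interrupt a blocked socket read reliably):

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua         = LWP::UserAgent->new;
    my $hard_limit = 300;    # made-up overall per-request deadline, in seconds

    my $resp = eval {
        local $SIG{ALRM} = sub { die "hard timeout\n" };
        alarm $hard_limit;
        my $r = $ua->get('http://example.com/some-page.html');   # placeholder URL
        alarm 0;
        $r;
    };
    alarm 0;    # make sure the alarm is cleared even if the eval died

    if (!defined $resp) {
        warn "request gave up after ${hard_limit}s: $@";
    }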
One approach that I am considering is implementing a lock file. When the Windows Scheduler wants to "go", a bat file would check whether the lock file exists; if so, it aborts that run and lets whatever is already running keep running. When the bat file sees my software exit, it removes the lock file. The net effect would be that I occasionally miss an hourly update. That is acceptable to me as long as it doesn't happen "too often", with the definition of that TBD.
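If I end up having the Perl script manage the lock itself instead of the bat file, a minimal sketch might look like this (the lock file path is made up):

    use strict;
    use warnings;
    use Fcntl qw(O_CREAT O_EXCL O_WRONLY);

    my $lockfile   = 'C:/scraper/run.lock';   # made-up path
    my $i_own_lock = 0;

    # O_EXCL makes create-if-absent atomic, so two overlapping starts
    # can't both think they won.
    if (sysopen(my $fh, $lockfile, O_CREAT | O_EXCL | O_WRONLY)) {
        print {$fh} "$$ started " . localtime() . "\n";
        close $fh;
        $i_own_lock = 1;
    }
    else {
        warn "another run appears to be active, skipping this hour\n";
        exit 0;
    }

    # Best-effort cleanup; only remove the lock if this process created it.
    END { unlink $lockfile if $i_own_lock }

    # ... the actual fetching work goes here ...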
From the log file that my startup bat file will make, I can look back and see how often and at what times of day/week this is happening. I suspect that it is not conscious throttling at the other end, but rather a glitch in the server's software that occasionally causes a barf. The sysop may not even be aware that this is happening. Normal run time for my software is just a minute or two - max is about 5 min. Right now I am making a humongous run because I am trying to recreate the problem. But in normal operation where the error report came from, this software is very "low key". And most hours, it doesn't do much of anything.
Update: I got some more data from my overnight stress run. It fetched ~154K pages over about 13 hours, which resulted in 4 retry sequences being initiated. The max elapsed times in those 4 retry sequences were: 1 sec, 90 min, 30 sec, 30 sec. A typical second has 3-4 requests, but a typical hour only has about 15. It's looking like the lock file approach will work. I will think some more about how to bulletproof it so that this thing won't hang for a long time without him knowing about it. It is clear that the time to complete a request can be much longer than the 3 minute timeout value.
For Windows, the simplest solution is a parent monitor process that kills the child worker after a timeout. See Proc::Background. You can even write it in a generic way that adds timeouts to any script you might launch through it. By the way, SIGALRM doesn't actually exist on Windows; it'll be a Perl emulation which might not behave the same way. I actually don't know if I've tried it on Windows before. Hopefully LWP::UserAgent is written with select() rather than SIGALRM, but I don't know that either.
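A rough sketch of that parent/monitor idea - the child script name and the 10-minute cap are made up, but the new/alive/die/wait calls are Proc::Background's own:

    use strict;
    use warnings;
    use Proc::Background;

    my @cmd      = ($^X, 'C:/scraper/fetch_pages.pl');   # made-up child script
    my $max_secs = 10 * 60;                              # hard wall-clock limit

    my $proc = Proc::Background->new(@cmd)
        or die "could not start @cmd\n";
    my $deadline = time + $max_secs;

    # Poll until the child exits or the deadline passes.
    while ($proc->alive) {
        if (time > $deadline) {
            warn "child ran past ${max_secs}s, killing it\n";
            $proc->die;    # Proc::Background's cross-platform kill
            last;
        }
        sleep 5;
    }

    my $status = $proc->wait;    # reap the child and collect its exit status
    print "child finished, wait() returned $status\n";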
You could also try an event library like Mojo::IOLoop or AnyEvent or IO::Async with a matching event-based user agent like Mojo::UserAgent, AnyEvent::UserAgent, or Net::Async::HTTP. These involve rewriting your script significantly, but then you have all the benefits of event-driven programming at your fingertips, and a timeout is super easy.
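For example, a minimal non-blocking fetch with Mojo::UserAgent might look like this (the URL and the 15-second request_timeout are just illustrative; request_timeout caps the whole request rather than just inactivity):

    use strict;
    use warnings;
    use Mojo::UserAgent;
    use Mojo::IOLoop;

    my $ua = Mojo::UserAgent->new(request_timeout => 15);

    # Passing a callback makes the request non-blocking; the event loop
    # below drives it along with anything else you schedule.
    $ua->get('http://example.com/some-page.html' => sub {
        my ($ua, $tx) = @_;
        if (my $err = $tx->error) {
            warn "fetch failed: $err->{message}\n";
        }
        else {
            print "got ", length($tx->result->body), " bytes\n";
        }
        Mojo::IOLoop->stop;
    });

    Mojo::IOLoop->start unless Mojo::IOLoop->is_running;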