in reply to Net::SSL SSL read timeout

Do you have any other monitoring?

Typically, when I'm doing monitoring, I don't just look for yes/no answers; I also try to track things that might reveal trends. In a case like this, I might record the total time it took to get the page, so that I can graph it in MRTG (or whatever your favorite graphing program is).
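
As a rough sketch of what recording the total fetch time could look like (not the original poster's script): the helper name timed, the URL taken from the command line, and the output format are all assumptions made for illustration, and the 60-second timeout simply mirrors the value mentioned in this thread.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code reference and return
# (elapsed wall-clock seconds, whatever the code returned).
sub timed {
    my ($code) = @_;
    my $start   = [gettimeofday];
    my $result  = $code->();
    my $elapsed = tv_interval($start);   # seconds since $start
    return ($elapsed, $result);
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    require LWP::UserAgent;
    my $url = shift @ARGV;
    my $ua  = LWP::UserAgent->new(timeout => 60);   # assumed timeout
    my ($elapsed, $res) = timed(sub { $ua->get($url) });

    # One line per check: epoch time, seconds taken, HTTP status line.
    printf "%d %.3f %s\n", time, $elapsed, $res->status_line;
}
```

Appending one such line per run to a file should give you something that MRTG or any other graphing tool can be fed from, and a slow climb in the second column would show up well before an outright timeout.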

If I saw an increased response time around the timeout, I'd look to see what the load was on the machine, and assume that the monitoring wasn't at fault ... if it was an isolated failure, then I'd have to look at the monitoring from end to end.

(It's like the 'check engine' light on your car -- I'd much rather have a series of gauges, so that I get some earlier warning that something is trending upwards, rather than a light that only turns on once it hits a threshold.)

Anyway, I'd try to look at whatever other logs you might have, as there might be anomalies that would indicate what went wrong, or at least hint at where to look.

Re^2: Net::SSL SSL read timeout
by cbrandtbuffalo (Deacon) on Dec 20, 2005 at 19:04 UTC
    I thought of many of the same things when I first started looking at this.

    The first logs I looked at were the Apache logs, but this really isn't a 500 error, so it is nearly impossible to match the request with anything on the Apache side.

    Unfortunately, this monitoring script doesn't record response time. That's something we need to add because, as you rightly point out, this would be a good extra piece of information. However, in this case I assume it would be 60 seconds since that's the timeout.

    Correlating with other data is also difficult because we have a farm-type architecture. Other monitoring scripts may or may not be hitting the same servers as this one because we hit the front door.

    As for other logs, the only ones I could think of were the SSL logs.

    Thanks for the suggestions.

      I had the same "500 SSL read timeout" problem using LWP and Crypt::SSLeay. Adding a "timeout => XX" value sorted this out. It was simply a step in a process that took slightly longer to return than the previous requests, which had all succeeded.
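
      For illustration, a minimal sketch of that change, assuming the "timeout => XX" above refers to the timeout option of LWP::UserAgent. The value of 120 seconds and the helper name is_internal_response are made up for the example. The helper relies on LWP marking responses it generates itself (rather than ones received from the server) with a "Client-Warning: Internal response" header; whether your particular SSL timeout is reported that way is worth checking against your own LWP version.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Allow slow steps more time than the timeout that was being hit.
# 120 is an arbitrary example value, not a recommendation.
my $ua = LWP::UserAgent->new(timeout => 120);

# Hypothetical helper: true if the response was generated inside LWP
# (for example after a timeout) instead of coming from the web server.
sub is_internal_response {
    my ($res) = @_;
    return ($res->header('Client-Warning') // '') eq 'Internal response';
}
```

      A 500 that was generated on the client side never reached the server as a completed request, which would explain why there is nothing matching it in the server logs.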