NetSaint is your fully formed, well-functioning wheel that you do not need to reinvent. NetSaint is a complete framework for monitoring systems and websites: it handles scheduling and notification, and comes with a very comprehensive set of tests right out of the box.
It is also completely modular, and you can write tests in Perl (I have written several).
grep
I agree about NetSaint; there is hardly any point in reinventing the wheel unless you have a reason (such as "I want to" :) ).
Anyway, if you do decide to roll your own, here are a few pointers you might find useful - been there, done that.
- Be very careful when defining what is "up" and what is "down". The script should not report that everything is down just because the server is a bit strained. Give it a second chance.
- Be very, very careful when defining what is "up" and what is "down" if this decision will make your script take any automatic action (maybe restart the application or something; on one hand, a restart might freshen everything up, on the other hand, in some designs it could cause customers to lose their session, and their carts with it!).
- Consider making a special "stats" page for your script to access, so that the script can fetch some data as well: perhaps the load on the server, or how many sessions are deemed active. XML is nice for this. Make it protected, though.
- Consider not using a special page for the script. At least make sure you just don't test "index.html". Make the script test several pages, and if possible, simulate a flow over several pages (yes, this takes some coding).
- Email is your friend. Mail yourself warnings when certain criteria are met. Maybe mail yourself when things start to look good again, too, so you know that as well.
- Email is your enemy. This pertains to the earlier points about defining what "down" really means. If you get several mails a day, and especially if you are getting false alarms, you will very soon start to ignore the mails. Do not send mail unnecessarily.
- Log as much as possible. Anything you can think of might help later.
- Log as little as possible. You don't want to sift through an Apache-access-log-sized file to get the facts you need. Make sure you can easily find those facts in the logs, via timestamps and such. Also use a special User-Agent header for your surfing script, so you can pick its requests out of the normal web logs.
- Let the script surf from someplace else, outside your firewall, preferably from some totally different location. Overseas would be great. :) Otherwise, something besides your site might be down, and you wouldn't know.
Of course, there are tons and tons more things, but these are the points I could think of right away, and I know several of them would have helped me had someone told me. :) As you can see, most of the points contradict one or several of the others. This is intentional: both sides are correct to some extent, and the idea is to find the balance. For instance, a surfing-type script that times out after 10 seconds and reports the site dead is most likely a very bad idea, but so is one that takes 10 minutes. Maybe a one-minute timeout, with a double-check, would be appropriate? Only you can answer that.
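Several of the pointers above (the one-minute timeout, the second chance, the distinctive User-Agent, and mailing only when something actually changes) can be combined into a short LWP sketch. The URLs, timings, and sub names here are invented for illustration, not a prescription:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Distinctive agent string so the monitor's hits stand out in the normal
# web logs; one-minute timeout rather than ten seconds or ten minutes.
my $ua = LWP::UserAgent->new(
    agent   => 'site-monitor/0.1',
    timeout => 60,
);

# Fetch a page, giving the server a second chance before calling it down.
sub page_is_up {
    my ($url) = @_;
    for my $try (1 .. 2) {
        return 1 if $ua->get($url)->is_success;
        sleep 15 if $try == 1;    # brief pause, then the double-check
    }
    return 0;
}

# Mail only on state *changes* (up->down, down->up) so you are not
# flooded with repeats you will soon learn to ignore.
sub should_notify {
    my ($previous, $current) = @_;
    return $previous ne $current ? 1 : 0;
}

# Usage (hypothetical URLs and mailer):
#   my $state = page_is_up('http://www.example.com/') ? 'up' : 'down';
#   mail_admin($state) if should_notify($old_state, $state);
```

Testing a flow over several pages would just mean calling `page_is_up` on a list of URLs in order and stopping at the first failure.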
I hope these ideas gave you some hints on how to go about it. :)
Thanks for pointing me towards NetSaint. It may be just what I need to get going quickly.
Thanks for the tip. I have been searching for something off and on for a year and hadn't come across this. I will definitely look into it.
use LWP;
Make sure, though, that if your web server sits between your corporate world and customer land, you are grabbing the URL from customer land; otherwise you may be able to get your pages while customers can't, because of some problem. (ACL problems are relatively common.)
Sure, your app is running, but no one can get to it! This is just another layer of checking that can be done with relative ease.
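Fleshing out that `use LWP;` a little: a minimal external check might look like the sketch below (the URL in the usage comment is a placeholder). The point is where you run it from: a 200 fetched from inside the firewall proves little if an ACL is blocking everyone out in customer land.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Returns 'up' or 'down' for a single URL, as seen from wherever this
# script runs -- so run it from a box on the customer side.
sub check_url {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 60);
    my $res = $ua->get($url);
    return $res->is_success ? 'up' : 'down';
}

# From the outside box, something like (hypothetical URL):
#   print check_url('http://www.example.com/'), "\n";
```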
Redundancy is your friend.
Especially if what you are monitoring is mission critical, you want to be able to monitor your site, but you also want something to monitor your monitoring software. What happens if it turns out the machine watching your server also went out in the same power failure?
I would say you want something like this:
- Server A is serving your pages.
- Server B is separated far enough from Server A that it's unlikely they would be affected by the same outages (or at least you would have another way of knowing if they were).
- Server B monitors Server A.
- Server A monitors Server B.
There are other points to consider. For example, how does your software run? Is it a daemon? Do you have something that will catch it when the daemon dies? Is it run by cron? What will let you know if cron dies?
The other item is monitoring vs. management. It's far better to have a report that says "Hey, your server went down and I restarted it for you; everything is OK now." than to have one that says "Hey, your server is down, your paying customers will be complaining soon; hurry up and restart it."
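To make the daemon-vs-cron point concrete, here is one sketch of the cron side: a tiny watchdog that probes the monitor daemon's pidfile and can restart it if the process is gone. The pidfile path and restart command are invented examples, not real paths.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical locations -- adjust for your own setup.
my $pidfile = '/var/run/site-monitor.pid';
my $restart = '/usr/local/bin/site-monitor --daemon';

# True if the pid recorded in $file belongs to a live process.
sub daemon_alive {
    my ($file) = @_;
    open my $fh, '<', $file or return 0;   # no pidfile => not running
    chomp(my $pid = <$fh> // '');
    return 0 unless $pid =~ /^\d+$/;
    return kill(0, $pid) ? 1 : 0;          # signal 0 only probes the pid
}

# The cron job would then manage, not just monitor:
#   unless (daemon_alive($pidfile)) {
#       system($restart);
#       # mail: "monitor was down; I restarted it, everything is OK now"
#   }
```

Run this from the *other* server (B watching A's monitor, A watching B's) and you cover the "monitor of the monitor" gap described above.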