I agree about Netsaint - there is hardly any point in reinventing the wheel if you don't have a reason (such as "I want to" :) ).
Anyway, if you do decide to roll your own, here are a few pointers you might find useful - been there, done that.
- Be very careful when defining what is "up" and what is "down". The script should not report that everything is down just because the server is a bit strained. Give it a second chance.
- Be very, very careful when defining what is "up" and what is "down" if this decision will make your script take any automatic action (maybe restart the application or something - on one hand, a restart might freshen everything up; on the other hand, in some designs it could cause customers to lose their session - and their carts with it!).
- Consider making a special "stats" page for your script to access, so that the script can fetch some data about the server as well. Perhaps the load on the server, how many sessions are deemed active... XML is nice for this. Make it protected, though. (There is a small sketch of fetching such a page after this list.)
- Consider not using a special page for the script. At least make sure you don't just test "index.html". Make the script test several pages, and if possible, simulate a flow over several pages (yes, this takes some coding - there is a rough sketch of such a flow after this list).
- Email is your friend. Mail yourself warnings when certain criteria are met. Maybe mail yourself when things start to look good again, so you will know that too.
- Email is your enemy. This ties back to the first points, about defining what "down" really means. If you get several mails a day, and especially if you are getting false alarms, you will very soon start to ignore the mails. Do not send mail unnecessarily. (There is a sketch of one way to handle this a bit further down.)
- Log as much as possible. Anything you can think of might help later.
- Log as little as possible. You don't want to sift through an Apache-access-log-sized file to get the facts you need. Make sure you can easily find the facts you need in the logs, via timestamps and such. Also, use a special User-Agent header for your surfing script, so you can tell its hits apart in the normal web log.
- Let the script surf from someplace else, outside your firewall, preferably from some totally different location. Overseas would be great. :) Otherwise, something between your network and the rest of the world might be down, and you would never know.
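For what it is worth, here is a rough sketch of the kind of surfing script I have in mind, using plain LWP. The URLs, the page flow, the log path and the retry/sleep values are all made up - treat it as a starting point, not a finished monitor.
<code>
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use HTTP::Cookies;

# Hypothetical values - replace with your own site and log location.
my @flow    = ('http://www.example.com/',
               'http://www.example.com/catalog.html',
               'http://www.example.com/cart.html');
my $logfile = '/var/log/sitewatch.log';

my $ua = LWP::UserAgent->new;
$ua->timeout(60);                          # one minute, not ten seconds
$ua->agent('SiteWatch/0.1');               # special User-Agent so you can
                                           # filter these hits out of the
                                           # normal access log
$ua->cookie_jar(HTTP::Cookies->new);       # keep the session across pages

sub fetch_with_retry {
    my $url = shift;
    my $resp;
    for my $attempt (1 .. 3) {             # the "second chance" (and a third)
        $resp = $ua->get($url);            # before declaring anything down
        return $resp if $resp->is_success;
        sleep 15 unless $attempt == 3;
    }
    return $resp;
}

open my $log, '>>', $logfile or die "can't open $logfile: $!";

for my $url (@flow) {
    my $resp   = fetch_with_retry($url);
    my $status = $resp->is_success ? 'OK' : 'FAIL ' . $resp->status_line;
    printf $log "%s %s %s\n", scalar localtime, $url, $status;
    last unless $resp->is_success;         # no point continuing the flow
}
close $log;
</code>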
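And a small sketch of pulling a stats page, assuming a made-up /monitor/stats.xml protected by basic auth and parsed with XML::Simple - the URL, realm, password and limits are only placeholders.
<code>
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use XML::Simple;

# All of this is hypothetical: pick your own stats URL, realm and limits.
my $stats_url = 'http://www.example.com/monitor/stats.xml';

my $ua = LWP::UserAgent->new;
$ua->timeout(30);
$ua->agent('SiteWatch/0.1');
# The stats page should not be world-readable; basic auth is the
# simplest protection.
$ua->credentials('www.example.com:80', 'Monitoring', 'sitewatch', 'secret');

my $resp = $ua->get($stats_url);
die "stats page unreachable: ", $resp->status_line, "\n"
    unless $resp->is_success;

# Expecting something like:
#   <stats><load>0.42</load><sessions>117</sessions></stats>
my $stats = XMLin($resp->content);

warn "load is getting high: $stats->{load}\n"
    if $stats->{load} > 5;                 # made-up threshold
warn "suspiciously few sessions: $stats->{sessions}\n"
    if $stats->{sessions} < 1;
</code>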
Of course, there are tons and tons more things, but these are the points I could think of right away, and I know several of them would have helped me, had someone told me. :) As you can see, most of the points contradict one or several of the other points. This is intentional - both sides are correct to some extent, and the idea is to find the balance. For instance, a surfing type of script that times out after 10 seconds, reporting that the site is dead, is most likely a very bad idea - but so is one that takes 10 minutes. Maybe a one minute timeout, with a double-check, would be appropriate? Only you can answer that.
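To make that balance a bit more concrete, here is a rough sketch of the double-check idea combined with keeping the mail volume down: only mail when the site has failed a few checks in a row, and send a single "looks good again" mail when it recovers. The state file, address, threshold and sendmail path are all made up - adjust them to your own setup.
<code>
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;

# Hypothetical settings - adjust to taste.
my $url        = 'http://www.example.com/';
my $state_file = '/var/run/sitewatch.state';   # remembers failures so far
my $threshold  = 3;                            # alert on 3rd failure in a row
my $admin      = 'you@example.com';

my $ua = LWP::UserAgent->new;
$ua->timeout(60);
$ua->agent('SiteWatch/0.1');

my $resp = $ua->get($url);
my $ok   = $resp->is_success;

# Read how many times in a row we had failed before this run.
my $failures = 0;
if (open my $fh, '<', $state_file) {
    $failures = <$fh> + 0;
    close $fh;
}

if ($ok) {
    # Recovered: say so once, but only if we had already raised the alarm.
    send_mail("$url looks good again") if $failures >= $threshold;
    $failures = 0;
} else {
    $failures++;
    # Only the Nth consecutive failure triggers a mail - a single hiccup
    # or one slow response does not wake anybody up.
    send_mail("$url is DOWN (" . $resp->status_line . ")")
        if $failures == $threshold;
}

open my $fh, '>', $state_file or die "can't write $state_file: $!";
print $fh "$failures\n";
close $fh;

sub send_mail {
    my $subject = shift;
    # Assumes a local sendmail at the usual place.
    open my $mail, '|-', '/usr/sbin/sendmail -t' or die "no sendmail: $!";
    print $mail "To: $admin\nSubject: $subject\n\n",
                scalar localtime, "\n";
    close $mail;
}
</code>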
I hope these ideas gave you some hints on how to go about it. :)
You can combine Netsaint with the power of Perl and LWP or HTTP::WebTest. As grep has said, you can write Netsaint plugins in Perl.
It is quite a good approach. Netsaint gives you a powerful monitoring framework which provides messaging, a web interface, ready-to-use plugins for many services, etc., and LWP or <shameless plug>even better, HTTP::WebTest</shameless plug> gives you the ability to write very complex tests which can cover all the functionality of your web applications.
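If it helps, a bare-bones Netsaint check plugin in Perl looks roughly like this: print one line of status and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). I am using plain LWP below to keep it short; an HTTP::WebTest based check would follow the same exit-code convention. The URL, timeout and "slow" threshold are placeholders.
<code>
#!/usr/bin/perl -w
# check_mysite - minimal Netsaint-style plugin sketch.
# Usage: check_mysite <url>
use strict;
use LWP::UserAgent;

# Standard plugin exit codes understood by Netsaint.
my %EXIT = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

my $url = shift @ARGV
    or do { print "UNKNOWN - no URL given\n"; exit $EXIT{UNKNOWN} };

my $ua = LWP::UserAgent->new;
$ua->timeout(30);                      # placeholder timeout
$ua->agent('check_mysite/0.1');

my $start = time;
my $resp  = $ua->get($url);
my $took  = time - $start;

if (!$resp->is_success) {
    print "CRITICAL - $url: ", $resp->status_line, "\n";
    exit $EXIT{CRITICAL};
}
if ($took > 10) {                      # made-up "slow but alive" threshold
    print "WARNING - $url answered in ${took}s\n";
    exit $EXIT{WARNING};
}
print "OK - $url answered in ${took}s\n";
exit $EXIT{OK};
</code>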
BTW, one user of HTTP::WebTest has sent me a script - a plugin for Netsaint which uses HTTP::WebTest to test websites. I don't feel it is generic enough to make public, but if you want I can email it to you. Drop me an email at ilya@martynov.org if you need it.
--
Ilya Martynov
(http://martynov.org/)
Thanks for pointing me towards NetSaint. It may be just what I need to get going quickly.
Thanks for the tip. I have been searching for something off and on for a year and hadn't come across this. I will definitely look into it.