Re: [OT] Monitoring a website
by jhourcle (Prior) on Oct 02, 2007 at 14:19 UTC
You might want to look at the list of tests that are available for Big Brother, Big Sister, Nagios, MRTG or other network monitoring tools, and see what's on their list that might be useful, that you haven't thought of yet.
Personally, I look at two types of monitoring -- alerts for when something's having problems or is about to have problems soon (e.g., a disk is almost full, a webserver's taking too long to respond), and historical records, so I can try to spot trends and do capacity planning (e.g., when I worked for a university: is this measurement a true problem, or just part of a normal cycle, like usage spikes near the end/beginning of semesters?).
Although many people monitor to send alerts, I found the second to be more valuable -- you can trace back to when memory/load/disk usage started going up, before it hit alert levels, and find what changes were made shortly before that might be causing the problems. You can notice abnormal behaviour (the load goes up every Tuesday morning from 3am-9:30am? Maybe it's a cron job that needs to be moved earlier so it completes before the workday starts), etc.
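A minimal sketch of the historical-records idea, assuming Linux's /proc/loadavg; the log path and the cron schedule are only examples, not anything from the post:

```perl
use strict;
use warnings;

# Turn the first line of /proc/loadavg into one timestamped log record,
# so samples collected over weeks can be plotted and compared later.
sub sample_line {
    my ($loadavg_line, $now) = @_;
    my ($one, $five, $fifteen) = split ' ', $loadavg_line;
    return "$now $one $five $fifteen";
}

# Typical wiring from a cron job running every few minutes
# (both paths are examples):
#   open my $in,  '<',  '/proc/loadavg'        or die $!;
#   open my $out, '>>', '/var/log/loadavg.log' or die $!;
#   print $out sample_line(scalar <$in>, time()), "\n";
```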
Re: [OT] Monitoring a website
by Corion (Patriarch) on Oct 02, 2007 at 14:07 UTC
I use some simple LWP::Simple tests to verify that my hosted websites are still working whenever I restart Apache. For the mailrouting, I wrote me a (still unreleased) module that basically queries exim4 for the rule that applies to a target mail address:
use Test::More;
for (<DATA>) {
    chomp;
    my ($address, $expected_rule) = split / /;
    my @output = `exim4 -bt $address`;
    my @routes_as = grep /R:/, @output;
    is $routes_as[0], $expected_rule, "$address routes as $expected_rule";
};
__DATA__
...
Re: [OT] Monitoring a website
by blue_cowdawg (Monsignor) on Oct 02, 2007 at 17:45 UTC
Having set up lots of monitoring over the years using Nagios and its predecessor
Netsaint, as well as HP OpenView and Sun Net Monitor, I can tell you that figuring out what to monitor is
always an exercise that needs to be well thought out.
One thing I'd caution against is monitoring too much. Anything you run against a system is going to have
some form of penalty, however slight that might be. If you have a lot of slight penalties you can cause a
death of a thousand scratches to what you are trying to monitor -- sort of an extreme example of
Heisenberg uncertainty, where you are affecting what you are trying to measure.
How I normally select what to monitor is to first determine what is important to monitor. That whole
list you have, however impressive it may be, may not consist entirely of items that are important to monitor. Start with
the basics.
- What applications am I running?
- What will the user see if it goes down?
- Is the box up/down?
- Is it usable?
Then you build from there.
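A minimal sketch of the last two bullets ("is it up / is it usable"), assuming an HTTP service. HTTP::Tiny is used only to keep the example self-contained, and the marker-string check is my addition -- it catches the "up but broken" case that a bare ping misses:

```perl
use strict;
use warnings;
use HTTP::Tiny;   # core module, keeps the sketch dependency-free

# Decide from a response whether the site both answered and served a
# page containing a string it is known to serve.
sub usable {
    my ($fetch_ok, $content, $must_contain) = @_;
    return 0 unless $fetch_ok;
    return index($content, $must_contain) >= 0 ? 1 : 0;
}

# Wiring it to a real fetch; the URL and marker string are examples.
sub check_page {
    my ($url, $must_contain) = @_;
    my $res = HTTP::Tiny->new(timeout => 10)->get($url);
    return usable($res->{success}, $res->{content} // '', $must_contain);
}
```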
Having said all that... I'll just say this: K.I.S.S.
Peter L. Berghold -- Unix Professional
Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Agreed. And it may be that some of these things can be measured initially so that we can figure out what normal is, and then can be reduced to once an hour, or once a day.
It reminds me of when I was working in paediatrics, and we had a premature baby who had been very sick. We'd treated him for a long time, and he had gradually recovered, but he had persistent anaemia, and we couldn't find a reason for it. Eventually, we figured out that it was because we had been monitoring him so closely -- taking blood every day. We stopped checking, and he recovered nicely.
Thanks for the advice,
Clint
Re: [OT] Monitoring a website
by CountZero (Bishop) on Oct 02, 2007 at 15:05 UTC
Re: [OT] Monitoring a website
by talexb (Chancellor) on Oct 02, 2007 at 20:01 UTC
We use Nagios, and we monitor the following server parameters:
- inodes and space available on each drive
- swap usage and load nominal, ssh available
- some custom checks to make sure the web application is up and running (pid and http checks)
- a check that PostgreSQL is alive
That's just an application server -- the other things you list are numbers that might be interesting, but probably won't signal that the server is going down shortly. Rather, they are stats that I'd probably look at once a week, but I wouldn't want to set any alarms for them.
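For the disk-space/inodes bullet, a sketch of the kind of check a plugin like Nagios's check_disk performs, assuming POSIX `df -P` output; the 90% threshold is an arbitrary example:

```perl
use strict;
use warnings;

# Extract the "Capacity" percentage from one data line of `df -P`
# (run `df -Pi` instead to check inodes the same way).
sub used_percent {
    my ($df_line) = @_;
    my @f = split ' ', $df_line;
    my ($pct) = $f[4] =~ /(\d+)%/;
    return $pct;
}

# Example wiring; the threshold is an arbitrary choice:
#   my @lines = `df -P`;
#   shift @lines;    # drop the header line
#   for my $line (@lines) {
#       my @f = split ' ', $line;
#       warn "$f[5] is at " . used_percent($line) . "%\n"
#           if used_percent($line) > 90;
#   }
```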
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
Re: [OT] Monitoring a website
by swngnmonk (Pilgrim) on Oct 02, 2007 at 19:34 UTC
As others have already mentioned, this is a realm that requires a lot of planning and thought. And there are plenty of off-the-shelf packages available out there.
With that, I'd throw my two cents in for mon:
http://www.kernel.org/software/mon/
It's written almost entirely in Perl, it's extremely extensible, and it's fantastic for application-level monitoring, which I feel a lot of the network-monitoring applications aren't well suited to.
We use it to monitor and support an extremely complex server infrastructure that has a lot of dependencies and moving parts - mon has done a fantastic job for us.
Re: [OT] Monitoring a website
by DutchCoder (Scribe) on Oct 02, 2007 at 20:18 UTC
Hi,
At work (50+ webservers) we use most of these tests every five minutes (errors activate a text message service) and we build graphs of 8 functions.
You might want to add "Total Load" and "CPU busy".
Don't forget to have an external hosting account check whether the server is reachable (every five minutes; an error there also activates the text message service).
Re: [OT] Monitoring a website
by misc (Friar) on Oct 03, 2007 at 09:11 UTC
I'm also checking the temperatures on my servers
(hard disk, CPU, mainboard, ...).
Assuming you run Linux, you should be able to grep the temperatures from the files under /proc/acpi/thermal_zone/*.
There are also some modules on CPAN.
I've experienced temperature problems from time to time,
which led to strange problems (segfaults, the big byte cruncher, ...).
The last time I met the byte cruncher, there was too much dust on the CPU's cooler.
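A sketch of that check, assuming the /proc/acpi/thermal_zone layout described above, where each temperature file holds a single line such as "temperature: 47 C"; the 70-degree threshold is an arbitrary example:

```perl
use strict;
use warnings;

# Pull the Celsius value out of a /proc/acpi/thermal_zone/*/temperature
# line, which typically reads "temperature:             47 C".
sub parse_temp {
    my ($line) = @_;
    my ($deg) = ($line // '') =~ /temperature:\s+(\d+)\s*C/;
    return $deg;
}

# Example wiring; the threshold is an arbitrary choice:
#   for my $file (glob '/proc/acpi/thermal_zone/*/temperature') {
#       open my $fh, '<', $file or next;
#       my $deg = parse_temp(scalar <$fh>);
#       warn "$file: ${deg}C\n" if defined $deg && $deg > 70;
#   }
```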
Re: [OT] Monitoring a website
by SFLEX (Chaplain) on Oct 03, 2007 at 10:10 UTC
- requests
- requests / size
For a web page, check the size of the request parameters; CGI.pm can do this for you and report an error via cgi_error().
But if you can check for a large request and stop it before Perl touches it, I guess it would be a lot safer.
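A sketch of that "stop it before anything reads the body" idea, deciding from CONTENT_LENGTH alone; the 100 KB limit is an arbitrary example. (Within CGI.pm itself, setting $CGI::POST_MAX before creating the CGI object does the same job and cgi_error() then reports the failure; a front-end limit such as Apache's LimitRequestBody stops the request before Perl runs at all.)

```perl
use strict;
use warnings;

# Reject oversized requests from the declared CONTENT_LENGTH, before
# any module reads the request body at all.
sub too_large {
    my ($content_length, $max_bytes) = @_;
    return 0 unless defined $content_length;
    return $content_length > $max_bytes ? 1 : 0;
}

# In a CGI script, before creating the CGI object (limit is an example):
#   if (too_large($ENV{CONTENT_LENGTH}, 100 * 1024)) {
#       print "Status: 413 Request Entity Too Large\r\n\r\n";
#       exit;
#   }
```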