in reply to [OT] Monitoring a website

Having set up lots of monitoring over the years using Nagios and its predecessor NetSaint, as well as HP OpenView and Sun Net Monitor, I can tell you that figuring out what to monitor is always an exercise that needs to be well thought out.

One thing I'd caution against is monitoring too much. Anything you run against a system is going to carry some penalty, however slight. If you pile up a lot of slight penalties, you can inflict a death of a thousand scratches on the very thing you are trying to monitor. It's sort of an extreme example of Heisenberg's Uncertainty Principle: you end up affecting what you are trying to measure.

The way I normally select what to monitor is to first determine what is important to monitor. That whole list you have, however impressive it may be, may not consist entirely of items that are important to monitor. Start with the basics.

Then you build from there.

Having said all that... I'll just say this: K.I.S.S.


Peter L. Berghold -- Unix Professional
Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg

Re^2: [OT] Monitoring a website
by clinton (Priest) on Oct 02, 2007 at 17:50 UTC
    Agreed. And it may be that some of these things can be measured frequently at first, so that we can figure out what normal looks like, and then the checks can be reduced to once an hour or once a day.

    It reminds me of when I was working in paediatrics, and we had a premature baby who had been very sick. We'd treated him for a long time, and he had gradually recovered, but he had persistent anaemia, and we couldn't find a reason for it. Eventually, we figured out that it was because we had been monitoring him so closely - taking blood every day. We stopped checking, and he recovered nicely.
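The "measure often until you know what normal is, then back off" idea can be sketched in Python. This is a minimal illustration under stated assumptions, not anything from Nagios or another real tool: the `AdaptiveCheck` class and its parameters are hypothetical, the 60-second and one-hour intervals are arbitrary example values, and "normal" is approximated with a mean and standard deviation plus a three-sigma tolerance, a common heuristic chosen here for simplicity. The code only decides how long to wait before the next check; it does not perform any probing itself.

```python
from statistics import mean, stdev


class AdaptiveCheck:
    """Hypothetical helper: poll often until a baseline exists, then back off.

    All default values below are illustrative assumptions, not recommendations.
    """

    def __init__(self, baseline_samples=30, fast=60, slow=3600, tolerance=3.0):
        self.baseline_samples = baseline_samples  # readings needed to learn "normal" (>= 2)
        self.fast = fast              # seconds between checks while learning
        self.slow = slow              # seconds between checks once a baseline exists
        self.tolerance = tolerance    # standard deviations that count as abnormal
        self.samples = []
        self.baseline = None          # (mean, stdev) once learned

    def record(self, value):
        """Store one reading and return the number of seconds until the next check."""
        if self.baseline is None:
            # Still learning what normal looks like: keep checking frequently.
            self.samples.append(value)
            if len(self.samples) >= self.baseline_samples:
                self.baseline = (mean(self.samples), stdev(self.samples))
                return self.slow
            return self.fast

        mu, sigma = self.baseline
        # Guard against a zero spread so an identical-readings baseline still works.
        if abs(value - mu) > self.tolerance * max(sigma, 1e-9):
            # Reading looks abnormal: discard the old baseline and relearn quickly.
            self.samples = [value]
            self.baseline = None
            return self.fast
        # Reading is within the normal range: stay at the reduced frequency.
        return self.slow
```

For example, feeding it response times in milliseconds with `baseline_samples=3`, the readings 100, 110 and 90 each schedule the next check 60 seconds out until the third, which establishes a baseline (mean 100, standard deviation 10) and stretches the interval to an hour. A later reading of 200 falls well outside that range, so the sketch drops back to frequent checks. Resetting the baseline on an anomaly is one possible design choice; a real setup might instead alert and keep the old baseline.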

    Thanks for the advice.

    Clint