in reply to intelli-monitor.pl

    perl -e'sleep' httpd

And your code will not detect that httpd is gone.

Besides that, your code only notices when a process is gone. Far more often, in my experience, processes are still there, but don't work properly. So instead, test the functionality. For example, I use this hack to restart my apache when needed:

    #!/usr/bin/perl
    use strict;
    use LWP::Simple;

    # Skip the check entirely while this flag file exists.
    exit 0 if -e "/etc/nouptest";

    eval {
        local $SIG{ALRM} = sub { die "Alarm\n" };
        alarm 10;    # give up if the request takes longer than 10 seconds
        my $p = get 'http://uptest.convolution.nl/';
        # get returns undef on failure; the test page must contain "xyzzy"
        defined($p) && $p =~ /xyzzy/ or die "Down\n";
        alarm 0;
    };
    alarm 0;    # cancel any alarm still pending so it cannot kill us mid-restart

    if ($@ =~ /Alarm|Down/) {
        system qw[/etc/init.d/apache stop];
        sleep 3;
        system qw[killall -9 apache];
        sleep 3;
        system qw[/etc/init.d/apache start];
    }

Cron runs this every minute, and because the init scripts produce output, cron mails me whenever Apache is restarted. As a nice side effect, I also get lots of mail when the nameserver is broken :)

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Re: intelli-monitor.pl
by biosysadmin (Deacon) on May 07, 2004 at 03:50 UTC
    Thanks for the tips. I'd definitely like to test on a more accurate basis, but the problem was specific enough for this to work. At least, it has worked so far. :)

    A good idea might be to anchor the regex so that it matches at the beginning and end of the line; this would lessen the non-specific matching problem that you mention.
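
    To illustrate the anchoring idea, here is a minimal sketch. It assumes the monitor matches a process name against lines of a process list; the helper names and the sample lines are made up for the example and are not taken from intelli-monitor.pl.

```perl
use strict;
use warnings;

# Hypothetical helpers, not taken from intelli-monitor.pl.

# Unanchored: counts lines that contain $name anywhere.
sub loose_match {
    my ($name, @lines) = @_;
    return scalar grep { /\Q$name\E/ } @lines;
}

# Anchored: counts only lines that consist of exactly $name.
sub strict_match {
    my ($name, @lines) = @_;
    return scalar grep { /^\Q$name\E$/ } @lines;
}

# Made-up process list in which the real httpd is gone, but the decoy
# started with "perl -e'sleep' httpd" is still running.
my @lines = ("perl -esleep httpd", "sshd", "cron");

printf "loose: %d, strict: %d\n",
    loose_match('httpd', @lines), strict_match('httpd', @lines);
```

    The unanchored pattern is fooled by the decoy, while the anchored one is not. Real ps output usually carries extra columns (PID, TTY, and so on), so the pattern would have to be adapted to whatever format the monitor actually reads.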

    I was actually thinking of writing Nagios plugins to test all of these services, but that's a task for another day, while this was simply a half hour of scripting.