in reply to Pinging network devices and setting SNMP traps
A few random thoughts:
- Think about the situation where the network fails on the monitoring machine itself, so all hosts appear to be off the network when really only the monitoring machine is.
- Consider also what happens if the network connection is down between the monitoring host and the alarm host.
- Think about what to do when one failure takes down a lot of other things. For example, if your fiber link gets cut, you don't want dozens of messages every 5 seconds saying that each of the hosts is down. Put something in place to collapse or suppress those alerts, or it will get extremely annoying.
- Definitely don't sound any kind of alarm for just one lost packet. Packet loss rates of 1% are fairly normal, and higher loss rates are normal over the open internet or over wireless networks.
- Think about what will happen if your script crashes. One possibility is to start it from init (for example, with a respawn entry in /etc/inittab), which will restart it if it dies. daemontools is another useful tool to keep your daemons running.
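
For what it's worth, here is a rough sketch of how a few of the points above might fit together: require several missed probes in a row before calling a host down, check a "control" address to tell whether the monitor itself has dropped off the network, and stay quiet about hosts whose upstream device is already known to be down. It is in Python rather than Perl, and everything in it is made up for illustration: the Monitor class, the probe callback (a stand-in for whatever ping you actually use), the parents map, and the threshold of 3 are all assumptions, not anything from the original script.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Monitor:
    # probe(host) returns True if the host answered; a stand-in for a real ping.
    probe: Callable[[str], bool]
    # host -> upstream device it depends on (None if it has no parent).
    parents: Dict[str, Optional[str]]
    # An address that should always answer, e.g. the local gateway (assumption).
    control: str
    # Consecutive failures needed before a host counts as down (assumption).
    threshold: int = 3
    failures: Dict[str, int] = field(default_factory=dict)
    alerted: Set[str] = field(default_factory=set)

    def is_down(self, host: str) -> bool:
        return self.failures.get(host, 0) >= self.threshold

    def _once(self, host: str, message: str) -> List[str]:
        # Report each outage a single time instead of on every poll.
        if host in self.alerted:
            return []
        self.alerted.add(host)
        return [message]

    def poll(self) -> List[str]:
        # Update the failure counters for the control address and every host.
        for host in [self.control] + list(self.parents):
            if self.probe(host):
                self.failures[host] = 0
                self.alerted.discard(host)
            else:
                self.failures[host] = self.failures.get(host, 0) + 1

        # If the control address is gone, the monitor's own network is the
        # likely culprit, so report that instead of every host.
        if self.is_down(self.control):
            return self._once(
                self.control,
                f"monitor cannot reach {self.control}; "
                "suspect the monitor's own network",
            )

        alerts: List[str] = []
        for host, parent in self.parents.items():
            if not self.is_down(host):
                continue
            # Skip hosts behind a device that is already known to be down.
            if parent is not None and self.is_down(parent):
                continue
            alerts += self._once(host, f"{host} is down")
        return alerts
```

You would call poll() once per interval and hand whatever it returns to your alarm code. With a router and two hosts behind it all unreachable, the first two polls return nothing, the third reports only the router, and later polls stay silent until something changes.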