Hey, talexb, you are prescient. The administrators of our network do run Nagios, but they don't expose much of the interface to users, and I have no admin privileges. My colleagues said that a program to help them pick which of our common-use machines to use at any given time would be very helpful. I had no idea whether the admin people had anything similar, so I built a system. At first, it was a shell command that automatically logged the user on to the machine with the lowest load and opened an instance of emacs. Since the information that program used was stored in a file (collected every five minutes by polling the machines in question), I created a web page that displayed the stats.
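For anyone curious, the "pick the machine with the lowest load" step boils down to something like the sketch below. It is in Python purely for illustration, and the file format (one `hostname load` pair per line) and the function names are made up here, not what my script actually uses.

```python
def parse_loads(text):
    """Parse lines of the (hypothetical) form 'hostname load' into a dict."""
    loads = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        host, value = parts
        try:
            loads[host] = float(value)
        except ValueError:
            continue  # skip lines whose load is not a number
    return loads


def least_loaded(loads):
    """Return the host with the lowest load, or None if there is no data."""
    if not loads:
        return None
    return min(loads, key=loads.get)


# Made-up sample data standing in for the stats file
sample = "host1 2.50\nhost2 0.10\nhost3 1.75\n"
print(least_loaded(parse_loads(sample)))
```

The chosen host name would then be handed to ssh; that part is left out because it depends on the local setup.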
Several users got pretty psyched about the web page. I guess they run a lot of heavy duty Monte Carlo simulations.
Since I got positive feedback, I've been working on an upgrade that uses a DB back end and stores per-minute status data for the 18 machines. Eventually I hope to be able to create historical graphs of the data. The admin folks do have something similar, but don't advertise it.
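To give an idea of what the DB side could look like, here is a minimal sketch using SQLite from Python. The table layout, column names, and function names are assumptions for illustration only; the real upgrade may end up using a different database and schema.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) status table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS status (
               host       TEXT    NOT NULL,
               sampled_at INTEGER NOT NULL,  -- Unix timestamp of the sample
               load_avg   REAL,
               PRIMARY KEY (host, sampled_at)
           )"""
    )
    return conn


def record(conn, host, sampled_at, load_avg):
    """Store one sample; a repeated (host, time) pair overwrites the old row."""
    conn.execute(
        "INSERT OR REPLACE INTO status VALUES (?, ?, ?)",
        (host, sampled_at, load_avg),
    )
    conn.commit()


def history(conn, host, since):
    """Return (sampled_at, load_avg) rows for one host, oldest first."""
    cur = conn.execute(
        "SELECT sampled_at, load_avg FROM status "
        "WHERE host = ? AND sampled_at >= ? ORDER BY sampled_at",
        (host, since),
    )
    return cur.fetchall()
```

With one row per host per minute, 18 machines come to roughly 26,000 rows a day, and `history` returns exactly the series a graph for one host would need.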
The reason I say you're prescient is that I was poking around just today and found a shell command someone wrote that basically wgets a Nagios web page displaying a list of hosts that looks something like
host1 is UP
host2 is DOWN
etc.
I have no idea how timely that data is or how it's created.
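If that list turns out to be trustworthy, turning it into something a program can use is straightforward. A rough Python sketch, assuming the lines really are of the form `<host> is UP` or `<host> is DOWN` as in the excerpt above (I have not checked what else the page contains, and the function name is mine):

```python
import re

# Matches lines such as "host1 is UP" or "host2 is DOWN"
STATUS_RE = re.compile(r"^\s*(\S+)\s+is\s+(UP|DOWN)\s*$")


def parse_status(text):
    """Map each host name to True (UP) or False (DOWN); ignore other lines."""
    status = {}
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            status[match.group(1)] = match.group(2) == "UP"
    return status


page = "host1 is UP\nhost2 is DOWN\n"
print(parse_status(page))
```

Anything that doesn't match the pattern is skipped, so stray text on the page does no harm.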
I don't generally track how quickly the hosts respond to the pings; I just needed to make sure they were up before I tried to ssh to them and execute the script that collected and reported the statistics. I was going to say I don't care, but now that you mention it, it does seem (and this is pure speculation based on anecdotal evidence) that the machines with a higher load tend to take longer to reply and to return the stats. That info might be worth recording. Sort of like finding the best Counter-Strike server!
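Recording the response time would be cheap to add: wrap whatever call collects the stats in a timer and store the elapsed time alongside the load. A small Python sketch of the idea follows; `fake_collect` is a hypothetical stand-in for the real ssh call, which I've left out.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_collect(host):
    """Hypothetical stand-in for the ssh call that runs the stats script."""
    time.sleep(0.01)  # pretend the remote machine took a moment to answer
    return {"host": host, "load": 0.42}


stats, seconds = timed(fake_collect, "host1")
print(f"{stats['host']} answered in {seconds:.3f}s")
```

Whether the timing really tracks load is exactly the speculation above; logging both for a while would be one way to find out.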
I like computer programming because it's like Legos for the mind.