in reply to Perl for monitoring windows servers

I recommend that you write the basic skeleton of the monitoring application yourself, then look on the Internet for existing code that implements the individual subroutines you need, such as checking disk space.

First, a little background. You must decide which method of monitoring you want to implement. There are three basic types of monitoring: Remote, By Agent, and Invasive. Remote is the easiest: you periodically ping the servers to see if they are alive and probe them to see whether other services are also running. The downside of the Remote method is that it gives you little detail about things such as disk space or critical services that are not visible to the outside world.

The second method is By Agent. An Agent is a small program with one or more subroutines that check vital services and functions, collect that information, and then send the data back to the Controlling Application. The Controlling Application is the program that collects all the incoming Agent data, stores it, sends control commands to the Agent software, and provides reporting on the data received.

The last method, Invasive, is the most problematic. To implement it, you must insert Agent-like subroutines into every application running on each server and have that data sent back to the Controlling Application.
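A Remote check can be sketched in a few lines. The example is written in Python purely for illustration (the same approach works with Perl's IO::Socket::INET). It substitutes a TCP connect for an ICMP ping, because sending ICMP normally requires raw-socket privileges, and a successful TCP connect also shows that a specific service is listening. The function names, the timeout, and the choice of ports are assumptions of this sketch.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        # create_connection resolves the name and connects; closing it right away
        # is enough to prove that something is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Name lookup failure, connection refused, unreachable, or timed out.
        return False


def remote_check(host: str, ports: list[int], timeout: float = 2.0) -> dict[int, bool]:
    """Probe each port on one server and map port -> reachable."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, `remote_check("fileserver01", [445, 3389])` (a hypothetical host name) would report whether SMB and Remote Desktop answer. As the paragraph above notes, this still tells you nothing about disk space or internal services.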

I recommend the By Agent method, but because of your short timetable, keep the Agent's subroutines short and sweet.

Here is the pseudo code for the Controlling Application

Here is the pseudo code for the Agent

Here are some notes about the Agent program. One subroutine I would have the Agent perform is a low-tech test of whether the server is alive. Each time the Agent performs any other monitoring function, I would have it write the current date and time to a text file called pulse.txt. Reading pulse.txt and checking whether the last timestamp is close to the current date and time is a simple way to tell whether the Agent has frozen or the operating system is refusing writes to the file. When a server is rebooted, the Agent will detect that pulse.txt is not current and will attempt to send a message to the Controller reporting the problem. In the worst case, the Controller can also read pulse.txt remotely to see whether there is a problem.

Some of the basic Agent subroutines you will create are disk space, running services, and CPU load. I would also include event log monitoring; Perl libraries already exist that you can use to watch your event logs for specific events. You can also create subroutines that monitor Web servers, application servers, and network servers. The types of monitoring are endless, so be sure you have a good business case to justify each one. At first I would stick to the following subroutines: Pulse, Disk space, CPU load, critical Event Log events, Memory usage, and critical Services running. Later I would add Performance Monitor (PERFMON) log events and checks for runaway or hung processes.
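The pulse.txt heartbeat described above can be sketched like this (again in Python for illustration). The timestamp format and the staleness threshold are assumptions; pick a maximum age somewhat longer than the interval between the Agent's checks.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

# Fixed, human-readable timestamp format for pulse.txt (an assumption of this sketch).
PULSE_FORMAT = "%Y-%m-%d %H:%M:%S"


def write_pulse(path: Path, now: Optional[datetime] = None) -> None:
    """Overwrite pulse.txt with the current local date and time."""
    now = now or datetime.now()
    path.write_text(now.strftime(PULSE_FORMAT) + "\n", encoding="ascii")


def read_pulse(path: Path) -> Optional[datetime]:
    """Return the timestamp stored in pulse.txt, or None if missing or unreadable."""
    try:
        return datetime.strptime(path.read_text(encoding="ascii").strip(), PULSE_FORMAT)
    except (OSError, ValueError):
        return None


def pulse_is_stale(path: Path, max_age: timedelta, now: Optional[datetime] = None) -> bool:
    """True if pulse.txt is missing, unreadable, or older than max_age."""
    last = read_pulse(path)
    if last is None:
        return True
    now = now or datetime.now()
    return now - last > max_age
```

The Agent would call `write_pulse` after every check and call `pulse_is_stale` once at startup; a True result after a reboot is what it reports to the Controller. The Controller can run the same staleness test against the file over a network share.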

The biggest technical decision is how the data packets will be sent from the Agent to the Controller. Some applications use SMTP email, SNMP, or COM/DCOM messages, and others create WAP pages. Your solution will depend on your network environment, its stability, how much control you have over it, and which communication methods are available to you. This issue will make or break the project, so research it well and come up with a solution you can live with. In your decision, include how you will detect missing data packets, packets that were never sent, and other worst-case scenarios.
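One transport-independent way to detect missing packets is to give every packet a per-Agent sequence number and have the Controller look for gaps. The sketch assumes JSON-encoded packets with hypothetical field names (`agent`, `seq`, `time`, `check`, `value`); the resulting string can travel over whichever transport you choose. Gaps only reveal packets lost in transit; an Agent that stops sending entirely has to be caught separately, for example by a timeout on the last packet received or by the pulse check described above.

```python
import json
import time
from dataclasses import dataclass, field


def make_packet(agent_id: str, seq: int, check: str, value, ts=None) -> str:
    """Serialize one Agent reading as a JSON data packet."""
    return json.dumps({
        "agent": agent_id,
        "seq": seq,  # increases by 1 for every packet this Agent sends
        "time": ts if ts is not None else time.time(),
        "check": check,
        "value": value,
    })


@dataclass
class SequenceTracker:
    """Controller-side record of the highest sequence number seen per Agent."""
    last_seen: dict[str, int] = field(default_factory=dict)

    def receive(self, raw: str) -> list[int]:
        """Record a packet and return any sequence numbers that were skipped."""
        packet = json.loads(raw)
        agent, seq = packet["agent"], packet["seq"]
        prev = self.last_seen.get(agent)
        missing = []
        if prev is not None and seq > prev + 1:
            missing = list(range(prev + 1, seq))
        # Duplicates and late arrivals (seq <= prev) do not move the high-water mark.
        if prev is None or seq > prev:
            self.last_seen[agent] = seq
        return missing
```

The returned list of missing sequence numbers is what the Controller would log or use to ask the Agent for a resend.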

I would create a rules-based alert system whose rules you can easily transmit from the Controller to your Agents. For example, a disk space rule might send an alert if free space on any drive drops below 5%. I would try to keep the rules the same for all servers; otherwise you will be maintaining a different rule set for each server. Come to think of it, you could instead base the rule sets on the functions a server performs, so you would have one rule set for database servers and a different one for email servers.
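A minimal sketch of role-based rule sets applied to the disk space check. The 5% default comes from the example above; the role names and the database and email thresholds are illustrative assumptions. `shutil.disk_usage` reports total, used, and free bytes for the volume containing a path.

```python
import shutil

# Hypothetical rule sets keyed by server role; values are minimum percent free space.
RULE_SETS = {
    "default": {"disk_free_pct_min": 5.0},
    "database": {"disk_free_pct_min": 10.0},  # illustrative threshold
    "email": {"disk_free_pct_min": 8.0},      # illustrative threshold
}


def disk_free_pct(path: str) -> float:
    """Percentage of free space on the volume that contains path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def disk_alert(free_pct: float, role: str = "default") -> bool:
    """Apply the role's rule set, falling back to the default rules."""
    rules = RULE_SETS.get(role, RULE_SETS["default"])
    return free_pct < rules["disk_free_pct_min"]


def check_drives(drives: list[str], role: str = "default") -> dict[str, float]:
    """Return {drive: percent free} for every drive that breaks the rule."""
    alerts = {}
    for drive in drives:
        pct = disk_free_pct(drive)
        if disk_alert(pct, role):
            alerts[drive] = round(pct, 1)
    return alerts
```

On a database server the Agent might call `check_drives(["C:\\", "D:\\"], role="database")`. Because the rules are plain data, the Controller can send a new `RULE_SETS` table without changing the Agent's code.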

I would also keep a version number in a file, such as agentver.txt, that each Agent uses. The Agent can then report back to the Controller which version of the Agent it is running and which version of the rule set it is using.
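The layout of agentver.txt is not specified above; this sketch assumes a hypothetical format of `key=value` lines such as `agent=1.3` and `rules=7`, and shows how the Agent could turn it into the version part of its report.

```python
from pathlib import Path


def read_versions(path: Path) -> dict[str, str]:
    """Parse a hypothetical agentver.txt made of key=value lines."""
    versions: dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            versions[key.strip()] = value.strip()
    return versions


def version_report(agent_id: str, versions: dict[str, str]) -> dict[str, str]:
    """Build the version fields the Agent sends to the Controller."""
    return {
        "agent": agent_id,
        "agent_version": versions.get("agent", "unknown"),
        "rules_version": versions.get("rules", "unknown"),
    }
```

Reporting "unknown" rather than failing lets the Controller spot Agents with a missing or damaged version file.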

Here are some notes about the Controller program. The Controller will have two types of rule sets: Action rules and Ping rules. The Ping rules determine how often the servers are pinged, how long to wait before retrying after one failed ping, what to do after multiple failed pings, and so on. The Action rules spell out what to do for each event an Agent reports. So if an Agent reports that free disk space has dropped to 5%, the rule determines what happens next, such as writing to an alert log, sending email, text paging, and other actions. The Action rule for free disk space below 2% can differ from the one for 5%.
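Both kinds of Controller rules can be expressed as data plus a couple of small functions. In this sketch the 2% and 5% tiers follow the example above, while the action names, the ping timings, and the three-failure limit are assumptions.

```python
from dataclasses import dataclass

# Hypothetical Action rules for disk space, ordered from most to least severe:
# (free-space threshold in percent, actions to take when free space is below it).
DISK_ACTION_RULES = [
    (2.0, ["log", "email", "page"]),
    (5.0, ["log", "email"]),
]


def disk_actions(free_pct: float) -> list[str]:
    """Return the actions for the most severe threshold that free_pct falls below."""
    for threshold, actions in DISK_ACTION_RULES:
        if free_pct < threshold:
            return list(actions)  # copy so callers cannot alter the rule table
    return []


@dataclass
class PingRule:
    """Hypothetical Ping rule; timings are in seconds."""
    interval_s: int = 300           # normal time between pings
    retry_s: int = 30               # shorter wait after a failed ping
    failures_before_alert: int = 3  # consecutive failures that count as "down"


def next_ping_delay(rule: PingRule, consecutive_failures: int) -> int:
    """Retry sooner after a failure, otherwise wait the normal interval."""
    return rule.retry_s if consecutive_failures > 0 else rule.interval_s


def should_alert(rule: PingRule, consecutive_failures: int) -> bool:
    """Escalate only after several pings in a row have failed."""
    return consecutive_failures >= rule.failures_before_alert
```

Checking the tiers from most to least severe means a server at 1.5% free triggers paging, while one at 4% only gets logged and emailed.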

Reporting directly from the Controller can slow its performance to unacceptable levels. I recommend replicating the master data packet log from the Controller to a Reporting server, which can then run ad hoc analyses, reports, and graphs on the data packets. You will have to work out log rotation, retention, and data storage.
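One way to make the master log easy to rotate and replicate is to write one JSON packet per line through Python's `logging.handlers.RotatingFileHandler`, so that completed (rotated) files can be copied to the Reporting server while the live file keeps growing. The size limit, backup count, and logger naming are assumptions, and the copy step itself is left to whatever replication method you choose.

```python
import json
import logging
from logging.handlers import RotatingFileHandler


def make_packet_log(path: str, max_bytes: int = 10_000_000, backups: int = 5) -> logging.Logger:
    """Return a logger that appends one JSON packet per line to a rotating file."""
    logger = logging.getLogger(f"packet_log.{path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep packet records out of the root logger
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = RotatingFileHandler(path, maxBytes=max_bytes,
                                      backupCount=backups, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


def log_packet(logger: logging.Logger, packet: dict) -> None:
    """Write one data packet as a single JSON line."""
    logger.info(json.dumps(packet, sort_keys=True))
```

When the live file would exceed `max_bytes`, it is renamed to `packets.log.1` (older files shift to `.2`, and so on, up to `backups`), which gives you a natural unit for both replication and retention.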

Richard

There are three types of people in this world, those that can count and those that cannot. Anon

Re: Re: Perl for monitoring windows servers
by kpm (Novice) on May 29, 2003 at 20:30 UTC
    Hello sir, I am very much impressed and moved by your sincerity in writing this reply. Thank you very much for your ideas. Kindly excuse me for taking time to reply to your help. Thank you, Karthik Param kpm@roberts-companies.com