Not discouraging me at all. I have never really scripted or coded in my life, and my boss is into scripting. Therefore he assigned this to me to get me more comfortable with it. I appreciate any questions that will let me help you guys help me.
Anyway, here is a broader picture of what is happening. For the last few years the Unix systems have had a shell script that parses the Unix syslog files for specific errors and events. These are then sent to a Unix server via email, and that server generates reports and emails based on the data. I have been tasked with integrating Windows machines into this solution (yes, I am the NT admin, let me apologize in advance). So I have to generate and email data to that Unix machine in a form it can read (which is specific to the formats I mentioned earlier). Once the logs are generated, formatted and sent to that Unix server, my hands are washed clean of the process, and I have no reason to maintain a DB of these results. It is all being stored and maintained on the Unix server.
What you rephrased is accurate; however, I have not yet mapped downtime events to uptime events. That's purely logical at this point.
The SQL error was just an example; in reality it will be much more detailed and will include the name of the instance. If uptime and downtime messages get crossed, that is something I can take care of through software (I'm using Servers Alive, Kiwi Syslog and an event-to-syslog service).
Basically, every 15 minutes it will scan a syslog with roughly 20 lines of events. Every downtime needs to be matched with an uptime, and if there is no uptime yet, that downtime event needs to be held back from being mailed until an uptime event is logged.
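To make the matching rule concrete, here is a minimal Python sketch of one way the pairing could work. Everything in it is an assumption for illustration: the `Event` fields, the idea of keying on (host, service), and the `pending` dictionary that carries unmatched downtimes from one 15-minute scan to the next. None of it comes from the actual syslog setup.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Event:
    # Hypothetical event shape; the real syslog lines may carry other fields.
    host: str
    service: str
    state: str          # "down" or "up"
    when: datetime
    error: str = ""


Key = Tuple[str, str]


def pair_events(
    events: List[Event], pending: Dict[Key, Event]
) -> Tuple[List[Tuple[Event, Event]], Dict[Key, Event]]:
    """Match each downtime with the next uptime for the same host/service.

    Returns (completed pairs that are ready to mail, downtimes still waiting).
    """
    waiting = dict(pending)  # copy so the caller's state is not modified
    completed = []
    for ev in sorted(events, key=lambda e: e.when):
        key = (ev.host, ev.service)
        if ev.state == "down":
            # Keep the earliest unmatched downtime for this host/service.
            waiting.setdefault(key, ev)
        elif ev.state == "up" and key in waiting:
            completed.append((waiting.pop(key), ev))
        # An "up" with no waiting "down" is ignored in this sketch.
    return completed, waiting
```

The second return value would have to be saved somewhere between runs (a small file would do), so that a downtime seen in one scan can be matched with an uptime that only shows up in a later one.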
Once the logs are generated, formatted and sent to that Unix server, my hands are washed clean of the process, and I have no reason to maintain a DB of these results. It is all being stored and maintained on the Unix server.
Ok - so, let's say something goes down at 8am and comes back up at 11am the same day. What is the process by which you update the relevant record on the Unix box? What is the protocol? How do you tell it "Update THIS event with THIS information."? Once you have that answer, you can answer your question.
I've got a feeling that it's going to (eventually) be something along these lines - you have an event with a given entity. You report to the Unix server "Entity ABCD had an event EFGH at such-and-such a time". It is up to the Unix server (who is the one with all the information) to correlate the various events for the entity ABCD. You should just be reporting "This entity, this event, this timestamp".
------ We are the carpenters and bricklayers of the Information Age. The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6 Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
Basically it's like I said. In the example you mentioned, you would send something along the following lines:
hostname, Oct 13 2003, 8:00:00, service is down, Error, Oct 13 2003, 12:00:00
The first date and time being the downtime, the second being the uptime.
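As a rough illustration, a line in that layout could be built like this in Python. The function and parameter names are made up for the sketch, and it assumes the hour and day are written without zero padding, which is only a guess from the single sample line above.

```python
from datetime import datetime

# English month abbreviations, spelled out to avoid depending on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_stamp(when: datetime) -> str:
    # Produces e.g. "Oct 13 2003, 8:00:00" (no zero padding on day or hour,
    # which is an assumption based on the one sample line).
    return (f"{MONTHS[when.month - 1]} {when.day} {when.year}, "
            f"{when.hour}:{when.minute:02d}:{when.second:02d}")


def format_line(host: str, down: datetime, message: str,
                severity: str, up: datetime) -> str:
    # Field order follows the sample: host, downtime, message, severity, uptime.
    return ", ".join([host, format_stamp(down), message,
                      severity, format_stamp(up)])
```

Called with the values from the sample (`"hostname"`, 8:00:00 and 12:00:00 on Oct 13 2003, `"service is down"`, `"Error"`), this reproduces the line shown above.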
It's all really static; I wish the setup on the Unix server was more capable. But you email that line (it can take multiple lines now, after I whined to the Unix guy) to the server. Once it is received, it is saved to a file specific to that hostname, in identical format. Periodically, on the Unix server, scripts are run to create reports on the downtime of each server and the reason it was down (just the error).
The Unix system does nothing beyond that; it does not correlate downtimes and uptimes beyond what I send it. I've fought to get this moved to the server, to no avail.
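For what it's worth, the kind of per-line downtime calculation such a report script might do can be sketched in Python as below. This is not the actual Unix-side script, which I have not seen in code form here; the field names are guesses from the sample line, and it assumes the message itself never contains a comma.

```python
from datetime import datetime, timedelta
from typing import Tuple

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def parse_stamp(date_part: str, time_part: str) -> datetime:
    # date_part looks like "Oct 13 2003", time_part like "8:00:00".
    month_name, day, year = date_part.split()
    hour, minute, second = (int(p) for p in time_part.split(":"))
    return datetime(int(year), MONTHS.index(month_name) + 1, int(day),
                    hour, minute, second)


def downtime(line: str) -> Tuple[str, str, timedelta]:
    """Return (host, message, how long it was down) for one stored line."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 7:
        # Either the layout differs or the message contained a comma.
        raise ValueError(f"expected 7 comma-separated fields, got {len(fields)}")
    host, d_date, d_time, message, _severity, u_date, u_time = fields
    return host, message, parse_stamp(u_date, u_time) - parse_stamp(d_date, d_time)
```

Because the server only subtracts the two timestamps it is handed, a line mailed before the uptime is known would have nothing to subtract, which is why the pairing has to happen before the mail goes out.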