Allow me to rephrase:
  1. You have a map between downtime message and uptime message.
  2. You want to parse out the downtime message and put it somewhere.
  3. Later, when the corresponding uptime message arrives, you want to append that to the downtime message (parsed earlier).

Ok. That is a lot clearer, but still not clear enough. Here are a few problems:

  1. Let's say you have two SQL Server instances, and both go down at different times. Both will probably come back up at different times. How are you going to match the uptime message to the right downtime message? (One possible solution is in the next point.)
  2. Your datastore (the place you're storing errors in) is getting more and more complex. But all you really need is:
    (Error type) - (Who had the error) - (When did it happen) - (When was it resolved)

    Many of my esteemed brothers will take me to task for this, but it sounds like you really want a database for this. Now, I'm not suggesting you use Oracle or whatever, but even something like DBD::SQLite would be very handy. This way, you can easily find where in your datastore your corresponding downtime message is. Also, it will help you organize what you're actually keeping in your datastore.

  3. Of course, figuring out how to match the two SQL Server downtime messages may be easy. Figuring out how to do that for Net Gateway messages may also be easy. Figuring out a way of doing it for both at the same time may not be so easy.
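
A minimal sketch of the four-field datastore from point 2, assuming SQLite through Python's standard sqlite3 module rather than DBD::SQLite (the idea is the same either way). The table name `outage`, the column names, and the helper functions are all hypothetical, and timestamps are assumed to be stored as ISO 8601 text so that they sort correctly as strings.

```python
import sqlite3

def open_store(path=":memory:"):
    """Create the (hypothetical) outage table if needed and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS outage (
               id         INTEGER PRIMARY KEY,
               error_type TEXT NOT NULL,  -- error type
               entity     TEXT NOT NULL,  -- who had the error
               down_at    TEXT NOT NULL,  -- when did it happen
               up_at      TEXT            -- when was it resolved (NULL = still down)
           )"""
    )
    return conn

def record_down(conn, error_type, entity, down_at):
    """Store a downtime message as a new, still-open outage."""
    conn.execute(
        "INSERT INTO outage (error_type, entity, down_at) VALUES (?, ?, ?)",
        (error_type, entity, down_at),
    )

def record_up(conn, error_type, entity, up_at):
    """Close the oldest open outage for this entity and error type.

    Returns the number of rows updated; 0 means no matching downtime exists.
    """
    cur = conn.execute(
        """UPDATE outage SET up_at = ?
           WHERE id = (SELECT id FROM outage
                       WHERE error_type = ? AND entity = ? AND up_at IS NULL
                       ORDER BY down_at LIMIT 1)""",
        (up_at, error_type, entity),
    )
    return cur.rowcount

def open_outages(conn):
    """List outages that have no uptime recorded yet, oldest first."""
    return conn.execute(
        "SELECT entity, error_type, down_at FROM outage "
        "WHERE up_at IS NULL ORDER BY down_at"
    ).fetchall()
```

Keying the lookup on both the error type and the entity is what lets two SQL Server instances go down and come back independently without their messages getting crossed.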

I'm not trying to discourage you. These are questions that (imho) need to be answered before you can really get something usable up and running, especially if, as I assume, you will want (or be asked for) reporting on these downtimes. Any non-trivial reporting usually benefits from the use of a database.

------
We are the carpenters and bricklayers of the Information Age.

The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Re: Re: comma delimited, syslog parsing
by jeff061 (Initiate) on Oct 13, 2003 at 16:05 UTC
    Not discouraging me at all. I have never really scripted or coded in my life, but my boss is into scripting. Therefore he assigned this to me to try and get me more comfortable with scripting. I appreciate any questions asked that will allow me to help you guys help me.

    Anyway, here is a broader picture of what is happening. For the last few years the Unix systems have had a shell script that parses the Unix syslog files for specific errors and events. These then get sent to a Unix server via email, and that server generates reports and emails based on the data. I have been tasked to integrate Windows machines into this solution (yes, I am the NT admin, let me apologize in advance). So I have to generate and email data to that Unix machine in a way that it can read (which is specific to the formats I mentioned earlier). Once the logs are generated, formatted and sent to that Unix server my hands are washed clean of the process, and I have no reason to maintain a DB of these results. It is all being stored and maintained on the Unix server.

    What you rephrased is accurate; however, I have not yet mapped downtime events to uptime events. That's purely logical at this point.

    The SQL error was just an example; in reality it will be much more detailed and will include the name of the instance. If uptime and downtime messages get crossed, that is something I can take care of through software (I'm using Servers Alive, Kiwi Syslog and an event-to-syslog service).

    Basically, every 15 minutes it will be scanning a syslog with roughly 20 lines of events. From this, every downtime needs to be matched with an uptime, and if there is no uptime yet, that downtime event needs to be held back from being mailed until an uptime event is logged.
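
The matching step described above might look something like the following sketch. It assumes each syslog line has already been parsed into a hypothetical `(entity, kind, timestamp)` tuple, where `kind` is `"down"` or `"up"`, and that the dictionary of unmatched downtimes is saved somewhere (a small file, say) between the 15-minute runs. The function name `pair_events` is made up for illustration.

```python
def pair_events(events, pending=None):
    """Match downtime events with later uptime events for the same entity.

    events:  iterable of (entity, kind, timestamp), in log order.
    pending: dict of entity -> downtime timestamp left over from earlier scans.
    Returns (completed, pending): completed is a list of
    (entity, down_timestamp, up_timestamp) ready to be mailed; pending holds
    the downtimes that must wait for a later scan.
    """
    pending = dict(pending or {})
    completed = []
    for entity, kind, timestamp in events:
        if kind == "down":
            # If the entity reports down twice, keep the earliest downtime.
            pending.setdefault(entity, timestamp)
        elif kind == "up" and entity in pending:
            completed.append((entity, pending.pop(entity), timestamp))
        # An "up" with no recorded downtime is ignored in this sketch.
    return completed, pending


# First scan: GW1 recovers within the scan, SQL1 is still down.
first_scan = [
    ("SQL1", "down", "8:00:00"),
    ("GW1", "down", "8:05:00"),
    ("GW1", "up", "8:10:00"),
]
completed, pending = pair_events(first_scan)
# completed == [("GW1", "8:05:00", "8:10:00")]; pending == {"SQL1": "8:00:00"}

# A later scan sees SQL1 come back, so its record can now be mailed.
completed, pending = pair_events([("SQL1", "up", "11:00:00")], pending)
# completed == [("SQL1", "8:00:00", "11:00:00")]; pending == {}
```

Only the `completed` records get mailed; whatever is left in `pending` is carried into the next run.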

      Once the logs are generated, formatted and sent to that Unix server my hands are washed clean of the process, and I have no reason to maintain a DB of these results. It is all being stored and maintained on the Unix server.

      Ok - so, let's say something goes down at 8am and comes back up at 11am the same day. What is the process by which you update the relevant record on the Unix box? What is the protocol? How do you tell it "Update THIS event with THIS information."? Once you have that answer, you can answer your question.

      I've got a feeling that it's going to (eventually) be something along these lines: you have an event with a given entity. You report to the Unix server "Entity ABCD had an event EFGH at such-and-such a time". It is up to the Unix server (which is the one with all the information) to correlate the various events for the entity ABCD. You should just be reporting "This entity, this event, this timestamp".
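
As a rough sketch of that division of labour: the sender emits one stateless line per event, and the receiver groups the lines by entity. The `Event` type, the comma-separated line layout, and the function names are assumptions made for illustration, not anything the Unix server is known to accept. Timestamps are assumed to be ISO 8601 text so they sort as strings, and the event text is assumed to contain no commas.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    entity: str     # e.g. "ABCD"
    event: str      # e.g. "EFGH"
    timestamp: str  # ISO 8601 text, so string order matches time order


def report_line(ev):
    """Sender side: one line per event, no state kept between reports."""
    return f"{ev.entity}, {ev.event}, {ev.timestamp}"


def parse_line(line):
    """Receiver side: turn a reported line back into an Event."""
    entity, event, timestamp = (part.strip() for part in line.split(","))
    return Event(entity, event, timestamp)


def history_by_entity(lines):
    """Receiver side: group reported events per entity, oldest first."""
    history = defaultdict(list)
    for line in lines:
        ev = parse_line(line)
        history[ev.entity].append(ev)
    for events in history.values():
        events.sort(key=lambda ev: ev.timestamp)
    return dict(history)
```

With this split the sender never needs to remember an earlier downtime; all correlation happens where the full history lives.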

        Basically it's like I said. In the example that you mentioned, you would send something along the following lines:

        hostname, Oct 13 2003, 8:00:00, service is down, Error, Oct 13 2003, 11:00:00
        The first date and time is the downtime; the second is the uptime.
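
Building a line in that layout could be sketched as follows. The field names `message` and `level` are guesses at what "service is down" and "Error" represent, and the lack of zero padding on the day and hour is inferred only from the single sample line, so check both against what the Unix server actually expects. The values use the 8am-to-11am outage from the earlier example.

```python
from datetime import datetime

# Spelled out here so the output does not depend on the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def stamp(dt):
    """Format a datetime like the sample: 'Oct 13 2003, 8:00:00'."""
    return (f"{MONTHS[dt.month - 1]} {dt.day} {dt.year}, "
            f"{dt.hour}:{dt.minute:02d}:{dt.second:02d}")


def format_record(host, down_at, message, level, up_at):
    """Join the fields into one comma-separated line, downtime first."""
    return f"{host}, {stamp(down_at)}, {message}, {level}, {stamp(up_at)}"


line = format_record(
    "hostname",
    datetime(2003, 10, 13, 8, 0, 0),
    "service is down",
    "Error",
    datetime(2003, 10, 13, 11, 0, 0),
)
print(line)
# hostname, Oct 13 2003, 8:00:00, service is down, Error, Oct 13 2003, 11:00:00
```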

        It's all really static; I wish the setup on the Unix server was more capable. But you email that line (it can take multiple lines now, after I whined to the Unix guy) to the server. Once it is received, it is saved to a file specific to that host name, in identical format. Periodically, scripts are run on the Unix server to create reports on the downtime of each server and the reason it was down (just the error).

        The Unix system does nothing beyond that; it does not correlate downtimes and uptimes beyond what I send it. I've fought to get this moved to the server, to no avail.