phoneguy has asked for the wisdom of the Perl Monks concerning the following question:

I have a logfile that is somewhat unique in the fact that, well, it's a jumbled up mess. For example:
    2006-10-25 18:20:20,751 DEBUG [stuff][][] End Query: [Id: 1] [duration: 96]
    2006-10-25 18:20:20,762 DEBUG [stuff][][] Start Query: [Id: 2] [Description: ] [debug sql]
    <SQL Here>
    <more SQL>
    <more SQL>
    2006-10-25 18:20:20,763 DEBUG [stuff][][] End Query: [Id: 2] [duration: 1]
    2006-10-25 18:20:20,764 DEBUG [stuff][][] Checking status of UserService
    2006-10-25 18:20:20,764 DEBUG [stuff][][] Starting IntegrationService
    2006-10-25 18:20:20,764 DEBUG [stuff][][] Service Started: IntegrationService
    2006-10-25 18:20:20,776 DEBUG [stuff][][] Start Query: [Id: 3] [Description: ] [debug sql]
    SELECT
I'd like to be able to write a logfile parser that will tell me which IDs haven't finished in a log file (when the server crashes), and which SQL statements are taking the longest time. What's the easiest way to do this?

Replies are listed 'Best First'.
Re: logfile parsing
by Fletch (Bishop) on Oct 26, 2006 at 15:17 UTC

    Write a parser.

    • Look for "Start Query", remember the id as an open query, and save the start time
    • Read the SQL and save it associated with the id in a hash
    • When you reach an "End Query", compute the time it took and take the id out of your list of open queries
    • Lather, rinse, repeat for all lines of the logfile

    When you reach the end of the file, you should have a list of what finished and how long it took and what was still pending that had no matching "End Query" lines.
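    The steps above can be sketched roughly like this (untested against a real log; the regexes and the `scan_log` name are my own invention, assuming the `[Id: N]` and `[duration: N]` fields shown in the sample excerpt):

    ```perl
    use strict;
    use warnings;

    # Sketch of the steps above: track open queries in a hash keyed by id.
    sub scan_log {
        my @lines = @_;
        my (%open, %duration, %sql);
        my $current_id;                      # id whose SQL we are collecting
        for my $line (@lines) {
            if ($line =~ /^(\S+ \S+) DEBUG .*Start Query: \[Id:\s*(\d+)\]/) {
                $open{$2}   = $1;            # remember start time; query is open
                $current_id = $2;
            }
            elsif ($line =~ /End Query: \[Id:\s*(\d+)\]\s*\[duration:\s*(\d+)\]/) {
                $duration{$1} = $2;          # finished; record how long it took
                delete $open{$1};            # no longer open
                undef $current_id;
            }
            elsif (defined $current_id) {
                $sql{$current_id} .= $line;  # SQL lines between Start and End
            }
        }
        return (\%open, \%duration, \%sql);
    }

    # Whatever is left in %open never saw a matching "End Query" line.
    my ($open, $duration, $sql) = scan_log(
        "2006-10-25 18:20:20,762 DEBUG [stuff][][] Start Query: [Id: 2] [Description: ] [debug sql]\n",
        "SELECT something\n",
        "2006-10-25 18:20:20,763 DEBUG [stuff][][] End Query: [Id: 2] [duration: 1]\n",
        "2006-10-25 18:20:20,776 DEBUG [stuff][][] Start Query: [Id: 3] [Description: ] [debug sql]\n",
    );
    print "still open: [Id: $_]\n" for sort keys %$open;
    ```

    For a real run you'd pass the file's lines in (e.g. `scan_log(<$fh>)`) instead of the inlined sample.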

    If you want more specific help than that, show your effort and what code you've tried so far.

      This is what I have so far. I still need to figure out how I'm going to go through the $ismatched portion of my code...let alone if I'm doing this right. I'll be _really_ honest when I say that I don't have a clue as to what I'm doing. Programming isn't my first career choice...but, telephones are :)
      #!/usr/bin/perl -w
      use strict;

      my @ismatched;   # indexed by process id: 0 = started, 1 = ended
      my @duration;
      my $count = 0;
      my $filename = shift @ARGV;

      open(my $fo, '<', $filename) or die "can't open $filename: $!";
      while (defined(my $line = <$fo>)) {
          # Start
          if ($line =~ /Start Query: \[Id:\s*(\d+)\]/) {
              my $processid = $1;
              $count++;
              $ismatched[$processid] = 0;
          }
          # End
          elsif ($line =~ /End Query: \[Id:\s*(\d+)\]\s*\[duration:\s*(\d+)\]/) {
              my $processid = $1;
              $count--;
              $ismatched[$processid] = 1;
              $duration[$processid]  = $2;
              if ($duration[$processid] > 300) {
                  print "Look at Id: $processid -- took longer than it should.\n";
              }
          }
      }
      # finding unfinished processes needs to go here.

        That seems to be a good start, although I would probably use a hash (associative array) rather than a numerically-indexed array. Something like this (untested):

        use strict;
        my %ismatched = ();
        while (<FO>) {
            if (/Start Query: \[Id:\s*(\d+)\s*\]/) {
                $ismatched{$1}++;
            }
            elsif (/End Query: \[Id:\s*(\d+)\s*\]\s*\[duration:\s*(\d+)\s*\]/) {
                my $id = $1;
                $ismatched{$id} += 2;
                my $duration = $2;
                if ($duration > 300) {
                    print "Look at Id: $id -- took longer than it should.\n";
                }
            }
        }
        foreach (sort keys %ismatched) {
            if ($ismatched{$_} == 1) {
                print "Process id $_ never ended.\n";
            }
            elsif ($ismatched{$_} == 2) {
                print "No record of process id $_ ever starting.\n";
            }
            else {
                delete $ismatched{$_};
            }
        }
        print "There were a total of ", scalar(keys %ismatched), " unmatched processes.\n";

        As you step through your log file, you capture the part of each line that you care about and keep a record. Basically, you keep an entry in your hash for each process ID you encounter: if it starts, the entry is incremented to 1, and if it ends, it is incremented by 2. So when you are done, all the 1's are processes that started but never finished, all the 2's are processes that ended but never started, and all the 3's are processes that started and ended correctly.

        You could capture the SQL in much the same way, perhaps using another hash to associate it to the process ID, or having your original ismatched hash store duration and SQL information in a subordinate hash.
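        That "subordinate hash" idea might look something like this (a hash-of-hashes sketch; the `%query` name and `started`/`sql`/`duration` keys are made up for illustration):

        ```perl
        use strict;
        use warnings;

        # Hypothetical layout: one hash of hashes keyed by query id, holding
        # everything we learn about each query (duration, collected SQL).
        my %query;
        my $current_id;

        my @log = (
            "2006-10-25 18:20:20,762 DEBUG [stuff][][] Start Query: [Id: 2] [Description: ] [debug sql]\n",
            "SELECT something\n",
            "2006-10-25 18:20:20,763 DEBUG [stuff][][] End Query: [Id: 2] [duration: 1]\n",
            "2006-10-25 18:20:20,776 DEBUG [stuff][][] Start Query: [Id: 3] [Description: ] [debug sql]\n",
        );

        for my $line (@log) {
            if ($line =~ /Start Query: \[Id:\s*(\d+)\]/) {
                $current_id = $1;
                $query{$current_id}{started} = 1;
            }
            elsif ($line =~ /End Query: \[Id:\s*(\d+)\]\s*\[duration:\s*(\d+)\]/) {
                $query{$1}{duration} = $2;
                undef $current_id;
            }
            elsif (defined $current_id) {
                $query{$current_id}{sql} .= $line;   # SQL between Start and End
            }
        }

        for my $id (sort { $a <=> $b } keys %query) {
            printf "Id %s: %s\n", $id,
                defined $query{$id}{duration} ? "duration $query{$id}{duration}"
                                              : "never finished";
        }
        ```

        Everything about one query then lives under a single key, so reporting on unfinished or slow queries is just a walk over `%query`.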

        Ok...I think I've figured out how to get my process ID:
        if($line =~/Id:\s*(\d+)/){ $processid = $1; }
        Does that make sense?
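        For what it's worth, that pattern does capture the id from the sample lines; a quick sanity check (the `$line` value is taken from the log excerpt above):

        ```perl
        use strict;
        use warnings;

        my $line = "2006-10-25 18:20:20,751 DEBUG [stuff][][] End Query: [Id: 1] [duration: 96]";
        my $processid;
        if ($line =~ /Id:\s*(\d+)/) {
            $processid = $1;   # captures the first Id field on the line
        }
        print "$processid\n";  # prints 1
        ```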
Re: logfile parsing
by madbombX (Hermit) on Oct 26, 2006 at 16:05 UTC
    First off, what program is generating this logfile? Someone may have already written a parser for this type of logfile if it isn't a custom rolled application. Try searching CPAN for the module.

    If it is a custom rolled app, then I would suggest writing an OO module that takes each line as a separate object, parses it and dumps it off to a function to handle each class/category of line and go from there. Aggregating the stats can become another part of the module.
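    A bare-bones sketch of that kind of line object (the `LogLine` class, its method names, and the `category` values are all invented here; a real module would live in its own `.pm` file):

    ```perl
    use strict;
    use warnings;

    package LogLine;

    # Minimal line-object sketch: classify a line and expose its fields.
    sub new {
        my ($class, $text) = @_;
        my $self = { text => $text, category => 'other' };
        if ($text =~ /Start Query: \[Id:\s*(\d+)\]/) {
            @$self{qw(category id)} = ('start', $1);
        }
        elsif ($text =~ /End Query: \[Id:\s*(\d+)\]\s*\[duration:\s*(\d+)\]/) {
            @$self{qw(category id duration)} = ('end', $1, $2);
        }
        return bless $self, $class;
    }

    sub category { $_[0]{category} }
    sub id       { $_[0]{id} }
    sub duration { $_[0]{duration} }

    package main;

    my $line = LogLine->new(
        "2006-10-25 18:20:20,763 DEBUG [stuff][][] End Query: [Id: 2] [duration: 1]"
    );
    print $line->category, " ", $line->id, " ", $line->duration, "\n";  # end 2 1
    ```

    A dispatch table mapping each category to a handler sub would then take care of the per-category logic and stats aggregation.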

      The program is a PBX application that was written in-house, but the person who wrote it is no longer around, and I'm not a perl programmer, just someone who wanted to try it out and get my feet wet. Writing an OO module is probably a bit above and beyond the scope of my skillset.
Re: logfile parsing
by wolfger (Deacon) on Oct 27, 2006 at 16:21 UTC
    I have a logfile that is somewhat unique in the fact that, well, it's a jumbled up mess.

    And you think that's unique? Ha! Lucky you. In my experience, that's common.

    I would add something helpful, but I see you've already gotten what you need from another monk...