in reply to logfile parsing

Write a parser.

When you reach the end of the file, you should have a list of what finished and how long it took and what was still pending that had no matching "End Query" lines.

If you want something more specific than that, show your effort and the code you've tried so far.

Re^2: logfile parsing
by phoneguy (Novice) on Oct 26, 2006 at 15:36 UTC
    This is what I have so far. I still need to figure out how I'm going to go through the $ismatched portion of my code...let alone if I'm doing this right. I'll be _really_ honest when I say that I don't have a clue as to what I'm doing. Programming isn't my first career choice...but, telephones are :)
    #!/usr/bin/perl -w
    use strict;

    my $processid;
    my $duration;
    my $ismatched;
    my $count;
    my $line;
    my $fo;
    my $filename;
    my $id;
    my $durid

    open(FO, $filename);

    while (defined ($line = <FO>)) {
        # Start
        if ($line =~ m/Start$/) {
            $count++;
            # Grab text from ID
            $processid =~ m/Id\:/;
            $ismatched[$processid] = 0;
        }
        if ($line =~ m/End$/ {
            $count--;
            $processid =~ m/ID:/;
            $ismatched[$processid] =1;
            $duration[$processid] =~ m/duration:/;
            if($duration > 300) {
                print "Look at Id: $processid -- took longer than it should.\n";
            }
        }
    }
    # finding out processes need to go here.

      That seems to be a good start, although I would probably use a hash (associative array) rather than a numerically-indexed array. Something like this (untested):

      use strict;

      my %ismatched = ();

      while (<FO>) {
          if (/Start Query \[ID:\s*(\d+)\s*\]/) {
              $ismatched{$1}++;
          }
          elsif (/End Query \[ID:\s*(\d+)\s*\]\s*\[duration:\s*(\d+)\s*\]/) {
              my $id = $1;
              $ismatched{$id} += 2;
              my $duration = $2;
              if ($duration > 300) {
                  print "Look at Id: $id -- took longer than it should.\n";
              }
          }
      }

      foreach (sort keys %ismatched) {
          if ($ismatched{$_} == 1) {
              print "Process id $_ never ended.\n";
          }
          elsif ($ismatched{$_} == 2) {
              print "No record of process id $_ ever starting.\n";
          }
          else {
              delete $ismatched{$_};
          }
      }

      print "There were a total of ", scalar(keys %ismatched), " unmatched processes.\n";

      As you step through your log file, you capture the part of each line that you care about, and keep a record. Basically, you keep an entry in your hash for each process ID you encounter. If a process starts, its entry is incremented by 1, and if it ends, it is incremented by 2. So when you are done, all the 1's are processes that started but never finished, all the 2's are processes that ended but never started, and all the 3's are processes that started and ended correctly.

      You could capture the SQL in much the same way, perhaps using another hash to associate it to the process ID, or having your original ismatched hash store duration and SQL information in a subordinate hash.
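Here is a rough, untested-against-your-logs sketch of the subordinate-hash idea. Note the assumptions: the sample lines, the `[sql: ...]` field, and the `%query` and `@report` names are all made up for illustration, since I don't know where the SQL actually sits in your log lines. Adjust the regexps to match the real format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: these sample lines, and the "[sql: ...]" field in particular,
# are invented for illustration. The real log format may differ.
my @lines = (
    'Start Query [ID: 1] [sql: SELECT * FROM calls]',
    'Start Query [ID: 2] [sql: UPDATE trunks SET busy = 1]',
    'End Query [ID: 1] [duration: 420]',
    'End Query [ID: 3] [duration: 12]',
);

# One subordinate hash per process ID, with keys:
# started, ended, sql, duration (each set only when seen).
my %query;

for my $line (@lines) {
    if ($line =~ /Start Query \[ID:\s*(\d+)\s*\]\s*\[sql:\s*(.*?)\s*\]/) {
        $query{$1}{started} = 1;
        $query{$1}{sql}     = $2;
    }
    elsif ($line =~ /End Query \[ID:\s*(\d+)\s*\]\s*\[duration:\s*(\d+)\s*\]/) {
        $query{$1}{ended}    = 1;
        $query{$1}{duration} = $2;
    }
}

# Walk the records in numeric ID order and collect anything worth a look.
my @report;
for my $id (sort { $a <=> $b } keys %query) {
    my $q = $query{$id};
    if ($q->{started} && !$q->{ended}) {
        push @report, "Process id $id never ended: $q->{sql}";
    }
    elsif (!$q->{started}) {
        push @report, "No record of process id $id ever starting.";
    }
    elsif ($q->{duration} > 300) {
        push @report, "Look at Id: $id -- duration $q->{duration}: $q->{sql}";
    }
}

print "$_\n" for @report;
```

Keeping separate started and ended keys instead of adding 1 and 2 means a repeated Start line for the same ID can't be mistaken for an End. The lazy `(.*?)` stops at the first `]`, so SQL that itself contains a `]` would need a different pattern.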

        I had to change the regexps to what I needed, but it works wonderfully! Thank you so much!
      Ok...I think I've figured out how to get my process ID:
      if($line =~/Id:\s*(\d+)/){ $processid = $1; }
      Does that make sense?