Re: logfile parsing

Write a parser.

Look for "Start Query", remember the id as an open query, and save the start time.
Read the SQL and save it, associated with the id, in a hash.
When you reach an "End Query", compute the time it took and take the id out of your list of open queries.
Lather, rinse, repeat for all lines of the logfile.

When you reach the end of the file, you should have a list of what finished and how long each query took, plus whatever is still pending because it had no matching "End Query" line.
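A rough sketch of those steps, just to show the shape. The log format here is made up (an epoch timestamp, then "Start Query <id>: <SQL>" or "End Query <id>"), and parse_log and the hash field names are hypothetical, so the regexes will need changing for a real logfile.

```perl
use strict;
use warnings;

# Hypothetical line formats (not taken from any real log):
#   "<epoch seconds> Start Query <id>: <SQL>"
#   "<epoch seconds> End Query <id>"
sub parse_log {
    my @lines = @_;
    my (%open, %finished);
    for my $line (@lines) {
        if ($line =~ /^(\d+) Start Query (\d+): (.*)$/) {
            # Remember the id as an open query, with its start time and SQL.
            $open{$2} = { start => $1, sql => $3 };
        }
        elsif ($line =~ /^(\d+) End Query (\d+)/) {
            my ($end, $id) = ($1, $2);
            # Take the id out of the open list; skip an End with no Start.
            my $q = delete $open{$id} or next;
            $finished{$id} = { sql => $q->{sql}, elapsed => $end - $q->{start} };
        }
    }
    # Whatever is left in %open never saw a matching "End Query".
    return (\%finished, \%open);
}

my ($finished, $open) = parse_log(
    "100 Start Query 1: SELECT 1",
    "105 Start Query 2: SELECT 2",
    "130 End Query 1",
);
printf "query %s took %d s\n", $_, $finished->{$_}{elapsed} for sort keys %$finished;
print  "query $_ still pending\n" for sort keys %$open;
```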

If you want something more specific than that, show your effort and what code you've tried so far.

Re^2: logfile parsing by phoneguy (Novice) on Oct 26, 2006 at 15:36 UTC
This is what I have so far. I still need to figure out how I'm going to go through the $ismatched portion of my code...let alone if I'm doing this right. I'll be _really_ honest when I say that I don't have a clue as to what I'm doing. Programming isn't my first career choice...but, telephones are :)

    #!/usr/bin/perl -w
    use strict;

    my $processid;
    my $duration;
    my $ismatched;
    my $count;
    my $line;
    my $fo;
    my $filename;
    my $id;
    my $durid

    open(FO, $filename);
    while (defined ($line = <FO>)) {
        # Start
        if ($line =~ m/Start$/) {
            $count++;
            # Grab text from ID
            $processid =~ m/Id\:/;
            $ismatched[$processid] = 0;
        }
        if ($line =~ m/End$/ {
            $count--;
            $processid =~ m/ID:/;
            $ismatched[$processid] =1;
            $duration[$processid] =~ m/duration:/;
            if($duration > 300) {
                print "Look at Id: $processid -- took longer than it should.\n";
            }
        }
    }
    # finding out processes need to go here.
Re^3: logfile parsing by ptum (Priest) on Oct 26, 2006 at 16:51 UTC
That seems to be a good start, although I would probably use a hash (associative array) rather than a numerically-indexed array. Something like this (untested):

    use strict;

    my %ismatched = ();

    while (<FO>) {
        if (/Start Query \[ID:\s(\d+)\s\]/) {
            $ismatched{$1}++;
        }
        elsif (/End Query \[ID:\s(\d+)\s\]\s\[duration:\s(\d+)\s*\]/) {
            my $id = $1;
            $ismatched{$id} += 2;
            my $duration = $2;
            if ($duration > 300) {
                print "Look at Id: $id -- took longer than it should.\n";
            }
        }
    }

    foreach (sort keys %ismatched) {
        if ($ismatched{$_} == 1) {
            print "Process id $_ never ended.\n";
        }
        elsif ($ismatched{$_} == 2) {
            print "No record of process id $_ ever starting.\n";
        }
        else {
            delete $ismatched{$_};
        }
    }

    print "There were a total of ", scalar(keys %ismatched), " unmatched processes.\n";

As you step through your log file, you capture the part of each line that you care about and keep a record: one entry in your hash for each process ID you encounter. When a process starts, its entry is incremented by 1, and when it ends, its entry is incremented by 2. So when you are done, all the 1's are processes that started but never finished, all the 2's are processes that ended with no record of starting, and all the 3's are processes that started and ended correctly. You could capture the SQL in much the same way, perhaps using another hash to associate it with the process ID, or having your original ismatched hash store duration and SQL information in a subordinate hash.
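Here is one way the subordinate-hash idea might look. It reuses the same guessed line formats as the regexes above. It also assumes, purely for illustration, that the SQL text sits on the lines between a query's Start and End lines. The sub name collect_queries and the field names (started, ended, duration, sql) are made up for this sketch.

```perl
use strict;
use warnings;

# Assumed (hypothetical) layout: SQL lines follow their "Start Query" line.
sub collect_queries {
    my @lines = @_;
    my (%query, $current);
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /Start Query \[ID:\s(\d+)\s\]/) {
            $current = $1;
            $query{$current}{started} = 1;
        }
        elsif ($line =~ /End Query \[ID:\s(\d+)\s\]\s\[duration:\s(\d+)\s*\]/) {
            my ($id, $duration) = ($1, $2);
            $query{$id}{ended}    = 1;
            $query{$id}{duration} = $duration;
            # Stop collecting SQL once the current query has ended.
            undef $current if defined $current && $current eq $id;
        }
        elsif (defined $current && $line =~ /\S/) {
            # Any other non-blank line is treated as SQL for the current id.
            $query{$current}{sql} .= "$line\n";
        }
    }
    return \%query;
}

my $q = collect_queries(
    "Start Query [ID: 7 ]",
    "SELECT * FROM calls",
    "End Query [ID: 7 ] [duration: 412 ]",
    "Start Query [ID: 8 ]",
);
for my $id (sort { $a <=> $b } keys %$q) {
    my $e = $q->{$id};
    print "Process id $id never ended.\n" unless $e->{ended};
    print "Look at Id: $id, duration $e->{duration}\n"
        if $e->{ended} && $e->{duration} > 300;
}
```

Keeping everything under one id means the slow-query report can print the SQL alongside the duration. If queries overlap in the real log, the "current id" trick will not be enough and the SQL lines would need their own id to match on.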
Re^4: logfile parsing by phoneguy (Novice) on Oct 26, 2006 at 17:42 UTC
I had to change the regexps to what I needed, but it works wonderfully! Thank you so much!
Re^3: logfile parsing by phoneguy (Novice) on Oct 26, 2006 at 16:57 UTC
Ok...I think I've figured out how to get my process ID: `if($line =~/Id:\s*(\d+)/){ $processid = $1; }` Does that make sense?