Hello, All -
First-time Perl user here. I have been tasked with creating a script that parses interlaced log files, assembles transactions from multiple lines, and then inserts them into a DB.
The complications:
1) The logs will be cut and dumped onto a box and my script is supposed to continuously parse them, insert the transactions into an Oracle DB, and move the logs to an archive.
2) Each "transaction" is composed of multiple lines.
3) The logs contain outputs from multiple threads, which are interlaced. Fortunately, the thread ID is included on each line.
4) The logs are expected to be large (250K+ transactions, potentially millions of lines or more).
The lines are formatted as such:
08/25/2009 11:29:03.991 (30)Parsing request
08/25/2009 11:29:03.991 (30)----------------------
08/25/2009 11:29:03.991 (30)Authentication Request
08/25/2009 11:29:03.991 (30)Received From: ip=XXX port=XXX
The thread ID is a hexadecimal value within the parentheses.
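To make the line layout concrete, here is a minimal sketch of splitting one line into its three parts. It assumes every line looks like the sample above (date, time with milliseconds, thread ID in parentheses, then the message with no separating space); `parse_line` is a hypothetical helper name, not something from the script below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one log line into its timestamp, thread ID, and message text.
# Call in list context; returns an empty list when the line does not
# match the assumed layout.
sub parse_line {
    my ($line) = @_;
    return $line =~ m{
        ^(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}:\d{2}\.\d{3})  # timestamp
        \s\(([0-9A-Fa-f]+)\)                             # hex thread ID, without the parentheses
        (.*)                                             # remainder of the line
    }x;
}

my ($timestamp, $thread, $message) =
    parse_line('08/25/2009 11:29:03.991 (30)Authentication Request');
print "$timestamp | $thread | $message\n";
```

Capturing the ID without its parentheses keeps it safe to interpolate into later patterns or to use as a hash key.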
Here is my code so far:
#!/bin/perl -w
use strict;

my @files = <*.log>;

foreach my $file (@files) {
    my %threads = ();

    # get all the unique thread IDs for lines containing transaction data
    # based on the headers in one file and put them in hash %threads
    open(LOGFILE,$file) or warn("Could not open log file.");
    while (<LOGFILE>) {
        # capture only the ID inside the parentheses so that it can be
        # interpolated into the patterns below without adding a capture group
        if (   m/.{10}\s.{12}\s\((\w+)\)Authentication Request/
            || m/.{10}\s.{12}\s\((\w+)\)Authentication Response/
            || m/.{10}\s.{12}\s\((\w+)\)Accounting Request/ ) {
            unless (exists $threads{ $1 }) {
                $threads{ $1 } = '1'; # '1' just to put something in there
            }
        }
    }
    close(LOGFILE) or warn("Could not close log file: $file.");

    # loop through the file once for each key in hash %threads
    foreach my $thread (keys (%threads)) {
        open(LOGFILE,$file) or warn("Could not open log file.");
        TRANSACTION: while (<LOGFILE>) {
            my %transaction = ();

            # if a line is found with the current thread ID and Authentication Request header...
            if ( m/(.{10}\s.{12})\s\($thread\)Authentication Request/ ) {
                $transaction{ 'timestamp' } = $1;
                $transaction{ 'thread' } = $thread;
                $transaction{ 'type' } = "Request";

                # read each subsequent line to build transaction
                while (<LOGFILE>) {
                    if ( m/.{10}\s.{12}\s\($thread\)Acct-Session-Id : String Value = (.*$)/ ) {
                        $transaction{ 'Acct-Session-Id' } = $1;
                    } elsif ( m/.{10}\s.{12}\s\($thread\)User-Name : String Value = (.*$)/ ) {
                        $transaction{ 'User-Name' } = $1;
                    } elsif (   m/(.{10}\s.{12})\s\($thread\)Authentication Request/
                             || m/(.{10}\s.{12})\s\($thread\)Authentication Response/
                             || m/(.{10}\s.{12})\s\($thread\)Accounting Request/ ) {
                        # if a new transaction header is found, do something
                        # with the transaction and redo the line
                        print map { "$_ => $transaction{$_}\n" } keys %transaction;
                        redo TRANSACTION;
                    }
                }
                print map { "$_ => $transaction{$_}\n" } keys %transaction;
            } elsif ( m/(.{10}\s.{12})\s\($thread\)Authentication Response/ ) {
                # or Authentication Response...
                $transaction{ 'timestamp' } = $1;
                $transaction{ 'thread' } = $thread;
                $transaction{ 'type' } = "Response";

                # read each subsequent line to build transaction
                while (<LOGFILE>) {
                    if ( m/.{10}\s.{12}\s\($thread\)Acct-Session-Id : String Value = (.*$)/ ) {
                        $transaction{ 'Acct-Session-Id' } = $1;
                    } elsif ( m/.{10}\s.{12}\s\($thread\)User-Name : String Value = (.*$)/ ) {
                        $transaction{ 'User-Name' } = $1;
                    } elsif (   m/(.{10}\s.{12})\s\($thread\)Authentication Request/
                             || m/(.{10}\s.{12})\s\($thread\)Authentication Response/
                             || m/(.{10}\s.{12})\s\($thread\)Accounting Request/ ) {
                        # if a new transaction header is found, do something
                        # with the transaction and redo the line
                        print map { "$_ => $transaction{$_}\n" } keys %transaction;
                        redo TRANSACTION;
                    }
                }
                print map { "$_ => $transaction{$_}\n" } keys %transaction;
            } elsif ( m/(.{10}\s.{12})\s\($thread\)Accounting Request/ ) {
                # or Accounting Request...
                $transaction{ 'timestamp' } = $1;
                $transaction{ 'thread' } = $thread;
                $transaction{ 'type' } = "Accounting";

                # read each subsequent line to build transaction
                LINE: while (<LOGFILE>) {
                    if ( m/.{10}\s.{12}\s\($thread\)Acct-Session-Id : String Value = (.*$)/ ) {
                        $transaction{ 'Acct-Session-Id' } = $1;
                    } elsif ( m/.{10}\s.{12}\s\($thread\)User-Name : String Value = (.*$)/ ) {
                        $transaction{ 'User-Name' } = $1;
                    } elsif (   m/(.{10}\s.{12})\s\($thread\)Authentication Request/
                             || m/(.{10}\s.{12})\s\($thread\)Authentication Response/
                             || m/(.{10}\s.{12})\s\($thread\)Accounting Request/ ) {
                        # if a new transaction header is found, do something
                        # with the transaction and redo the line
                        print map { "$_ => $transaction{$_}\n" } keys %transaction;
                        redo TRANSACTION;
                    }
                }
                print map { "$_ => $transaction{$_}\n" } keys %transaction;
            }
        }
        close(LOGFILE) or warn("Could not close log file: $file.");
    }
}
The code opens a file, scans for thread IDs from lines containing transaction headers, and stores them as a hash. Then it loops through the file as many times as there are thread IDs in the hash and assembles the transactions.
There are individual loops for each transaction type because the transactions actually contain different data; the code is incomplete for now.
I'm concerned that reading each log file one time per thread ID is going to be too slow. I'm also not sure if I should insert individual transactions as they are assembled or store them all in an array and insert them all at once. Currently the code is just printing each transaction.
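For comparison, one way to avoid rereading the file per thread is to read it once and keep one in-progress transaction per thread ID in a hash. The following is only a sketch under assumptions: `assemble_transactions`, `%TYPE_FOR`, and the `$emit` callback are hypothetical names, the header text is assumed to be the whole message part of the line, and the sample values (`alice`, `abc123`, thread `2f`) are invented for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map each header text to the transaction type used in the script above.
my %TYPE_FOR = (
    'Authentication Request'  => 'Request',
    'Authentication Response' => 'Response',
    'Accounting Request'      => 'Accounting',
);

# Read a log once, keeping one in-progress transaction per thread ID.
# $emit is called with each completed transaction (a hash reference).
sub assemble_transactions {
    my ($fh, $emit) = @_;
    my %open;    # thread ID => transaction currently being built

    while (my $line = <$fh>) {
        chomp $line;
        my ($timestamp, $thread, $text) =
            $line =~ m{^(.{10}\s.{12})\s\((\w+)\)(.*)} or next;
        $text =~ s/\s+\z//;    # tolerate trailing whitespace or a stray \r

        if (exists $TYPE_FOR{$text}) {
            # A new header on this thread closes its previous transaction.
            $emit->(delete $open{$thread}) if $open{$thread};
            $open{$thread} = {
                timestamp => $timestamp,
                thread    => $thread,
                type      => $TYPE_FOR{$text},
            };
        }
        elsif ($open{$thread}
            && $text =~ /^(Acct-Session-Id|User-Name) : String Value = (.*)/) {
            $open{$thread}{$1} = $2;
        }
    }

    # Flush whatever is still open at end of file.
    $emit->($open{$_}) for sort keys %open;
}

# Demonstration on an in-memory log with two interlaced threads.
my $sample = <<'LOG';
08/25/2009 11:29:03.991 (30)Authentication Request
08/25/2009 11:29:03.992 (2f)Accounting Request
08/25/2009 11:29:03.993 (30)User-Name : String Value = alice
08/25/2009 11:29:03.994 (2f)Acct-Session-Id : String Value = abc123
08/25/2009 11:29:03.995 (30)Authentication Response
LOG

open my $fh, '<', \$sample or die "Cannot open in-memory log: $!";
my @done;
assemble_transactions($fh, sub { push @done, $_[0] });
close $fh;

for my $t (@done) {
    print join(', ', map { "$_ => $t->{$_}" } sort keys %$t), "\n";
}
```

The callback is where the insert-now versus insert-later decision would live: it could write each transaction immediately, or push onto a buffer that is flushed every few thousand rows. Memory use stays proportional to the number of threads with an open transaction, not to the size of the log.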
Thank you in advance for any assistance.
In reply to Interlaced log parser by tzen