Buffered read and matching

ropey has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, I have a log file I need to parse in order to grab all the errors that have been dumped into it. For instance, a log file may look like this:

[2008/10/14 10:57:18] [55] DEBUG Austin::API::vendor_api - Initialising the API Galileo
[2008/10/14 10:57:18] [41] DEBUG Austin::Suppliers::Common::Location::init - Setting Up Generic Locations
[2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
          'Errors' => [
                        {
                          'ID' => 'noRates',
                          'Error' => 'No Rates for given dates/times',
                          'object' => '/var/website/modules/Austin/CB/Search.pm',
                          'Location' => undef,
                          'line' => 371
                        },
                      ]
        };
[2008/10/14 10:57:18] [245] DEBUG Austin::XV::Trip::set_time_zone - Checking timezone with agent UK
[2008/10/14 10:57:18] [247] DEBUG Austin::XV::Trip::set_time_zone - Setting timezone to Europe/London
[2008/10/14 11:15:22]
[2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
          'Errors' => [
                        {
                          'ID' => 'noRates',
                          'Error' => 'No Rates for given dates/times',
                          'object' => '/var/website/modules/Austin/CB/Search.pm',
                          'Location' => undef,
                          'line' => 371
                        },
                      ]
        };
[2008/10/14 11:15:22]

If all the data were read in one go and slurped into a variable, I would just do something like this:

while ($data =~ /'Errors' => \[([^\]]+)/sgm) {
    my $match = qq|[$1]|;
    # string eval is needed here; eval {$match} would just return the string
    my $ref = eval $match;
    print Dumper($ref);
}

However, as you would expect, the file is going to be pretty huge, so reading it in one hit isn't an option. I guess I could set $/ = ';' and iterate that way... hmmm, any better ideas?

It would also be nice if I could pipe to it, e.g. tail -f <LOG> | log_grepper.pl

I look forward to hearing your improvements!


Replies are listed 'Best First'.
Re: Buffered read and matching by jethro (Monsignor) on Oct 15, 2008 at 11:05 UTC
Read the file line by line and use a flag to indicate whether you want to collect lines or not. You could also call it a simple state machine:

my $errormatching = 0;
my @errortext;
while (<>) {
    if (not $errormatching) {
        # start collecting at the line that opens the 'Errors' list
        $errormatching = 1 if m{^\s*'Errors' =>};
    }
    else {
        if (not m{^\s*\]}) {
            push @errortext, $_;
        }
        else {
            # closing bracket reached: hand over the collected lines
            process_error(@errortext);
            @errortext = ();
            $errormatching = 0;
        }
    }
}

UPDATE: Fixed a typo where I used `[` instead of `]` in the regex.
Re^2: Buffered read and matching by Perlbotics (Archbishop) on Oct 15, 2008 at 12:35 UTC
Right, I wanted to suggest a state machine along these lines too:

#!/usr/bin/perl
use strict;
my @lines;
sub flush {
    return unless grep(/'Errors'/, @_);
    print "Got:\n\t", join("\t", @_), "\n";
}
while (<>) {
    # trigger on last line of an error message
    flush(@lines, $_), @lines = (), next if m{^\s*\};\s*$};
    # trigger on next timestamp
    flush(@lines), @lines = () if m{^\s*\[\d\d\d\d\/};
    push(@lines, $_);
}
flush(@lines);

However, all solutions I have seen so far trigger evaluation with the next timestamp only. When using `tail -f`, the error will be processed only when the next log line with a timestamp is encountered, which might happen some hours or days later. If seeing an error ASAP is important to the OP, the evaluation should additionally be triggered by matching for `};` or by some timeout mechanism, e.g. using `alarm()`.

Update: final `flush(...)` added.
Re^3: Buffered read and matching by jethro (Monsignor) on Oct 15, 2008 at 13:41 UTC
You might have been misled by a bug in my solution, where I used the wrong bracket for end-of-error detection (which made the script not work at all), but it does process the error when `]` is encountered, i.e. well before the next log line.
Re: Buffered read and matching by BrowserUk (Patriarch) on Oct 15, 2008 at 11:28 UTC
Use `$/` to skip over the bits you aren't interested in:

#! perl -slw
use strict;
$/ = "'Errors'";
while( my $junk = <DATA> ) {
    local $/ = "\n[";
    my $errorData = <DATA>;
    last unless $errorData;
    chop $errorData;
    print "'Errors' $errorData";
}
__DATA__
[2008/10/14 10:57:18] [55] DEBUG Austin::API::vendor_api - Initialising the APIGalileo
[2008/10/14 10:57:18] [41] DEBUG Austin::Suppliers::Common::Location::init - Setting Up Generic Locations
[2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
          'Errors' => [
                        {
                          'ID' => 'noRates',
                          'Error' => 'No Rates for given dates/times',
                          'object' => '/var/website/modules/Austin/CB/Search.pm',
                          'Location' => undef,
                          'line' => 371
                        },
                      ]
        };
[2008/10/14 10:57:18] [245] DEBUG Austin::XV::Trip::set_time_zone - Checking timezone with agent UK
[2008/10/14 10:57:18] [247] DEBUG Austin::XV::Trip::set_time_zone - Setting timezone to Europe/London
[2008/10/14 11:15:22]
[2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
          'Errors' => [
                        {
                          'ID' => 'noRates',
                          'Error' => 'No Rates for given dates/times',
                          'object' => '/var/website/modules/Austin/CB/Search.pm',
                          'Location' => undef,
                          'line' => 371
                        },
                      ]
        };
[2008/10/14 11:15:22]

Outputs:

c:\test>junk1
'Errors' => [
                        {
                          'ID' => 'noRates',
                          'Error' => 'No Rates for given dates/times',
                          'object' => '/var/website/modules/Austin/CB/Search.pm',
                          'Location' => undef,
                          'line' => 371
                        },
                      ]
        };

'Errors' => [
                        {
                          'ID' => 'noRates',
                          'Error' => 'No Rates for given dates/times',
                          'object' => '/var/website/modules/Austin/CB/Search.pm',
                          'Location' => undef,
                          'line' => 371
                        },
                      ]
        };
Re: Buffered read and matching by Anonymous Monk on Oct 15, 2008 at 11:03 UTC
Don't eval your log files; it's a bad idea.
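One way to avoid the eval is to pull the fields out with a regex instead. The following is only a minimal sketch: `extract_fields` is a hypothetical helper (it is not from this thread), and it assumes each dumped block holds a single flat hash whose values are single-quoted strings, plain integers, or undef, as in the sample log. With several errors in one block, later keys would overwrite earlier ones.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): pull simple 'key' => value
# pairs out of the text of one dumped error block without eval'ing it.
sub extract_fields {
    my ($text) = @_;
    my %field;
    while ($text =~ /'(\w+)'\s*=>\s*(?:'((?:[^'\\]|\\.)*)'|(undef|-?\d+))/g) {
        my ($key, $string, $bare) = ($1, $2, $3);
        if (defined $string) {
            $string =~ s/\\(['\\])/$1/g;    # undo Data::Dumper's escaping of ' and \
            $field{$key} = $string;
        }
        else {
            $field{$key} = $bare eq 'undef' ? undef : $bare;
        }
    }
    return \%field;
}

# One error block as it appears in the sample log (wrapped path rejoined).
my $block = <<'END';
          'Errors' => [
                        {
                          'ID' => 'noRates',
                          'Error' => 'No Rates for given dates/times',
                          'object' => '/var/website/modules/Austin/CB/Search.pm',
                          'Location' => undef,
                          'line' => 371
                        },
                      ]
END

my $error = extract_fields($block);
print "$error->{ID} at $error->{object} line $error->{line}\n";
```

The `'Errors' => [` line itself is skipped because its value is neither a quoted string, a number, nor undef, so only the leaf fields end up in the hash.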
Re: Buffered read and matching by Anonymous Monk on Oct 15, 2008 at 11:20 UTC
#!/usr/bin/perl --
use strict;
use warnings;
my $line = "";
OUTER: while (<DATA>) {
    if (/^\[/) {    # could use \[\d{4}
        $line = $_;
    }
    my $i = 0;
    INNER: while (<DATA>) {
        if (/^\[/) {
            # a new timestamp ends the previous record; print it only if it spanned several lines
            print $line, "\n" if $i;
            $line = $_;
            $i = 0;
            # stay in the inner loop: 'next OUTER' would read (and drop) the following line
            next INNER;
        }
        $line .= $_;
        $i++;
    }
}
__DATA__
[2008/10/14 10:57:18] [55] DEBUG Austin::API::vendor_api - Initialising the API Galileo
[2008/10/14 10:57:18] [55] DEBUG Austin::API::vendor_api - Initialising the API Galileo
[2008/10/14 10:57:18] [55] DEBUG Austin::API::vendor_api - Initialising the API Galileo
[2008/10/14 10:57:18] [41] DEBUG Austin::Suppliers::Common::Location::init - Setting Up Generic Locations
[2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
          'Errors' => [
                        {
                          'ID' => 'noRates',
                          'Error' => 'No Rates for given dates/times',
                          'object' => '/var/website/modules/Austin/CB/Search.pm',
                          'Location' => undef,
                          'line' => 371
                        },
                      ]
        };
[2008/10/14 10:57:18] [245] DEBUG Austin::XV::Trip::set_time_zone - Checking timezone with agent UK
[2008/10/14 10:57:18] [247] DEBUG Austin::XV::Trip::set_time_zone - Setting timezone to Europe/London
[2008/10/14 11:15:22]
[2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
          'Errors' => [
                        {
                          'ID' => 'noRates',
                          'Error' => 'No Rates for given dates/times',
                          'object' => '/var/website/modules/Austin/CB/Search.pm',
                          'Location' => undef,
                          'line' => 371
                        },
                      ]
        };
[2008/10/14 11:15:22]

Also worth investigating is the sliding window technique, e.g. Matching in huge files.
Re: Buffered read and matching by Wiggins (Hermit) on Oct 15, 2008 at 14:10 UTC
A comment on "Also be nice if I could pipe to it also so tail -f <LOG> | log_grepper.pl..": I use the following sample code structure to read files in a tailing fashion:

#!/usr/bin/perl -w
my $targetF = "/var/log/local2";
for (;;) {
    # 'or' rather than '||', so that a failed open is actually caught
    open ML, "<$targetF" or die $!;
    #seek(ML, 0, 2);    # to EOF - Not, process from beginning
    for (;;) {
        while (<ML>) {
            another_line();
        }
        sleep 2;
        seek(ML, 0, 1);    # reset end-of-file error
    }
    # dropping here with the 'last' should cause
    # the file being monitored to be closed and reopened
    close(ML);
}
sub another_line {
    # $_ has the line
    # Check for target phrases
    return;
}

(I can't remember which book I found this in, but I find it very useful.)

But any cyclically incremental reading of the file adds the need for accumulating multi-line events, handling partial (incomplete) lines, and all the other 'streaming' techniques.
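To illustrate the partial-line problem mentioned above, here is a small sketch of one way to buffer incomplete input. `make_line_assembler` is a hypothetical helper, not something from this thread or from any module; it assumes lines are terminated by "\n" and simply holds back whatever follows the last newline until more input arrives.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a closure that accepts
# arbitrary chunks of input and hands back only the complete lines seen so
# far, keeping any trailing partial line for the next call.
sub make_line_assembler {
    my $pending = '';
    return sub {
        my ($chunk) = @_;
        $pending .= $chunk;
        my @complete;
        # peel off every newline-terminated line from the front of the buffer
        while ($pending =~ s/\A([^\n]*\n)//) {
            push @complete, $1;
        }
        return @complete;
    };
}

my $feed = make_line_assembler();

# The first chunk stops mid-line, so nothing is returned yet.
my @first = $feed->("[2008/10/14 10:57:18] [55] DEBUG Austin::API");

# The second chunk completes that line and starts another partial one.
my @second = $feed->("::vendor_api - Initialising the API Galileo\n[2008/");

print scalar(@first), " line(s), then ", scalar(@second), " line(s)\n";
print @second;
```

In a tailing loop like the one above, each `$_` read from the file could be passed through such a closure before it reaches the multi-line state machine, so that a line written in two pieces is never matched half-finished.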
Re: Buffered read and matching by educated_foo (Vicar) on Oct 15, 2008 at 16:47 UTC
Use Sys::Mmap, map the file, and treat it as a string. The OS is smart enough to deal with it.