ropey has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, I have a log file I need to parse to grab all the errors that have been dumped into it. For instance, a log file may look like:

[2008/10/14 10:57:18] [55] DEBUG Austin::API::vendor_api - Initialising the API Galileo
[2008/10/14 10:57:18] [41] DEBUG Austin::Suppliers::Common::Location::init - Setting Up Generic Locations
[2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
  'Errors' => [
    {
      'ID' => 'noRates',
      'Error' => 'No Rates for given dates/times',
      'object' => '/var/website/modules/Austin/CB/Search.pm',
      'Location' => undef,
      'line' => 371
    },
  ]
};
[2008/10/14 10:57:18] [245] DEBUG Austin::XV::Trip::set_time_zone - Checking timezone with agent UK
[2008/10/14 10:57:18] [247] DEBUG Austin::XV::Trip::set_time_zone - Setting timezone to Europe/London
[2008/10/14 11:15:22]
[2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
  'Errors' => [
    {
      'ID' => 'noRates',
      'Error' => 'No Rates for given dates/times',
      'object' => '/var/website/modules/Austin/CB/Search.pm',
      'Location' => undef,
      'line' => 371
    },
  ]
};
[2008/10/14 11:15:22]

If all the data were read in one go and slurped into a variable, I would just do something like:
while ($data =~ /'Errors' => \[([^\]]+)/sgm) {
    my $match = qq|\$ref = [$1];|;
    my $ref;
    # string eval is needed here; eval {$match} would only return the string
    $ref = eval $match;
    print Dumper($ref);
}

However, as you would expect, the file is going to be pretty huge, so reading it in one hit isn't an option. I guess I could set $/ = ';' and iterate that way. Any better ideas?

It would also be nice if I could pipe to it, e.g. tail -f <LOG> | log_grepper.pl

I look forward to hearing your improvements!

Replies are listed 'Best First'.
Re: Buffered read and matching
by jethro (Monsignor) on Oct 15, 2008 at 11:05 UTC

    Read the file line by line and use a flag to indicate whether you want to collect lines or not. You could also call it a simple state machine:

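    my $errormatching = 0;
    my @errortext;
    while (<>) {
        if (not $errormatching) {
            # start collecting after the line that opens the error list
            $errormatching = 1 if m{^\s*'Errors' =>};
        }
        else {
            if (not m{^\s*\]}) {
                push @errortext, $_;
            }
            else {
                # closing bracket reached: hand the collected lines over
                process_error(@errortext);    # to be supplied by the caller
                @errortext = ();
                $errormatching = 0;
            }
        }
    }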

    UPDATE: Fixed typo where I used [ instead of ] in the regex

      Right, I wanted to suggest a state machine along these lines too:

      #!/usr/bin/perl
      use strict;

      my @lines;

      sub flush {
          return unless grep(/'Errors'/, @_);
          print "Got:\n\t", join ("\t", @_), "\n";
      }

      while (<>) {
          # trigger on last line of an error message
          flush(@lines,$_),@lines=(),next if m{^\s*\};\s*$};
          # trigger on next timestamp
          flush(@lines), @lines=() if m{^\s*\[\d\d\d\d\/};
          push(@lines, $_);
      }
      flush(@lines);

      However, all solutions I have seen so far trigger evaluation only on the next timestamp. When using tail -f, the error will then be processed only when the next log line with a timestamp is encountered, which might happen hours or days later. If seeing an error ASAP is important to the OP, evaluation should additionally be triggered by matching }; or by some timeout mechanism, e.g. using alarm().
      Update: final flush(...) added
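
The timeout idea can be sketched as follows. This is only a minimal sketch under stated assumptions: it uses select() via the core IO::Select module rather than alarm(), which is a swapped-in technique, and the sub name watch, its parameters, and the 4096-byte read size are all hypothetical. It flushes a record on the closing }; line, on the next timestamp, or when no input has arrived for $timeout seconds.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper (not from the thread): collect the lines of a record
# from $fh and pass them to $flush when the record ends or input goes quiet.
sub watch {
    my ($fh, $timeout, $flush) = @_;
    my $sel  = IO::Select->new($fh);
    my $buf  = '';
    my @lines;
    my $emit = sub { $flush->(@lines) if @lines; @lines = (); };

    while (1) {
        my @ready = $sel->can_read($timeout);
        if (!@ready) {            # nothing new for $timeout seconds
            $emit->();
            next;
        }
        # sysread, not <$fh>: buffered reads do not mix well with select()
        my $n = sysread($fh, $buf, 4096, length $buf);
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;          # end of input

        # Only complete lines leave the buffer; a partial line waits.
        while ($buf =~ s/\A([^\n]*\n)//) {
            my $line = $1;
            $emit->() if $line =~ /^\s*\[\d{4}\//;   # timestamp: new record
            push @lines, $line;
            $emit->() if $line =~ /^\s*\};\s*$/;     # closing line of a dump
        }
    }
    $emit->();                    # whatever is left at end of input
}

# Demonstration on a pipe, standing in for "tail -f LOG |".
pipe(my $r, my $w) or die "pipe: $!";
print {$w} <<'END';
[2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
  'Errors' => [
  ]
};
[2008/10/14 11:15:23] [55] DEBUG Austin::API::vendor_api - ok
END
close $w;

my @records;
watch($r, 2, sub { push @records, join '', @_ });
print "--\n$_" for @records;
```

For the piped case, something like watch(\*STDIN, 5, sub { print @_ if grep /'Errors'/, @_ }) should behave similarly, with the 5-second timeout chosen arbitrarily.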

        You might have been misled by a bug in my solution where I used the wrong bracket for end-of-error detection (which makes the script not work at all), but it does process the error when ] is encountered, i.e. well before the next log line.
Re: Buffered read and matching
by BrowserUk (Patriarch) on Oct 15, 2008 at 11:28 UTC

    Use $/ to skip over the bits you aren't interested in:

    #! perl -slw
    use strict;

    $/ = "'Errors'";
    while( my $junk = <DATA> ) {
        local $/ = "\n[";
        my $errorData = <DATA>;
        last unless $errorData;
        chop $errorData;
        print "'Errors' $errorData";
    }

    __DATA__
    [2008/10/14 10:57:18] [55] DEBUG Austin::API::vendor_api - Initialising the APIGalileo
    [2008/10/14 10:57:18] [41] DEBUG Austin::Suppliers::Common::Location::init - Setting Up Generic Locations
    [2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
      'Errors' => [
        {
          'ID' => 'noRates',
          'Error' => 'No Rates for given dates/times',
          'object' => '/var/website/modules/Austin/CB/Search.pm',
          'Location' => undef,
          'line' => 371
        },
      ]
    };
    [2008/10/14 10:57:18] [245] DEBUG Austin::XV::Trip::set_time_zone - Checking timezone with agent UK
    [2008/10/14 10:57:18] [247] DEBUG Austin::XV::Trip::set_time_zone - Setting timezone to Europe/London
    [2008/10/14 11:15:22]
    [2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
      'Errors' => [
        {
          'ID' => 'noRates',
          'Error' => 'No Rates for given dates/times',
          'object' => '/var/website/modules/Austin/CB/Search.pm',
          'Location' => undef,
          'line' => 371
        },
      ]
    };
    [2008/10/14 11:15:22]

    Outputs:

    c:\test>junk1
    'Errors' => [
      {
        'ID' => 'noRates',
        'Error' => 'No Rates for given dates/times',
        'object' => '/var/website/modules/Austin/CB/Search.pm',
        'Location' => undef,
        'line' => 371
      },
    ]
    };
    'Errors' => [
      {
        'ID' => 'noRates',
        'Error' => 'No Rates for given dates/times',
        'object' => '/var/website/modules/Austin/CB/Search.pm',
        'Location' => undef,
        'line' => 371
      },
    ]
    };

Re: Buffered read and matching
by Anonymous Monk on Oct 15, 2008 at 11:03 UTC
    Don't eval your log files; it's a bad idea.
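
One way to avoid eval, as a minimal sketch: pull the 'key' => value pairs out of a collected error block with a regex. The sub name parse_error_block is hypothetical, and the sketch assumes the dump only contains flat hashes whose values are single-quoted strings, integers, or undef, as in the sample log; nested structures would need a real parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn the text found between
# "'Errors' => [" and the closing "]" into a list of hashrefs, without eval.
sub parse_error_block {
    my ($text) = @_;
    my @errors;
    # Each { ... } group is treated as one error record.
    while ($text =~ /\{([^{}]*)\}/g) {
        my $body = $1;
        my %rec;
        # key => 'string' | integer | undef
        while ($body =~ /'([^']+)'\s*=>\s*(?:'([^']*)'|(-?\d+)|undef)/g) {
            $rec{$1} = defined $2 ? $2 : $3;   # both undefined means undef
        }
        push @errors, \%rec;
    }
    return @errors;
}

my $block = <<'END';
    {
      'ID' => 'noRates',
      'Error' => 'No Rates for given dates/times',
      'object' => '/var/website/modules/Austin/CB/Search.pm',
      'Location' => undef,
      'line' => 371
    },
END

for my $err (parse_error_block($block)) {
    printf "%s at %s line %s\n", $err->{Error}, $err->{object}, $err->{line};
}
```

Because nothing from the log is ever executed, a malformed or hostile log line can at worst produce a record with missing keys.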
Re: Buffered read and matching
by Anonymous Monk on Oct 15, 2008 at 11:20 UTC
    #!/usr/bin/perl --
    use strict;
    use warnings;

    my $line = "";
    OUTER: while (<DATA>) {
        if (/^\[/) {    # could use \[\d{4}
            $line = $_;
        }
        my $i = 0;      # number of continuation lines collected
        INNER: while (<DATA>) {
            if (/^\[/) {
                print $line, "\n" if $i;    # only multi-line records
                $line = $_;
                # redo rather than next: next would read a fresh line in
                # the outer loop and could drop a continuation line
                redo OUTER;
            }
            $line .= $_;
            $i++;
        }
    }
    __DATA__
    [2008/10/14 10:57:18] [55] DEBUG Austin::API::vendor_api - Initialising the API Galileo
    [2008/10/14 10:57:18] [55] DEBUG Austin::API::vendor_api - Initialising the API Galileo
    [2008/10/14 10:57:18] [55] DEBUG Austin::API::vendor_api - Initialising the API Galileo
    [2008/10/14 10:57:18] [41] DEBUG Austin::Suppliers::Common::Location::init - Setting Up Generic Locations
    [2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
      'Errors' => [
        {
          'ID' => 'noRates',
          'Error' => 'No Rates for given dates/times',
          'object' => '/var/website/modules/Austin/CB/Search.pm',
          'Location' => undef,
          'line' => 371
        },
      ]
    };
    [2008/10/14 10:57:18] [245] DEBUG Austin::XV::Trip::set_time_zone - Checking timezone with agent UK
    [2008/10/14 10:57:18] [247] DEBUG Austin::XV::Trip::set_time_zone - Setting timezone to Europe/London
    [2008/10/14 11:15:22]
    [2008/10/14 11:15:22] [147] DEBUG Austin::Controller::default - $VAR1 = {
      'Errors' => [
        {
          'ID' => 'noRates',
          'Error' => 'No Rates for given dates/times',
          'object' => '/var/website/modules/Austin/CB/Search.pm',
          'Location' => undef,
          'line' => 371
        },
      ]
    };
    [2008/10/14 11:15:22]
    Also worth investigating is the sliding window technique, e.g. Matching in huge files
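
A rough sketch of the sliding window idea, not taken from the linked node: read the file in fixed-size chunks and keep the tail of the buffer between reads, so that a match straddling a chunk boundary is still found. The names scan_windowed, $chunk_size and $overlap are hypothetical. The sketch assumes that no match is longer than $overlap bytes and that the pattern has an explicit terminator, so a match cut off by a chunk boundary simply fails until the rest arrives.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): call $on_match for every match
# of $re in $fh while holding only a small window of the file in memory.
sub scan_windowed {
    my ($fh, $re, $on_match, $chunk_size, $overlap) = @_;
    my $buf = '';
    while (1) {
        # Append the next chunk to whatever was kept from the last pass.
        my $n = read($fh, $buf, $chunk_size, length $buf);
        die "read failed: $!" unless defined $n;
        last if $n == 0;                    # end of file, nothing new to scan

        my $done = 0;                       # offset just past the last match
        while ($buf =~ /$re/g) {
            my ($start, $end) = ($-[0], $+[0]);
            $on_match->(substr($buf, $start, $end - $start));
            $done = $end;
        }

        # Slide the window: drop handled text, keep at most $overlap bytes
        # of unmatched tail so a split match can complete on the next pass.
        my $keep_from = length($buf) - $overlap;
        $keep_from = $done if $done > $keep_from;
        substr($buf, 0, $keep_from, '') if $keep_from > 0;
    }
}

# Demonstration on an in-memory file with a deliberately tiny chunk size.
my $log = "junk\n'Errors' => [ first ]\nmore junk\n'Errors' => [ second ]\ntail\n";
open my $fh, '<', \$log or die "open: $!";

my @found;
scan_windowed($fh, qr/'Errors' => \[[^\]]*\]/, sub { push @found, $_[0] }, 16, 32);
print "$_\n" for @found;
```

In real use the chunk size would be far larger (tens of kilobytes or more), with $overlap set to the longest error dump expected.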
Re: Buffered read and matching
by Wiggins (Hermit) on Oct 15, 2008 at 14:10 UTC
    A comment on
    "Also be nice if I could pipe to it also so tail -f <LOG> | log_grepper.pl.. "
    I use the following sample code structure to read files in a 'tail'-ing fashion:
    #!/usr/bin/perl -w
    my $targetF = "/var/log/local2";

    for (;;) {
        # "or", not "||": with "||" the die binds to the string and never fires
        open(ML, '<', $targetF) or die "Cannot open $targetF: $!";
        #seek(ML, 0, 2);    # to EOF - Not, process from beginning
        for (;;) {
            while (<ML>) {
                another_line();
            }
            sleep 2;
            seek(ML, 0, 1);    # reset end-of-file error
        }
        # dropping here with the 'last' should cause
        # the file being monitored to be closed and reopened
        close(ML);
    }

    sub another_line {
        # $_ has the line
        # Check for target phrases
        return;
    }
    (I can't remember which book I found this in, but I find it very useful.)

    But any cyclically incremental reading of the file adds the need for accumulation of multi-line events, handling partial (incomplete) lines, and all the other 'streaming' techniques.
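
Handling partial lines can be sketched with a small accumulation buffer: text is appended as it arrives, and only what ends in a newline is handed on. This is a minimal sketch; make_line_buffer and $feed are hypothetical names, and it assumes "\n" line endings.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a closure that accepts
# arbitrary chunks of text and returns only the complete lines seen so far.
# An unfinished last line is held back until its newline arrives.
sub make_line_buffer {
    my $pending = '';
    return sub {
        my ($chunk) = @_;
        $pending .= $chunk;
        my @lines;
        while ($pending =~ s/\A([^\n]*\n)//) {
            push @lines, $1;
        }
        return @lines;
    };
}

# Demonstration: two log lines arriving split at awkward places.
my $feed = make_line_buffer();
my @got;
push @got, $feed->("[2008/10/14 10:57:18] [55] DEBUG first li");
push @got, $feed->("ne\n[2008/10/14 10:57:18] [41] DEB");
push @got, $feed->("UG second line\n");
print @got;
```

The complete lines returned by $feed could then go into whichever multi-line accumulation scheme is in use, such as the flag-based state machine described earlier in the thread.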

Re: Buffered read and matching
by educated_foo (Vicar) on Oct 15, 2008 at 16:47 UTC
    Use Sys::Mmap, map the file, and treat it as a string. The OS is smart enough to deal with it.