tuakilan has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I am trying to write a Perl script that reads a log file and locates records that have the same id number but whose timestamps are more than 1 hour apart within the same day.

Here is the log file, which was generated by a closed-source C program that is beyond my knowledge.

ReadLog.txt

2008-Feb-01 00:00:02 UTC (GMT +0000) - id = 000000001, Status = failed
2008-Feb-01 00:10:02 UTC (GMT +0000) - id = 000000002, Status = success
2008-Feb-01 00:20:02 UTC (GMT +0000) - id = 000000003, Status = failed
2008-Feb-01 00:30:02 UTC (GMT +0000) - id = 000000004, Status = success
2008-Feb-01 00:40:02 UTC (GMT +0000) - id = 000000001, Status = success
2008-Feb-01 00:50:02 UTC (GMT +0000) - id = 000000001, Status = success
2008-Feb-01 01:00:02 UTC (GMT +0000) - id = 000000001, Status = failed
2008-Feb-01 01:10:02 UTC (GMT +0000) - id = 000000007, Status = success
2008-Feb-01 01:20:02 UTC (GMT +0000) - id = 000000003, Status = failed
2008-Feb-01 01:30:02 UTC (GMT +0000) - id = 000000009, Status = success

ResultLog.txt

ID          last attempted          attempts
============================================================
000000001   2008-Feb-01 01:00:02    1
000000003   2008-Feb-01 01:20:02    1


I tried the following, but the output was not what I wanted.

use strict;
use warnings;
use DateTime::Format::Strptime;

# my $Strp = new DateTime::Format::Strptime(pattern => '%Y-%b-%d %T',);
my $Strp = new DateTime::Format::Strptime(pattern => '%Y-%b-%d %T', on_error => 'croak');

my $infile  = 'ReadLog.txt';
my $outfile = 'ReportLog.txt';
my($fh_out, $fh);
my %lookup;
my $status = 'failed';
my $time_delta = 3600; # seconds = 1 hour

open($fh_out, '>', $outfile) or die "Could not open outfile: $!";
open($fh, '<', $infile) or die "Could not open logfile: $!";

while (<$fh>) {
    next unless /$status/;
    $_ =~ m/^(.*) UTC.*refs = (\d+)$/;
    # my $dt = $Strp->parse_datetime("$1");
    unless (my $dt = $Strp->parse_datetime($1)) {
        warn "invalid datetime: $1";
        next;
    }
    my $timestamp = $dt ->epoch();
    my $refs = "$2";
    if ( defined($lookup{$refs}) && $lookup{$refs} + $time_delta <= $timestamp ) {
        print $fh_out "REFS $refs: occurrences at " . $lookup{$refs} . "and $timestamp \n";
        print "REFS $refs: occurrences at " . $lookup{$refs} . " and $timestamp \n";
    }
    $lookup{$refs} = $timestamp;
}
close $fh_out;
#close $fh;

Replies are listed 'Best First'.
Re: Logfile analysis : How to differentiate records with timestamp
by moritz (Cardinal) on Mar 11, 2008 at 08:09 UTC
    The output wasn't desirable because your program doesn't even compile.

    You haven't declared $channel and $dt (the latter is declared in a comment).

    The missing regex $channel should be your first shot; once you've got that, you can use $1 to create $dt.
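A minimal sketch of that advice, under some assumptions: the pattern below matches the "id = ..., Status = ..." layout of the sample lines (the posted script looks for "refs = ..."), the match is checked before any capture is used, and every variable is declared in the scope where it is later read. To keep it runnable without CPAN modules it uses the core Time::Local module in place of DateTime::Format::Strptime. The names parse_log_line and %month_index are hypothetical and do not come from the original script.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviations as they appear in the sample log.
my %month_index = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Hypothetical helper: returns (epoch, id, status) for one log line,
# or nothing at all if the line does not have the expected layout.
sub parse_log_line {
    my ($line) = @_;

    # The list assignment is false when the match fails, so the
    # captures are never used after an unsuccessful match.
    my ( $year, $mon, $day, $hh, $mm, $ss, $id, $status ) = $line =~ m{
        \A (\d{4}) - ([A-Z][a-z]{2}) - (\d\d) \s+
        (\d\d) : (\d\d) : (\d\d) \s+ UTC
        .*? id \s* = \s* (\d+) , \s* Status \s* = \s* (\w+)
    }x or return;

    return unless exists $month_index{$mon};

    # The log times are UTC, so timegm (not timelocal) yields the epoch.
    my $epoch = timegm( $ss, $mm, $hh, $day, $month_index{$mon}, $year );
    return ( $epoch, $id, $status );
}

# Declared in the enclosing scope, so they are still visible below.
my ( $epoch, $id, $status ) = parse_log_line(
    '2008-Feb-01 00:00:02 UTC (GMT +0000) - id = 000000001, Status = failed'
);
print "$id $status $epoch\n" if defined $epoch;
```

In the posted script, "my $dt" is declared inside the condition of the unless statement, so it is no longer visible on the following line that calls $dt->epoch(); declaring it before the check avoids that.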

Re: Logfile analysis : How to differentiate records with timestamp
by hipowls (Curate) on Mar 11, 2008 at 10:01 UTC

    This produces the required output, but I'm not convinced that it does what is required.

    • "last attempted" is actually the time of the first attempt in that period
    • "attempts" is the number of additional failed attempts; if only one failure occurs, it will be logged as 0 attempts.
    Anyway, they're not my problems; I had fun with the regex ;-)
    #!/net/perl/5.10.0/bin/perl

    use strict;
    use warnings;

    use 5.010_000;

    use DateTime::Format::Strptime;
    use Readonly;

    Readonly my $delta => 60 * 60;

    my %stats;

    my $log_parser = qr{
        \A
        (?<time_stamp> \d{4}-\p{IsAlpha}{3}-\d\d [ ] \d\d:\d\d:\d\d )
        .* [ ] - [ ] id [ ] = [ ]
        (?<id> \d+ ),
        [ ] Status [ ] = [ ]
        (?<status> \w+ )
    }msx;

    my $date_parser = new DateTime::Format::Strptime(
        pattern  => '%Y-%b-%d %T',
        on_error => 'croak',
    ) or die "Can't create date parser: $!\n";

    while ( my $line = <DATA> ) {
        if ( $line =~ /$log_parser/ ) {
            my ( $time_stamp, $id, $status ) = @+{qw(time_stamp id status)};
            next if $status eq 'success';

            my $date = $date_parser->parse_datetime($time_stamp);

            # Start a new period on the first failure for this id, or
            # when more than $delta seconds separate it from the period start.
            if ( !exists $stats{$id}
                || $stats{$id}[-1][0]->subtract_datetime_absolute($date)->seconds > $delta )
            {
                push @{ $stats{$id} }, [ $date, $time_stamp, 0 ];
            }
            else {
                ++$stats{$id}[-1][2];
            }
        }
    }

    if ( scalar keys %stats ) {
        say 'ID        last attempted       attempts';
        say '=' x 44;
        foreach my $id ( sort keys %stats ) {
            foreach my $time ( @{ $stats{$id} } ) {
                printf "%s %s %8d\n", $id, @{$time}[ 1, 2 ];
            }
        }
    }

    __DATA__
    2008-Feb-01 00:00:02 UTC (GMT +0000) - id = 000000001, Status = failed
    2008-Feb-01 00:10:02 UTC (GMT +0000) - id = 000000002, Status = success
    2008-Feb-01 00:20:02 UTC (GMT +0000) - id = 000000003, Status = failed
    2008-Feb-01 00:30:02 UTC (GMT +0000) - id = 000000004, Status = success
    2008-Feb-01 00:40:02 UTC (GMT +0000) - id = 000000001, Status = success
    2008-Feb-01 00:50:02 UTC (GMT +0000) - id = 000000001, Status = success
    2008-Feb-01 01:00:02 UTC (GMT +0000) - id = 000000001, Status = failed
    2008-Feb-01 01:10:02 UTC (GMT +0000) - id = 000000007, Status = success
    2008-Feb-01 01:20:02 UTC (GMT +0000) - id = 000000003, Status = failed
    2008-Feb-01 01:30:02 UTC (GMT +0000) - id = 000000009, Status = success
    Output
    ID        last attempted       attempts
    ============================================
    000000001 2008-Feb-01 00:00:02        1
    000000003 2008-Feb-01 00:20:02        1
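If the "last attempted" column is meant to hold the time of the most recent failure in a period, one possible variation is to overwrite the stored time stamp on every repeat failure. The sketch below is only an illustration of that bookkeeping, not the script above: summarize_failures and %month_index are hypothetical names, and the core Time::Local module is swapped in for DateTime::Format::Strptime so that it runs without CPAN modules. Whether this is the meaning the original poster intended is an assumption.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %month_index = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);
my $delta = 60 * 60;    # one hour, in seconds

# Hypothetical helper: takes log lines, returns rows of
# [ id, time stamp of latest failure, number of repeat failures ].
sub summarize_failures {
    my @lines = @_;
    my %stats;    # id => [ [ period_start_epoch, last_stamp, repeats ], ... ]

    for my $line (@lines) {
        my ( $stamp, $year, $mon, $day, $hh, $mm, $ss, $id, $status ) = $line =~ m{
            \A ( (\d{4}) - ([A-Z][a-z]{2}) - (\d\d) [ ] (\d\d) : (\d\d) : (\d\d) )
            .*? id \s* = \s* (\d+) , \s* Status \s* = \s* (\w+)
        }x or next;
        next unless $status eq 'failed' && exists $month_index{$mon};

        my $epoch  = timegm( $ss, $mm, $hh, $day, $month_index{$mon}, $year );
        my $period = $stats{$id} ? $stats{$id}[-1] : undef;

        if ( !$period || $epoch - $period->[0] > $delta ) {
            # First failure for this id, or outside the window: new period.
            push @{ $stats{$id} }, [ $epoch, $stamp, 0 ];
        }
        else {
            # Repeat failure inside the window: keep the latest time stamp.
            $period->[1] = $stamp;
            $period->[2]++;
        }
    }

    my @rows;
    for my $id ( sort keys %stats ) {
        push @rows, [ $id, @{$_}[ 1, 2 ] ] for @{ $stats{$id} };
    }
    return @rows;
}

my @sample = (
    '2008-Feb-01 00:00:02 UTC (GMT +0000) - id = 000000001, Status = failed',
    '2008-Feb-01 00:20:02 UTC (GMT +0000) - id = 000000003, Status = failed',
    '2008-Feb-01 00:40:02 UTC (GMT +0000) - id = 000000001, Status = success',
    '2008-Feb-01 01:00:02 UTC (GMT +0000) - id = 000000001, Status = failed',
    '2008-Feb-01 01:20:02 UTC (GMT +0000) - id = 000000003, Status = failed',
);
printf "%s %s %8d\n", @{$_} for summarize_failures(@sample);
```

On these five lines the rows carry 01:00:02 and 01:20:02 with a count of 1, which lines up with the ResultLog.txt in the question. Like the script above, it also reports an id that failed only once, with a count of 0.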

      When I execute this script, I get the following error:
      $ perl -wc a300.pl
      Sequence (?<t...) not recognized in regex; marked by <-- HERE in m/
          \A
          (?<t <-- HERE ime_stamp> \d{4}-\p{IsAlpha}{3}-\d\d [ ] \d\d:\d\d:\d\d )
          .* [ ] - [ ] id [ ] = [ ]
          (?<id> \d+ ),
          [ ] Status [ ] = [ ]
          (?<status> \w+ )
      / at a300.pl line 14.

        I'm surprised it didn't exit with an error message similar to

        Perl v5.10.0 required--this is only v5.8.8, stopped at Perl-1.pl line 6.
        BEGIN failed--compilation aborted at Perl-1.pl line 6.
        Ahh, your error occurred on line 14, whereas the regex in my script is on line 16. I guess you removed the use 5.010_000; line.

        If you are running an older version of perl, you'll need to remove the named captures and use regular captures. It's just a matter of removing ?<name> and using numbered captures to initialize the variables. You'll also need to replace each say with print ... "\n".
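A rough sketch of what that change could look like, assuming the rest of the script stays as posted: the three named groups become plain groups, the variables are filled from the list the match returns, and print with an explicit newline stands in for say. The single sample line is only there to make the snippet runnable on its own.

```perl
use strict;
use warnings;

# Same pattern as in the script above, with each (?<name> ...) group
# written as a plain numbered group so that it compiles on Perl 5.8.
my $log_parser = qr{
    \A
    ( \d{4}-\p{IsAlpha}{3}-\d\d [ ] \d\d:\d\d:\d\d )   # capture 1: time stamp
    .* [ ] - [ ] id [ ] = [ ]
    ( \d+ ),                                           # capture 2: id
    [ ] Status [ ] = [ ]
    ( \w+ )                                            # capture 3: status
}msx;

my $line = '2008-Feb-01 00:00:02 UTC (GMT +0000) - id = 000000001, Status = failed';

# In list context a successful match returns the captures in order,
# which replaces the @+{qw(time_stamp id status)} hash slice.
if ( my ( $time_stamp, $id, $status ) = $line =~ $log_parser ) {
    print "$id $time_stamp $status\n";    # instead of say
}
```

The use 5.010_000; line has to go as well, since it is what makes an older perl refuse to run the script at all.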