tuakilan has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, thanks to hipowls for help with the first part of the task. Now comes the next part, which I am trying to figure out how to do:

1. read a log file
2. search the logfile.txt for a specific string, in this case the string "TWO"
3. locate records where the refs numbers are identical and the time difference is more than 1 hour (e.g. record 1 and record 8)
4. based on the above 3 criteria, create a report file showing how many times this scenario occurred.

2008-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = ONE, refs = 595166299
2007-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = TWO, refs = 595159906
2007-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = THREE, refs = 659975924
2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = ONE, refs = 595148941
2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = THREE, refs = 659975924
2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = ONE, refs = 595159906
2007-Jan-06 01:00:05 UTC (GMT +0000) - Poll: channel = ONE, refs = 595166299
2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 659975924
2007-Jan-06 01:00:07 UTC (GMT +0000) - Poll: channel = THREE, refs = 595148941
Thanks !

Re: comb a logfile for time diff and occourance
by svenXY (Deacon) on Feb 18, 2008 at 09:55 UTC
    Hi,
    elaborating on my solution to your previous question, this works for me:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use DateTime::Format::Strptime;

    my $Strp = DateTime::Format::Strptime->new(
        pattern => '%Y-%b-%d %T',
    );
    my $infile  = 'b2bclient-gateway-heartbeat.log';
    my $outfile = 'results.txt';
    my ($fh_out, $fh);
    my %lookup;                  # refs number => epoch of the previous matching record
    my $channel    = 'TWO';
    my $time_delta = 3600;       # seconds = 1 hour

    open($fh_out, '>', $outfile) or die "Could not open outfile: $!";
    # open($fh, '<', $infile) or die "Could not open logfile: $!";
    # while (<$fh>) {
    while (<DATA>) {
        next unless /$channel/;
        # skip lines that do not have the expected layout
        next unless m/^(.*) UTC.*refs = (\d+)$/;
        my $dt        = $Strp->parse_datetime($1);
        my $timestamp = $dt->epoch();
        my $refs      = $2;
        if ( defined($lookup{$refs}) && $lookup{$refs} + $time_delta <= $timestamp ) {
            print $fh_out "REFS $refs: occurrences at " . $lookup{$refs} . " and $timestamp \n";
            print "REFS $refs: occurrences at " . $lookup{$refs} . " and $timestamp \n";
        }
        $lookup{$refs} = $timestamp;
    }
    close $fh_out;
    #close $fh;
    __DATA__
    2008-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = ONE, refs = 595166299
    2007-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = TWO, refs = 595159906
    2007-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = THREE, refs = 659975924
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = ONE, refs = 595148941
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = THREE, refs = 659975924
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = ONE, refs = 595159906
    2007-Jan-06 01:00:05 UTC (GMT +0000) - Poll: channel = ONE, refs = 595166299
    2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
    2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 659975924
    2007-Jan-06 01:00:07 UTC (GMT +0000) - Poll: channel = THREE, refs = 595148941
    I'm leaving it to you to re-translate the two timestamps into properly formatted dates.
    Regards,
    svenXY
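Turning the epoch values in that report back into readable dates could look roughly like the sketch below. It is only an illustration: `format_epoch` is a hypothetical helper name, it uses the core `gmtime` function rather than DateTime, and it assumes the report should use the same "YYYY-Mon-DD HH:MM:SS UTC" layout as the log.

```perl
use strict;
use warnings;

# English month abbreviations, as they appear in the log.
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: convert epoch seconds into the log's own layout.
# gmtime keeps the result in UTC regardless of the local timezone.
sub format_epoch {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year) = gmtime $epoch;
    return sprintf '%04d-%s-%02d %02d:%02d:%02d UTC',
        $year + 1900, $MON[$mon], $mday, $hour, $min, $sec;
}

# 1168041604 is 2007-Jan-06 00:00:04 UTC, the time of the fifth sample record.
print format_epoch(1168041604), "\n";
```

In the script above, the two values interpolated into the report line would each be passed through such a helper before printing.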
Re: comb a logfile for time diff and occourance
by hipowls (Curate) on Feb 18, 2008 at 09:54 UTC

    And last time you showed me some code;-)

    Are you just interested in counting the number of times that TWO occurs in each hour? Just an outline.

    my %stats;
    while ( my $line = <$file> ) {
        next unless $line =~ /TWO/;
        # key is the date plus the hour, e.g. "2007-Jan-06 00"
        my ($time) = $line =~ /^(\d\d\d\d-\w\w\w-\d\d \d\d)/;
        ++$stats{$time};
    }

    Update: Put $time in () and changed ++$stats{$date} to ++$stats{$time}. Thanks to CountZero for pointing out the error.

      That won't work: the time difference must be more than one hour.

      Also you use $date as a key to your hash, but you never put anything in it. You probably meant $time.

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        Thanks for catching the typo, I've corrected the original post.

        You are right to some extent: I'd missed the bit about identical refs. However, the specs aren't clear and it is not obvious what the OP really wants to report.

        A modification of the code to

        while ( my $line = <$file> ) {
            next unless $line =~ /TWO/;
            # capture the date plus hour, and the refs number at the end of the line
            my ($time, $ref) = $line =~ /^(\d\d\d\d-\w\w\w-\d\d \d\d).*refs = (\d+)$/;
            ++$stats{$time}{$ref};
        }
        may suffice. It will keep a count of how many times each ref number occurs during each hour. Typically data is reported over fixed intervals of time and this approach certainly has the advantage of simplicity.
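A minimal sketch of how such a nested count could be turned into a report, assuming the log layout shown in the question. The helper name `count_by_hour`, the sample lines, and the report wording are illustrative choices, not part of the code posted above.

```perl
use strict;
use warnings;

# Hypothetical helper: count channel-TWO records per hour and per refs number.
# Returns a hash reference shaped like $stats->{$hour}{$ref} = count.
sub count_by_hour {
    my (@lines) = @_;
    my %stats;
    for my $line (@lines) {
        next unless $line =~ /channel = TWO,/;
        my ($hour, $ref) = $line =~ /^(\d{4}-\w{3}-\d\d \d\d).*refs = (\d+)\s*$/
            or next;    # skip lines that do not match the expected layout
        ++$stats{$hour}{$ref};
    }
    return \%stats;
}

my @sample = (
    '2007-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = TWO, refs = 595159906',
    '2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400',
    '2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400',
    '2007-Jan-06 01:00:07 UTC (GMT +0000) - Poll: channel = THREE, refs = 595148941',
);

# Print one report line per hour and refs number, in sorted order.
my $stats = count_by_hour(@sample);
for my $hour (sort keys %$stats) {
    for my $ref (sort keys %{ $stats->{$hour} }) {
        print "${hour}:00 refs $ref seen $stats->{$hour}{$ref} time(s)\n";
    }
}
```

Note that this reports counts per fixed hour; it does not by itself detect two records with the same refs number that are more than one hour apart.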

      Hi hipowls, I pasted your code into the following:
      my $in_file = "$logfile";
      open my $in_fh, '<', $in_file
          or die "Could not open file $in_file: $!";
      my $out_file = 'output.txt';
      open my $out_fh, '>', $out_file
          or die "Could not open file $out_file: $!";
      while ( my $line = <$in_fh> ) {
          print {$out_fh} $line if $line =~ /production/;
      }
      my %stats;
      while ( my $line = <$file> ) {
          next unless $line =~ /TWO/;
          my ($time) = $line =~ /^(\d\d\d\d-\w\w\w-\d\d \d\d)/)/;
          ++$stats{$time};
      }
      close $in_fh or die "Could not close file $in_file: $!";
      close $out_fh or die "Could not close file $out_file: $!";
      but I get an error in EnginSite Perl Editor LE:
      C:\code>perl -wc a.pl
      syntax error at a.pl line 37, near "/^(\d\d\d\d-\w\w\w-\d\d \d\d)/)"
      a.pl had compilation errors.
      Any pointers ?
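The syntax error comes from the end of the pattern: `/^(\d\d\d\d-\w\w\w-\d\d \d\d)/` is already a complete regex at the first closing slash, so the extra `)/` after it cannot be parsed. Two further things in the pasted code look suspicious, judging only from this excerpt: the second loop reads from `$file`, which is not opened anywhere here, and `$in_fh` has already been read to the end by the first loop. A minimal sketch that does both jobs in a single pass, using a hypothetical helper `process_log` and in-memory filehandles in place of the real files:

```perl
use strict;
use warnings;

# Hypothetical helper: read the log from $in_fh once, copy lines containing
# "production" to $out_fh, and count channel-TWO lines per hour.
sub process_log {
    my ($in_fh, $out_fh) = @_;
    my %stats;
    while ( my $line = <$in_fh> ) {
        print {$out_fh} $line if $line =~ /production/;
        next unless $line =~ /TWO/;
        # Corrected pattern: the capture group closes once, then the regex ends.
        my ($time) = $line =~ /^(\d\d\d\d-\w\w\w-\d\d \d\d)/ or next;
        ++$stats{$time};
    }
    return \%stats;
}

# Demonstration with in-memory filehandles instead of real files.
my $log = <<'END';
2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
END
open my $in_fh, '<', \$log or die "Could not open in-memory log: $!";
my $copied = '';
open my $out_fh, '>', \$copied or die "Could not open in-memory output: $!";
my $stats = process_log($in_fh, $out_fh);
close $in_fh  or die "Could not close in-memory log: $!";
close $out_fh or die "Could not close in-memory output: $!";

print "$_: $stats->{$_}\n" for sort keys %$stats;
```

With real files, the two `open` calls would take `$in_file` and `$out_file` as in the pasted code.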
Re: comb a logfile for time diff and occourance
by bruceb3 (Pilgrim) on Feb 18, 2008 at 10:35 UTC

    To start off, I am not claiming that I understand what is expected in the final report. It looks like my code will produce different output to what others have provided.

    Secondly, I should say that I have probably misunderstood what it is that you are after. I seem to be out of tune at the moment.

    Third, no attempt has been made to handle timezones.

    Fourth, the code will build a hash by ref number of each line in the file, so if the log is large or the available memory of the box is small, there could be problems.

    Cheers.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;
    use Date::Calc qw( Delta_DHMS );

    my %months = (
        Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
        Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12
    );
    my %by_ref;
    my $occured = 0;    # start at 0 so the final printf never sees undef
    my @results;

    # group every line by its last field, the refs number
    push @{ $by_ref{(split)[-1]} }, do { chomp $_; $_ } while (<DATA>);

    my (%p_date, $p_line, %tmdiff, %date, $r_hash);

    REF: for my $ref (keys %by_ref) {
        next unless @{ $by_ref{$ref} } > 1;
        LINE: for my $line ( @{$by_ref{$ref}} ) {
            my ($datestr, $time, $chnl) = (split /\s+/, $line)[0,1,9];
            next unless $chnl =~ /^TWO/;
            # the first TWO line for this ref becomes the reference point
            if (keys %p_date) {
                $r_hash = \%date;
            }
            else {
                $r_hash = \%p_date;
                $p_line = $line;
            }
            @{$r_hash}{qw/year month day hour min sec/} = split_date($datestr, $time);
            next unless keys %date;
            @tmdiff{qw/Dd Dh Dm Ds/} = Delta_DHMS(
                @p_date{qw/year month day hour min sec/},
                @date{qw/year month day hour min sec/}
            );
            if ($tmdiff{Dd} > 0 || $tmdiff{Dh} > 0) {
                push @results, [ $p_line, $line ];
                $occured++;
            }
        }
    }
    continue {
        %date = %p_date = ();
        $p_line = "";
    }

    printf "$occured occurence%s.\n", $occured < 2 ? "" : "s";
    print join("\n", @$_),"\n\n" for (@results);

    sub split_date {
        my ($datestr, $timestr) = @_;
        my @fields = split /-/, $datestr;
        push @fields, split /:/, $timestr;
        $fields[1] = $months{$fields[1]};
        return @fields;
    }
    __DATA__
    2008-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = ONE, refs = 595166299
    2007-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = TWO, refs = 595159906
    2007-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = THREE, refs = 659975924
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = ONE, refs = 595148941
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = THREE, refs = 659975924
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = ONE, refs = 595159906
    2007-Jan-06 01:00:05 UTC (GMT +0000) - Poll: channel = ONE, refs = 595166299
    2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
    2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 659975924
    2007-Jan-06 01:00:07 UTC (GMT +0000) - Poll: channel = THREE, refs = 595148941

    Output:

    1 occurence.
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
    2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
Re: comb a logfile for time diff and occourance
by CountZero (Bishop) on Feb 18, 2008 at 10:49 UTC
    Somewhat shorter than other suggestions (by using the DateTime modules):
    use strict;
    use DateTime::Format::Flexible;
    use DateTime::Duration;

    my $one_hour = DateTime::Duration->new(hours => 1);
    my %loglines;
    while (<DATA>) {
        next unless /TWO/;    # only handle log items for channel TWO
        my ($time, $ref) = /(.*) UTC.*refs = (\d+)$/;
        my $dt = DateTime::Format::Flexible->build( $time );
        if (defined $loglines{$ref}) {
            my $difference = $dt->subtract_datetime( $loglines{$ref}->[0] );
            # check if the difference is more than 1 hour
            if (DateTime::Duration->compare( $difference, $one_hour ) == 1) {
                print $loglines{$ref}->[1], $_, "-------------------------\n";
                delete $loglines{$ref};    # reset the item
            }
        }
        else {
            $loglines{$ref} = [$dt, $_];    # save the item
        }
    }
    __DATA__
    2008-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = ONE, refs = 595166299
    2007-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = TWO, refs = 595159906
    2007-Jan-06 00:00:01 UTC (GMT +0000) - Poll: channel = THREE, refs = 659975924
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = ONE, refs = 595148941
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = THREE, refs = 659975924
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = ONE, refs = 595159906
    2007-Jan-06 01:00:05 UTC (GMT +0000) - Poll: channel = ONE, refs = 595166299
    2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
    2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 659975924
    2007-Jan-06 01:00:07 UTC (GMT +0000) - Poll: channel = THREE, refs = 595148941
    And the result is:
    2007-Jan-06 00:00:04 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
    2007-Jan-06 01:00:06 UTC (GMT +0000) - Poll: channel = TWO, refs = 595131400
    -------------------------
    As the specification is a bit vague, I have chosen to reset the check once I have found two matching items more than 1 hour apart. You will then need two new entries in the log, more than one hour apart, to trigger it again.

    CountZero
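How the check is reset changes the number of occurrences that get reported. The sketch below compares two possible policies on plain epoch seconds for a single refs number: comparing each record with the one immediately before it, and comparing with the first remembered record and forgetting it after a hit (the reset behaviour described above). The function names and the sample timestamps are hypothetical, and both policies use "strictly more than 3600 seconds" as the threshold for the sake of comparison.

```perl
use strict;
use warnings;

my $ONE_HOUR = 3600;    # seconds

# Policy A: compare each record with the one immediately before it.
sub count_vs_previous {
    my (@epochs) = @_;
    my ($count, $previous) = (0);
    for my $t (@epochs) {
        ++$count if defined $previous && $t - $previous > $ONE_HOUR;
        $previous = $t;
    }
    return $count;
}

# Policy B: compare with the first remembered record; forget it after a hit.
sub count_with_reset {
    my (@epochs) = @_;
    my ($count, $first) = (0);
    for my $t (@epochs) {
        if (!defined $first) {
            $first = $t;       # start a new window
        }
        elsif ($t - $first > $ONE_HOUR) {
            ++$count;
            undef $first;      # reset: the next record starts a new window
        }
    }
    return $count;
}

# Four records for one refs number, 40 minutes apart.
my @epochs = (0, 2400, 4800, 7200);
printf "previous-record policy: %d\n", count_vs_previous(@epochs);
printf "reset policy:           %d\n", count_with_reset(@epochs);
```

With records arriving every 40 minutes, policy A never sees a gap of more than an hour, while policy B reports a hit once the third record is more than an hour after the first. Which of the two matches the intended report depends on the specification.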
