Try this on for size:
#!/usr/bin/perl --
use strict;
use warnings;

use Time::Local;
use POSIX qw( strftime );

my %conf = (
    'input'    => 'input.2008-01-01.log',
    'output'   => 'output.2008-01-01.log',
    'duration' => 3600,
);

my %track;

# Convert a date like 2008-Jan-01 and a time like 00:00:00 to epoch seconds.
sub dateconv {
    my ( $date, $time ) = @_;
    my %months = qw( Jan 01 Feb 02 Mar 03 Apr 04 May 05 Jun 06
                     Jul 07 Aug 08 Sep 09 Oct 10 Nov 11 Dec 12 );
    my @parts = reverse split /:/, $time;     # sec, min, hour
    push @parts, reverse split /-/, $date;    # mday, month name, year
    $parts[4] = $months{ $parts[4] } - 1;     # timelocal wants months 0..11
    return timelocal( @parts );
}

open ( my $in, '<', $conf{ 'input' } )
    or die 'Cannot open input file ' . $conf{ 'input' } . ": $!\n";

while ( <$in> ) {
    chomp;
    # Example input:
    # 2008-Jan-01 00:00:00 UTC (GMT +0000) - Toll: channel = seven, ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx, tids = 123456789
    if ( /(\d{4}-\w{3}-\d{2})\s
          (\d{2}:\d{2}:\d{2})\s
          \w+\s
          \(GMT\s([\+\-]\d{4})\)\s
          -\sToll:\schannel\s=\s(\w+),\s
          ref\s=\s\S+,\s
          tids\s=\s(\d+)
         /x ) {
        my ( $date, $time, $offset ) = ( $1, $2, $3 );
        my ( $channel, $id )         = ( $4, $5 );
        my $e_time = dateconv( $date, $time );

        if ( defined $track{ $channel }{ $id } ) {
            # Compare against the first time this ID was seen on this channel.
            if ( $e_time - $track{ $channel }{ $id }{ 'time' } > $conf{ 'duration' } ) {
                $track{ $channel }{ $id }{ 'occurrences' }++;
            }
        }
        else {
            $track{ $channel }{ $id }{ 'time' }        = $e_time;
            $track{ $channel }{ $id }{ 'occurrences' } = 1;
        }
    }
    else {
        print "line does not match!\n";
    }
}
close $in;

open ( my $out, '>', $conf{ 'output' } )
    or die 'Cannot open output file ' . $conf{ 'output' } . ": $!\n";

print $out <<_HEADER;
TIDS            time                    Occurrence
====================================================
_HEADER

foreach my $channel ( sort keys %track ) {
    foreach my $id ( sort keys %{ $track{$channel} } ) {
        if ( $track{$channel}{$id}{ 'occurrences' } > 1 ) {
            my $date_time = POSIX::strftime( '%Y-%b-%d %H:%M:%S',
                ( localtime( $track{$channel}{$id}{ 'time' } ) ) );
            print $out "$id\t\t$date_time\t" . $track{$channel}{$id}{ 'occurrences' } . "\n";
        }
    }
}
close $out;
__END__
I've made a few slight modifications which don't necessarily reflect your errors, but which reflect how I'd attack the problem.
According to your code, it seems that you are interested only in two records next to one another. I draw this conclusion because if you have three records on the same channel and ID, you'll not be able to tell if, for example, the third one and the first one are more than an hour apart. Is that really what you want? The only scenario that immediately explains to me the session code you're using is a periodic task completion, like travelling a circular route and crossing a start/finish line or passing a token back and forth on a network.
I might just need more info to understand this, but there seem to be issues with the logging method. Nothing in the information you present indicates which line is a start record and which is a stop record, yet you treat any pair of matching IDs with no other matching ID between them as a "record". If you have more than two such lines, you'll be counting the first and second as a session record, the second and third as a session record, the third and fourth as another, and so on. Unless you're absolutely sure you'll never have more than two lines with the same ID (as with a unique session ID), you're counting more sessions than you have. OTOH, if you're guaranteed never to have more than two lines with the same ID, then why do you need a count of the occurrences for that ID? Are you timing network connections, lap times around a track, stops at physical toll booths on a highway, or what?
Given the troublesome issues I can't reconcile with your logging input and your code, I coded the above to match the first occurrence of a particular ID's time against any and all lines for that ID later. This gives a count of how many times an ID was logged more than an hour from the initial log line. It should be trivial to change that behavior back to the behavior your code represents.
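For what it's worth, here is a rough sketch of what switching back to the adjacent-record comparison might look like. The sub name note_event and the 'last' key are my own inventions for illustration and appear nowhere in your code or mine above; I'm also assuming you'd still want the first timestamp kept under 'time' for the report.

```perl
use strict;
use warnings;

my %track;

# Hypothetical helper: bump the count only when the gap since the
# PREVIOUS line for this channel/ID exceeds $duration seconds.
sub note_event {
    my ( $channel, $id, $e_time, $duration ) = @_;

    # First sighting starts the count at 1, as in the script above.
    my $rec = $track{$channel}{$id} ||= { 'occurrences' => 1 };

    if ( defined $rec->{'last'} && $e_time - $rec->{'last'} > $duration ) {
        $rec->{'occurrences'}++;
    }

    $rec->{'time'} = $e_time unless defined $rec->{'time'};    # first sighting
    $rec->{'last'} = $e_time;                                  # most recent sighting

    return $rec->{'occurrences'};
}
```

In the main loop you'd call note_event( $channel, $id, $e_time, $conf{ 'duration' } ) in place of the if/else on %track. Note the difference in results: three lines at 0, 3000 and 6000 seconds leave the count at 1 here, because no two neighbours are more than an hour apart, while the first-occurrence comparison in the script above would count the third line.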
In reply to Re: Comparing Dates and Reoccurance - Part II
by mr_mischief
in thread Comparing Dates and Reoccurance - Part II
by tuakilan