Why print the number of occurrences inside the loop? That way every additional occurrence gets listed with the running count of past occurrences. I think you need to build your data structure completely, then iterate over it.

Try this on for size:

#!/usr/bin/perl --
use strict;
use warnings;
use Time::Local;
use POSIX qw( strftime );

my %conf = (
    'input'    => 'input.2008-01-01.log',
    'output'   => 'output.2008-01-01.log',
    'duration' => 3600,
);

my %track;

sub dateconv {
    my ( $date, $time ) = @_;
    my %months = qw( Jan 01 Feb 02 Mar 03 Apr 04 May 05 Jun 06
                     Jul 07 Aug 08 Sep 09 Oct 10 Nov 11 Dec 12 );

    my @parts = reverse split /:/, $time;
    push @parts, reverse split /-/, $date;

    $parts[4] = $months{ $parts[4] } - 1;

    return timelocal( @parts );
}

open ( my $in,  '<', $conf{ 'input' } ) or die 'Cannot open input file ' . $conf{ 'input' } . ": $!\n";


while ( <$in> ) {
    chomp;
    # Example input:
    # 2008-Jan-01 00:00:00 UTC (GMT +0000) - Toll: channel = seven, ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx, tids = 123456789
    if (
        /(\d{4}-\w{3}-\d{2})\s
        (\d{2}:\d{2}:\d{2})\s
        \w+\s
        \(GMT\s([\+\-]\d{4})\)\s
        -\sToll:\schannel\s=\s(\w+),\s
        ref\s=\s\S+,\s
        tids\s=\s(\d+)
        /x
    ) {
        my ( $date, $time, $offset ) = ( $1, $2, $3 );
        my ( $channel, $id ) = ( $4, $5 );
        my $e_time = dateconv( $date, $time );

        if ( defined $track{ $channel }{ $id } ) {
            if ( $e_time - $track{ $channel }{ $id }{ 'time' } > $conf{ 'duration' } ) {
                $track{ $channel }{ $id }{ 'occurrences' }++;
            }
        } else {
            $track{ $channel }{ $id }{ 'time' } = $e_time;
            $track{ $channel }{ $id }{ 'occurrences' } = 1;
        }
    } else {
        print "line does not match!\n";
    }
}
close $in;

# Report the output file name (not the input file name) if the open fails.
open ( my $out, '>', $conf{ 'output' } ) or die 'Cannot open output file ' . $conf{ 'output' } . ": $!\n";
print $out <<_HEADER;
TIDS               time                   Occurrence
====================================================
_HEADER

foreach my $channel ( sort keys %track ) {
    foreach my $id ( sort keys %{ $track{$channel} } ) {
        if ( $track{$channel}{$id}{ 'occurrences' } > 1 ) {
            my $date_time = POSIX::strftime( '%Y-%b-%d %H:%M:%S', ( localtime( $track{$channel}{$id}{ 'time' } ) ) );
            print $out "$id\t\t$date_time\t" . $track{$channel}{$id}{ 'occurrences' } . "\n";
        }
    }
}
close $out;


__END__

I've made a few slight modifications which don't necessarily reflect your errors, but which reflect how I'd attack the problem:

I'm using a regex for the log line, which should be a bit more flexible if the log format should ever happen to change.
I'm extracting the date and time separately and passing both to dateconv.
I'm using reverse() on the portions of the date and time rather than passing a slice to timelocal.
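
To make the reverse() point concrete, here is a minimal sketch of how the pieces end up in the order timelocal expects (sec, min, hour, mday, mon, year). The sub name parts_for_timelocal is made up for illustration and just exposes the list that dateconv builds; the sample time is arbitrary.

```perl
use strict;
use warnings;
use Time::Local;

# Hypothetical helper: returns the list that dateconv hands to timelocal.
sub parts_for_timelocal {
    my ( $date, $time ) = @_;
    my %months = qw( Jan 01 Feb 02 Mar 03 Apr 04 May 05 Jun 06
                     Jul 07 Aug 08 Sep 09 Oct 10 Nov 11 Dec 12 );

    my @parts = reverse split /:/, $time;     # ( sec, min, hour )
    push @parts, reverse split /-/, $date;    # ( mday, month name, year )

    $parts[4] = $months{ $parts[4] } - 1;     # timelocal wants months 0..11

    return @parts;
}

my @parts = parts_for_timelocal( '2008-Jan-01', '01:02:03' );
print join( ', ', @parts ), "\n";    # prints: 03, 02, 01, 01, 0, 2008

# The list can go straight into timelocal, in the local time zone.
my $epoch = timelocal( @parts );
```

Because both the time and the date are written most-significant part first, reversing each one gives exactly the least-significant-first order timelocal takes, so no slice or index juggling is needed.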

According to your code, it seems that you are interested only in two records next to one another. I draw this conclusion because if you have three records on the same channel and ID, you'll not be able to tell if, for example, the third one and the first one are more than an hour apart. Is that really what you want? The only scenario that immediately explains to me the session code you're using is a periodic task completion, like travelling a circular route and crossing a start/finish line or passing a token back and forth on a network.

I might just need more info to understand this, but there seem to be issues with the logging method. There's no indication in the information you present as to what's a start record and what's a stop record, yet you consider any pair of matching IDs with no likewise matching IDs between them a "record". Yet if you have more than two, you'll be considering the first and second as a session record, the second and the third as a session record, and the third and the fourth... Unless you're absolutely sure you'll never have more than two lines with the same ID (like if it's a unique session ID), then you're counting more sessions than you have. OTOH, if you're guaranteed to never have more than two lines with the same ID, then why do you need a count of the occurrences for that ID? Are you timing network connections, lap times around a track, stops at physical toll booths on a highway, or what?

Given the issues I can't reconcile between your logging input and your code, I coded the above to compare the time of the first occurrence of a particular ID against every later line for that ID. This gives a count of how many times an ID was logged more than an hour after the initial log line. It should be trivial to change that behavior back to the behavior your code represents.
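
As a rough sketch of that change, assuming "the behavior your code represents" means comparing each line against the previous line for the same channel and ID, you could remember the last time seen as well as the first. The sub name note_event and the 'last' key are made up for illustration; they are not in the script above.

```perl
use strict;
use warnings;

my %track;

# Hypothetical helper: compares each event with the PREVIOUS event for the
# same channel and ID, rather than with the first one. Returns the count so far.
sub note_event {
    my ( $channel, $id, $e_time, $duration ) = @_;
    my $rec = $track{ $channel }{ $id };

    if ( defined $rec ) {
        # Count only gaps between adjacent records that exceed the duration.
        $rec->{ 'occurrences' }++ if $e_time - $rec->{ 'last' } > $duration;
        $rec->{ 'last' } = $e_time;
    } else {
        $track{ $channel }{ $id } = {
            'time'        => $e_time,    # first time seen, kept for the report
            'last'        => $e_time,    # most recent time seen
            'occurrences' => 1,
        };
    }
    return $track{ $channel }{ $id }{ 'occurrences' };
}

# Four events for one ID, at 0, 1800, 5401 and 5500 seconds.
note_event( 'seven', '123456789', $_, 3600 ) for 0, 1800, 5401, 5500;
print $track{ 'seven' }{ '123456789' }{ 'occurrences' }, "\n";    # prints: 2
```

With those four timestamps, only the gap from 1800 to 5401 is longer than an hour, so this version reports 2. The script above, which always compares against the first line, would report 3 for the same input because both 5401 and 5500 are more than an hour after 0.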

In reply to Re: Comparing Dates and Reoccurance - Part II by mr_mischief
in thread Comparing Dates and Reoccurance - Part II by tuakilan
