tuakilan has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I need to read a log like the one below, in which records 1 and 4 have the same ID number and their time difference is more than 1 hour.

Input.log

2007-Nov-07 00:00:00 - id = 000000001
2007-Nov-07 00:30:01 - id = 000000002
2007-Nov-07 00:40:00 - id = 000000003
2007-Nov-07 01:20:01 - id = 000000001

Output log; the desired result is:

ID          LAST TIME              Occurance
=====================================================
000000001   2007-Nov-07 01:20:01   1

Can anyone help me write an algorithm to perform the comparison? Thanks in advance.

Re: Comparing Dates and Reoccurance
by NetWallah (Canon) on Mar 12, 2008 at 05:12 UTC
    This should do the trick. Formatting, printing headers, and redoing the I/O from the file are left as an exercise.
    use strict;
    use Time::Local;

    my %track;
    while (<DATA>){
        my ($date,$ignoreIDLiteral,$id) = split / - | = /;
        chomp $id;
        my $time = dateconv($date);
        my $prevtime = $track{$id}{TIME};
        $track{$id}{TIME}=$time;
        $track{$id}{DATE}=$date;
        $track{$id}{COUNT}++;
        print "$id\t$date\t$track{$id}{COUNT}\n"
            if $prevtime and $time - $prevtime > 3600;
    }

    sub dateconv{
        my $d = shift;
        my %month = qw[jan 1 feb 2 mar 3 apr 4 may 5 jun 6
                       jul 7 aug 8 sep 9 oct 10 nov 11 dec 12];
        my @p = $d=~/(\d+)-(\w+)-(\d+)\s(\d+):(\d+):(\d+)/;
        $p[1]=$month{ lc $p[1] } - 1;
        return timelocal(@p[5,4,3,2,1,0]); #timelocal($sec,$min,$hour,$mday,$mon,$year);
    }
    __DATA__
    2007-Nov-07 00:00:00 - id = 000000001
    2007-Nov-07 00:30:01 - id = 000000002
    2007-Nov-07 00:40:00 - id = 000000003
    2007-Nov-07 01:20:01 - id = 000000001
    prints:
    000000001	2007-Nov-07 01:20:01	2
    

         "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom

      Hi Netwallah,

      I added the following to your code, but when I run it, it does not produce the desired output.

      #!/usr/local/bin/perl -w
      use strict;
      use warnings;
      use Time::Local;

      my $infile = 'input.2008-01-01.log';
      my $outfile = 'output.2008-01-01.log';
      my($fh_out, $fh);
      open($fh_out, '>', $outfile) or die "Could not open outfile: $!";
      open($fh, '<', $infile) or die "Could not open logfile: $!";

      my %track;
      while (<$fh>){
          my ($date,$ignoreIDLiteral,$id) = split / - | = /;
          chomp $id;
          my $time = dateconv($date);
          my $prevtime = $track{$id}{TIME};
          $track{$id}{TIME}=$time;
          $track{$id}{DATE}=$date;
          $track{$id}{COUNT}++;
          print "$id\t$date\t$track{$id}{COUNT}\n"
              if $prevtime and $time - $prevtime > 3600;
      }

      sub dateconv{
          my $d = shift;
          my %month = qw[jan 1 feb 2 mar 3 apr 4 may 5 jun 6
                         jul 7 aug 8 sep 9 oct 10 nov 11 dec 12];
          my @p = $d=~/(\d+)-(\w+)-(\d+)\s(\d+):(\d+):(\d+)/;
          $p[1]=$month{ lc $p[1] } - 1;
          return timelocal(@p[5,4,3,2,1,0]); #timelocal($sec,$min,$hour,$mday,$mon,$year);
      }

      close $fh_out;
      close $fh;

        You are opening and closing $fh_out, but you are not WRITING to it.

        Do you see anything on STDOUT ?
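
To illustrate the point about $fh_out, here is a minimal sketch, assuming the matching records are meant to go to the output file rather than to STDOUT. The file name and the hard-coded record are placeholders for illustration only; in the real script the print would sit inside the while loop.

```perl
use strict;
use warnings;

# Hypothetical output path, used only for this illustration.
my $outfile = 'output.example.log';

open(my $fh_out, '>', $outfile) or die "Could not open outfile: $!";

# Naming the handle before the list (in a block, with no comma after it)
# sends the line to the file instead of STDOUT.
print {$fh_out} join("\t", '000000001', '2007-Nov-07 01:20:01', 2), "\n";

close $fh_out or die "Could not close outfile: $!";
```

A plain `print "..."` with no handle always goes to the currently selected handle, which is STDOUT unless the script changes it.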

        If the file format is as you said in the initial post, this has been tested and it works. However, it is fragile, and even the slightest difference in format will throw it off.
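
One way to make the parsing less fragile is to validate each line before using it. The following is only a sketch under the assumption that every record looks like the samples in the original post; parse_line is a hypothetical helper name, not part of the code above.

```perl
use strict;
use warnings;

# parse_line (hypothetical helper) returns (date, id) for a line shaped like
# "2007-Nov-07 00:00:00 - id = 000000001", or an empty list otherwise.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^\s*(\d{4}-[A-Za-z]{3}-\d{2}\s+\d{2}:\d{2}:\d{2})\s*-\s*id\s*=\s*(\d+)\s*$/
        ? ($1, $2)
        : ();
}

for my $line ("2007-Nov-07 00:00:00 - id = 000000001\n", "garbage line\n") {
    my ($date, $id) = parse_line($line);
    if (defined $id) {
        print "$id\t$date\n";
    }
    else {
        # Report and skip anything that does not match, instead of
        # feeding undefined values into the date conversion.
        warn "Skipping unrecognised line: $line";
    }
}
```

Tolerating extra whitespace around the separators and skipping malformed lines means a stray blank line or header no longer breaks the date conversion.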

        I would suggest learning how to debug the program, stepping through each statement, and checking the values.

Re: Comparing Dates and Reoccurance
by Narveson (Chaplain) on Mar 12, 2008 at 05:35 UTC

    The algorithm won't be hard if you can say what you're aiming for. It doesn't have to be a formal statement of requirements; an example is fine as long as it covers the obvious questions.

    • Do you want IDs 000000002 and 000000003 in the output?
    • What is the Occurance and why is it 1?

    I'll go ahead and assume the input log is sorted by timestamp and in fixed-width format.

    # see manpage for unpack
    my $TEMPLATE = 'A20 @28A9';

    # read the input log into a hash
    my %last_time;
    while (<DATA>) {
        my ($timestamp, $id) = unpack $TEMPLATE;
        $last_time{$id} = $timestamp;
    }

    # print the output log
    # I have omitted the header
    for my $id (sort keys %last_time) {
        print "$id\t$last_time{$id}\n";
    }
Re: Comparing Dates and Reoccurance
by apl (Monsignor) on Mar 12, 2008 at 09:51 UTC
    Time::Simple lets you subtract two times, returning the difference in seconds. You can also search CPAN for modules with Time in the name.
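
As a concrete illustration of getting a difference in seconds, here is a minimal sketch that uses the core Time::Piece module rather than Time::Simple (a deliberate swap, since Time::Piece ships with Perl). The format string assumes timestamps exactly as in the original post, and seconds_between is a hypothetical helper name.

```perl
use strict;
use warnings;
use Time::Piece;

# Format assumed from the sample log: 2007-Nov-07 01:20:01
my $FORMAT = '%Y-%b-%d %H:%M:%S';

# seconds_between is a hypothetical helper for this illustration.
sub seconds_between {
    my ($earlier, $later) = @_;
    my $t1 = Time::Piece->strptime($earlier, $FORMAT);
    my $t2 = Time::Piece->strptime($later,   $FORMAT);
    # Subtracting two Time::Piece objects yields a Time::Seconds object;
    # its seconds method returns the plain number of seconds.
    return ($t2 - $t1)->seconds;
}

my $gap = seconds_between('2007-Nov-07 00:00:00', '2007-Nov-07 01:20:01');
print "gap of $gap seconds exceeds one hour\n" if $gap > 3600;
```

Records 1 and 4 of the sample log are 1 hour, 20 minutes and 1 second apart, so the comparison against 3600 seconds succeeds for them.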
Re: Comparing Dates and Reoccurance
by wade (Pilgrim) on Mar 12, 2008 at 23:20 UTC
    If I understand the problem correctly, I'd make an array of IDs where each element is a hash containing a DateTime (a module you can get from CPAN) and an occurrence count. Then, when you iterate through the input file, you can check the date from the input with the date in the array (at the appropriate ID). The code might look something like this (I haven't tried any of this -- it's off the top of my pointy little head):
    use strict;
    use warnings;
    use DateTime;

    my @earliestEvent;
    # 'or' (not '||') so that die fires when open itself fails
    open LOGFILE, "<", $filename or die "...";
    while (my $input = <LOGFILE>) {
        chomp $input;    # strip the newline so $id is clean
        my ($dateString, $id) = split / - id = /, $input;
        my $date = DateTime->new(
            # use one of the constructors to
            # fill the date
        );
        if (!exists($earliestEvent[$id])) {
            $earliestEvent[$id] = {};
            $earliestEvent[$id]->{"count"} = 1;
            $earliestEvent[$id]->{"date"} = $date;
        }
        else {
            # clone first: add() modifies the object in place
            my $hourBoundary = $earliestEvent[$id]->{"date"}->clone;
            $hourBoundary->add(hours => 1);
            if ($date > $hourBoundary) {
                print OUTFILE "$id\t"
                    . $earliestEvent[$id]->{"date"}->datetime . "\t"
                    . $earliestEvent[$id]->{"count"} . "\n";
                $earliestEvent[$id]->{"count"} = 1;
                $earliestEvent[$id]->{"date"} = $date;
            }
            else {
                ++$earliestEvent[$id]->{"count"};
            }
        }
    }
    # then, of course, you'll need to print the remaining ones
    Note: DateTime has some idiosyncrasies. You'll probably, for example, want to use the Floating time zone. Does that work for you?