tuakilan has asked for the wisdom of the Perl Monks concerning the following question:
This is a follow-up to http://www.perlmonks.com/?node_id=673882, where I posted questions on how to deal with dates and recurrences.
As a newbie, I am trying to finish an assignment and I am pulling my hair out :( and now the complexity has gone up.
The tasks:
1. Read a raw ASCII log file collected by a toll collecting machine.
2. From the log file, using "tids" and "channel" as the key, locate records whose key reappears more than 3600 seconds after the previous record with the same key.
NEW
3. A separate ASCII file, called tids-list.txt, contains the list of 'tids' values to use in task no. 2.
4. Record how many times each such incident happened and label it as 'occurrences'.
5. Output the result in the format shown in 'report-2007-01-01.txt'.
As an SQL statement, it would look something like this:
```sql
select * from
where channel = 'seven'
and tids in ( 123456789, 987654321, ... )
and time > 3600 seconds
commit;
```
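The pseudo-SQL above can be mirrored by a small Perl predicate. This is a minimal sketch, assuming each log line has already been parsed into a hash with hypothetical `channel`, `tids` and `epoch` keys, and that the wanted tids are stored as hash keys (an assumption about how tids-list.txt is held in memory). Because "time > 3600 seconds" only makes sense relative to an earlier record with the same key, the sketch takes that earlier epoch as an argument; channel 'seven' is hard-coded only to mirror the SQL.

```perl
use strict;
use warnings;

# Perl rendering of the pseudo-SQL WHERE clause:
#   channel = 'seven' AND tids IN (...) AND gap > 3600 seconds
# $rec        - hashref with hypothetical keys channel, tids, epoch
# $wanted     - hashref whose keys are the allowed tids values
# $prev_epoch - epoch of the previous record with the same key, or undef
sub matches {
    my ($rec, $wanted, $prev_epoch) = @_;
    return 0 unless $rec->{channel} eq 'seven';          # channel = 'seven'
    return 0 unless exists $wanted->{ $rec->{tids} };    # tids IN (...)
    return 0 unless defined $prev_epoch;                 # no earlier record yet
    return $rec->{epoch} - $prev_epoch > 3600 ? 1 : 0;   # gap > 3600 seconds
}

my %wanted = (123456789 => 1, 987654321 => 1);
my $rec    = { channel => 'seven', tids => '123456789', epoch => 4200 };
print matches($rec, \%wanted, 0), "\n";   # prints 1 (gap of 4200 s)
```

Keeping the wanted tids as hash keys turns the `IN (...)` list into a single `exists` lookup per record.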
Exact raw ASCII log file from the toll collecting machine, tollog-2007-jan-01.txt:
```
2008-Jan-01 00:00:00 UTC (GMT +0000) - Toll: channel = seven, ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx, tids = 123456789
2008-Jan-01 00:10:00 UTC (GMT +0000) - Toll: channel = six, ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx, tids = 987654321
2008-Jan-01 00:20:00 UTC (GMT +0000) - Toll: channel = three, ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx, tids = 223344221
2008-Jan-01 00:30:00 UTC (GMT +0000) - Toll: channel = four, ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx, tids = 998829992
2008-Jan-01 00:40:00 UTC (GMT +0000) - Toll: channel = three, ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx, tids = 938874724
2008-Jan-01 00:50:00 UTC (GMT +0000) - Toll: channel = two, ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx, tids = 229928828
2008-Jan-01 01:00:00 UTC (GMT +0000) - Toll: channel = five, ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx, tids = 998822992
2008-Jan-01 01:10:00 UTC (GMT +0000) - Toll: channel = seven, ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx, tids = 123456789
```
As you can see above, records 1 and 8 are the desired output: these two records have the same channel name and tids number, and record 8 comes 4200 seconds (1 h 10 min) after record 1, which is more than 3600 seconds.
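The check described above (same key, more than 3600 seconds apart) can be worked through for records 1 and 8. A minimal sketch, assuming the layout shown in the sample log; `parse_line` and `%MON` are hypothetical names, and since the stamps are marked UTC the sketch uses `timegm` from Time::Local rather than the `timelocal` used in the original code.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviations as they appear in the log, mapped to the 0-based
# month numbers that timegm expects (January = 0).
my %MON = (jan => 0, feb => 1, mar => 2, apr => 3, may => 4,  jun => 5,
           jul => 6, aug => 7, sep => 8, oct => 9, nov => 10, dec => 11);

# Parse one log line into (epoch seconds, channel, tids).
# Returns an empty list if the line does not match the assumed layout.
sub parse_line {
    my ($line) = @_;
    my ($y, $mon, $d, $h, $mi, $s, $channel, $tids) = $line =~ m{
        ^(\d{4})-(\w{3})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})   # 2008-Jan-01 00:00:00
        .*?channel\s*=\s*(\w+)                             # channel = seven
        .*?tids\s*=\s*(\d+)                                # tids = 123456789
    }x or return;
    return (timegm($s, $mi, $h, $d, $MON{ lc $mon }, $y), $channel, $tids);
}

my $ref  = 'ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx';
my @rec1 = parse_line("2008-Jan-01 00:00:00 UTC (GMT +0000) - Toll: channel = seven, $ref, tids = 123456789");
my @rec8 = parse_line("2008-Jan-01 01:10:00 UTC (GMT +0000) - Toll: channel = seven, $ref, tids = 123456789");

my $same_key = $rec1[1] eq $rec8[1] && $rec1[2] eq $rec8[2];
my $gap      = $rec8[0] - $rec1[0];
print "same key: ", ($same_key ? "yes" : "no"), ", gap: $gap s\n";   # same key: yes, gap: 4200 s
```

Capturing channel and tids by name, rather than by position after a `split`, keeps the parse independent of how many ` = ` separators appear earlier in the line.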
Desired report file: report-2007-01-01.txt
```
TIDS       Time                  Occurrences
====================================================
123456789  2008-Jan-01 01:10:00  2
```
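Producing a report in this layout is a job for `sprintf` with fixed-width fields. A minimal sketch; the hypothetical `format_report` helper, the column widths and the 52-character rule of `=` are assumptions chosen to approximate the sample report, and the single result row is copied from it.

```perl
use strict;
use warnings;

# Build the report text: header, rule, then one fixed-width row per result.
# Each row is an arrayref [tids, timestamp, occurrences].
sub format_report {
    my @result_rows = @_;
    my $out = sprintf "%-10s %-20s %s\n", 'TIDS', 'Time', 'Occurrences';
    $out .= ('=' x 52) . "\n";
    $out .= sprintf "%-10s %-20s %d\n", @$_ for @result_rows;
    return $out;
}

# Result row taken from the desired report above.
my @rows = (['123456789', '2008-Jan-01 01:10:00', 2]);
print format_report(@rows);
```

Returning a string instead of printing directly makes it easy to send the report to any filehandle, including the output file.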
Sample 'tids-list.txt':
```
123456789
987654321
112233445
888899889
```
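Loading tids-list.txt into a hash makes the "tids in (...)" test a single `exists` lookup. A minimal sketch; `load_tids` is a hypothetical helper, and splitting on whitespace is an assumption that lets it accept either one value per line or space-separated values. The demo reads from an in-memory filehandle instead of the real file.

```perl
use strict;
use warnings;

# Read tids values from a filehandle into a hash for constant-time lookups.
# split ' ' splits on any whitespace and ignores leading whitespace/newlines.
sub load_tids {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        $wanted{$_} = 1 for split ' ', $line;
    }
    return \%wanted;
}

# Demo with an in-memory filehandle; in real use, open 'tids-list.txt' instead.
my $sample = "123456789\n987654321\n112233445\n888899889\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $wanted = load_tids($fh);
close $fh;
print exists $wanted->{'123456789'} ? "listed\n" : "not listed\n";   # listed
```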
So far, this is what I have, but it writes a zero-byte output file :(
```perl
#!/usr/local/bin/perl -w
use strict;
use warnings;
use Time::Local;

my $infile  = 'input.2008-01-01.log';
my $outfile = 'output.2008-01-01.log';
my($fh_out, $fh);

open($fh_out, '>', $outfile) or die "Could not open outfile: $!";
open($fh, '<', $infile) or die "Could not open logfile: $!";

my %track;
while (<$fh>){
    my ($date,$ignoreIDLiteral,$id) = split / - | = /;
    chomp $id;
    my $time = dateconv($date);
    my $prevtime = $track{$id}{TIME};
    $track{$id}{TIME}=$time;
    $track{$id}{DATE}=$date;
    $track{$id}{COUNT}++;
    print "$id\t$date\t$track{$id}{COUNT}\n"
        if $prevtime and $time - $prevtime > 3600;
}

sub dateconv{
    my $d = shift;
    my %month = qw[jan 1 feb 2 mar 3 apr 4 may 5 jun 6
                   jul 7 aug 8 sep 9 oct 10 nov 11 dec 12];
    my @p = $d=~/(\d+)-(\w+)-(\d+)\s(\d+):(\d+):(\d+)/;
    $p[1]=$month{ lc $p[1] } - 1;
    return timelocal(@p[5,4,3,2,1,0]);
    #timelocal($sec,$min,$hour,$mday,$mon,$year);
}

close $fh_out;
close $fh;
```
I think I messed up the regex for the incoming log file. Can anyone show me where I went wrong?
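A quick way to check the split is to print the fields it returns for one sample line. With the log layout above, `split / - | = /` produces five fields, so `$id` (the third field) receives `seven, ref` rather than the tids number, and the channel never becomes part of the key. Separately, the `print` inside the loop names no filehandle, so its output goes to STDOUT rather than to `$fh_out`; that alone would leave the output file at zero bytes (`print $fh_out "..."` writes to the file instead). A minimal sketch of the check, using a line copied from the sample log:

```perl
use strict;
use warnings;

# One line from the sample log, used to inspect what the original split returns.
my $line = "2008-Jan-01 00:00:00 UTC (GMT +0000) - Toll: channel = seven, "
         . "ref = xxx.xxxxxx.xxx.xxxxx.xxxxxxx.xxxxxxxxxxxxxxxxxxxxx, tids = 123456789";

my @fields = split / - | = /, $line;
print scalar(@fields), " fields\n";               # 5 fields
print "[$_]: <$fields[$_]>\n" for 0 .. $#fields;  # field 2 is "seven, ref"

# With this split, the tids value is the last field and the channel is the
# word before ", ref" in field 2.
my ($channel) = $fields[2] =~ /^(\w+),/;
my $tids      = $fields[-1];
print "channel=$channel tids=$tids\n";            # channel=seven tids=123456789
```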
And how do I add 'tids-list.txt' to the search routine?
Thank you very much!!!
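Putting the pieces together, one way to bring tids-list.txt into the search is to load it into a hash first and skip any log line whose tids is not listed. This is a minimal sketch, assuming the file layouts shown above: it keys on tids plus channel, treats the stamps as UTC (hence `timegm`), and writes the report through the output filehandle. The sub names and the three-argument command line are hypothetical. As in the original code, the count includes every record with that key, so in the sample record 8 is reported with 2 occurrences.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON = (jan => 0, feb => 1, mar => 2, apr => 3, may => 4,  jun => 5,
           jul => 6, aug => 7, sep => 8, oct => 9, nov => 10, dec => 11);

# Read whitespace-separated tids values into a lookup hash.
sub load_tids {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) { $wanted{$_} = 1 for split ' ', $line }
    return \%wanted;
}

# Scan the log; return [tids, stamp, count] for each record whose
# tids+channel key was last seen more than 3600 seconds earlier.
sub find_repeats {
    my ($log_fh, $wanted) = @_;
    my (%track, @hits);
    while (my $line = <$log_fh>) {
        my ($stamp, $y, $mon, $d, $h, $mi, $s, $channel, $tids) = $line =~ m{
            ^((\d{4})-(\w{3})-(\d{2})\s(\d{2}):(\d{2}):(\d{2}))
            .*?channel\s*=\s*(\w+)
            .*?tids\s*=\s*(\d+)
        }x or next;                           # skip lines that do not parse
        next unless exists $wanted->{$tids};  # tids must be in tids-list.txt
        my $epoch = timegm($s, $mi, $h, $d, $MON{ lc $mon }, $y);
        my $key   = "$tids|$channel";         # composite key
        my $prev  = $track{$key}{TIME};
        $track{$key}{TIME} = $epoch;
        $track{$key}{COUNT}++;
        push @hits, [$tids, $stamp, $track{$key}{COUNT}]
            if defined $prev and $epoch - $prev > 3600;
    }
    return @hits;
}

# Write the report to an already-open output filehandle.
sub write_report {
    my ($out_fh, @hits) = @_;
    printf $out_fh "%-10s %-20s %s\n", 'TIDS', 'Time', 'Occurrences';
    print  $out_fh '=' x 52, "\n";
    printf $out_fh "%-10s %-20s %d\n", @$_ for @hits;
}

# Usage (hypothetical script name):
#   perl tollreport.pl tids-list.txt tollog-2007-jan-01.txt report-2007-01-01.txt
if (@ARGV == 3) {
    my ($list_file, $log_file, $report_file) = @ARGV;
    open my $list_fh, '<', $list_file   or die "Cannot open $list_file: $!";
    open my $log_fh,  '<', $log_file    or die "Cannot open $log_file: $!";
    open my $out_fh,  '>', $report_file or die "Cannot open $report_file: $!";
    write_report($out_fh, find_repeats($log_fh, load_tids($list_fh)));
    close $_ for $list_fh, $log_fh, $out_fh;
}
```

Taking the three file names as arguments avoids hard-coding dated names in the script; with no arguments it does nothing.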
Re: Comparing Dates and Reoccurance - Part III
by ww (Archbishop) on Mar 22, 2008 at 22:53 UTC
Re: Comparing Dates and Reoccurance - Part III
by stiller (Friar) on Mar 22, 2008 at 18:13 UTC
Re: Comparing Dates and Reoccurance - Part III
by NetWallah (Canon) on Mar 22, 2008 at 22:26 UTC