Well then, are you going to pay me? As i am unemployed, i can really use some USD.
. . .
Anyway ...
- First add the parse_date() sub from my earlier post. Rename it to something else, say "time_components", to avoid confusing it w/
Time::ParseDate::parsedate().
- Set $period_size and $unit
as appropriate.
- When the $time is retrieved in the while
loop, check if the minutes are mod ($period / SEC_PER_MINUTE).
- If not, move to the next line.
- Otherwise, let it be processed as usual.
Giving you something like...
while ( <FH> ) {
...
next if $time !~ m/ \[ (.+?) \] /x;
next # 2d last item is minute
unless time_components($time)->[-2] % ($period / SEC_PER_MINUTE);
...
}
| [reply] [d/l] [select] |
Hi Parv,
Wish I could pay you! Sorry about your employment status! :-( Here is the Perl script that I created. This works for me and produces the output that I targetted for.
To start processing records by the quarter of the hour, for example, 14:00:00, 14:15,..., I checked for (($minute%15) !=0), right at the beginning of the subroutine count_recs.
#!/usr/bin/perl -w
use strict;
use Getopt::Long;
use Time::Local;
use Time::localtime;
my $file="xaa";
my %increment = ();
my $interval = 900; #An interval of 15 minutes
my %MonthToNumber = ();
@MonthToNumber{qw (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
+} = map { sprintf "%02d", $_; } (0..11);
count_recs();
sub count_recs {
my ($Day, $Month, $Year, $Hour, $Minute, $Second);
my ($datetime, $get_post);
my $dmytime;
my $dateproc;
my @dates = ();
my @dates_mod = ();
open (INFILE, "<$file") || die "Cannot read from $file";
while (<INFILE>) {
($datetime,$get_post) = (split / /) [3,6];
$datetime =~ s/\[//;
($Day,$Month,$Year,$Hour,$Minute,$Second)= $datetime =~m#^(\d\d)
+/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\d):(\d\d)#;
next
if ($get_post =~ /\.js$/ || $get_post =~ /\.gif$/ || $get_pos
+t =~ /\.css$/ || ($Minute%15) != 00);
push (@dates, $datetime);
} #outer while
# Calculate the time in epoch seconds in the subroutine convert_epoch.
+
foreach $dmytime (@dates){
my $calc_seconds= &convert_epoch($dmytime);
push (@dates_mod, $calc_seconds);
}
foreach $dateproc (@dates_mod) {
$increment{$dateproc}++;
}
# Print the headings in the output file
open (OUTFILE, ">>access_log.out") || die "Cannot write to $file";
print OUTFILE " Date Range of Time Visual Pages per seco
+nd(VPPS)\n";
print OUTFILE " Avg Max Min\n";
print OUTFILE "------------ ----------------- ------------\n";
&calculate_time();
print OUTFILE "\n\n\n";
$interval = 3600;
&calculate_time();
close(OUTFILE);
close(INFILE);
}
sub convert_epoch {
my $localdate = shift;
my ($epoch_Day,$epoch_Month,$epoch_Year,$epoch_Hour,$epoch_Minute,$
+epoch_Second)= $localdate =~m#^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\
+d):(\d\d)#;
my $epoch_seconds = timelocal($epoch_Second, $epoch_Minute, $epoch_
+Hour, $epoch_Day, $MonthToNumber{$epoch_Month}, $epoch_Year-1900);
return $epoch_seconds;
}
sub calculate_time {
my $first = 0;
my $start_time;
my $end_time;
my $time;
my ($max, $min, $avg, $count);
my $partial=0;
my $col_ind;
my $processed;
my $tm;
foreach $processed ( sort keys %increment ) {
if ($first == 0 ) {
$start_time = $processed;
$end_time = $start_time + $interval;
$time = $start_time;
$min = $increment{$processed};
$max = 0;
$avg = 0;
$first++;
}
if ($processed < $end_time) {
if ($increment{$processed} < $min) {
$min = $increment{$processed};
}
if ($increment{$processed} > $max) {
$max = $increment{$processed};
}
$avg+= $increment{$processed};
$count++;
$partial = 1;
}else {
$avg=$avg/$count;
my $tm = localtime($end_time);
my $start = localtime($start_time);
if ($interval == 900) {
$col_ind = 'P';
}else {
$col_ind = 'H';
}
printf OUTFILE "%01s %02d/%02d/%04d %02d:%02d:00-%02d:%02d:0
+0 %03.2f %03d %03d\n", $col_ind, ($tm->mon)+1,$tm->mday, $tm->year+19
+00,$start->hour, $start->min, $tm->hour,$tm->min, $avg, $max, $min;
$min = $increment{$processed};
$max = $increment{$processed};
$avg = $increment{$processed};
$count = 1;
$start_time = $processed;
$time = $start_time;
$end_time = $start_time + $interval;
$partial = 0;
}
}
if ( $partial == 1 ) {
$avg = $avg/$count;
$tm = localtime($end_time);
my $start = localtime($start_time);
printf OUTFILE "%01s %02d/%02d/%04d %02d:%02d:00-%02d:%02d:00 %0
+3.2f %03d %03d\n", $col_ind, ($tm->mon)+1,$tm->mday, $tm->year+1900,$
+start->hour, $start->min, $tm->hour,$tm->min, $avg, $max, $min;
}
}
Pl. suggest any improvements you can think of. I know that I used the MonthToNumber (convert "Apr" to "04" and so on) as a Global. Thanks to you for that input.
Also, I used the hash %increment as a global. May be I could pass it as an argument to the subroutine calculate_time?
I needed to process two intervals of time, 15min and 1 hour. That's why I called the subroutine twice, with the interval used globally. Any thoughts?
Performance wise, I got a 100MB file processed in about 1min 5 secs.
My next step is to be able to process multiple input files. Currently, I hardcoded an input file. But, I would like to pass multiple log files as input and process them all.
-Andy | [reply] [d/l] |
Forgot to add the output! Here it is.
Date Range of Time Visual Pages per second(VPPS)
Avg Max Min
------------ ----------------- ------------
P 04/25/2003 14:00:00-14:15:00 1.28 006 001
P 04/25/2003 14:15:00-14:30:00 4.50 010 001
P 04/25/2003 14:30:00-14:45:00 5.19 014 001
P 04/25/2003 14:45:00-15:00:00 5.17 013 001
P 04/25/2003 15:00:00-15:15:00 5.60 013 001
P 04/25/2003 15:15:00-15:30:00 5.80 014 001
P 04/25/2003 15:30:00-15:45:00 5.90 014 001
P 04/25/2003 15:45:00-16:00:00 5.60 011 001
P 04/25/2003 16:00:00-16:15:00 5.08 012 001
P 04/25/2003 16:15:00-16:30:00 5.38 011 001
P 04/25/2003 16:30:00-16:45:00 5.48 014 001
P 04/25/2003 16:45:00-17:00:00 5.84 013 001
P 04/25/2003 17:00:00-17:15:00 5.85 015 001
P 04/25/2003 17:15:00-17:30:00 5.65 012 002
P 04/25/2003 17:30:00-17:45:00 5.95 022 001
P 04/25/2003 17:45:00-18:00:00 6.17 016 001
P 04/25/2003 18:00:00-18:15:00 6.05 012 001
P 04/25/2003 18:15:00-18:30:00 5.44 015 001
P 04/25/2003 18:30:00-18:45:00 5.67 015 001
P 04/25/2003 18:45:00-19:00:00 6.12 014 001
P 04/25/2003 19:00:00-19:15:00 6.16 014 001
P 04/25/2003 19:15:00-19:30:00 5.33 016 001
P 04/25/2003 19:30:00-19:45:00 4.83 012 001
H 04/25/2003 14:00:00-15:00:00 4.50 014 001
H 04/25/2003 15:00:00-16:00:00 5.73 014 001
H 04/25/2003 16:00:00-17:00:00 5.43 014 001
H 04/25/2003 17:00:00-18:00:00 5.89 022 001
H 04/25/2003 18:00:00-19:00:00 5.84 015 001
H 04/25/2003 19:00:00-20:00:00 5.46 016 001
~
| [reply] [d/l] |
Hi Parv,
How are you doing? Long time since I emailed you again. I have another question. In the output below, if there is data missing, I should still report the timestamp range and report Avg, Max, Min as 0s (zeroes). Basically what I mean is that I could have my input data something like:
06/14/2003
06/14/2003 15:59:12
06/14/2003 17:00:02
06/14/2003 17:00:08
....
Currently, my output shows that gap in the data.
Date Range of Time Visual Pages per second(VPPS)
Avg Max Min
------------ ----------------- ------------
P 04/25/2003 14:00:00-14:15:00 1.28 006 001
P 04/25/2003 14:15:00-14:30:00 4.50 010 001
P 04/25/2003 14:30:00-14:45:00 5.19 014 001
P 04/25/2003 14:45:00-15:00:00 5.17 013 001
P 04/25/2003 15:00:00-15:15:00 5.60 013 001
P 04/25/2003 15:15:00-15:30:00 5.80 014 001
P 04/25/2003 15:30:00-15:45:00 5.90 014 001
P 04/25/2003 15:45:00-16:00:00 5.60 011 001
P 04/25/2003 17:00:00-17:15:00 5.85 015 001
P 04/25/2003 17:15:00-17:30:00 5.65 012 002
P 04/25/2003 17:30:00-17:45:00 5.95 022 001
P 04/25/2003 17:45:00-18:00:00 6.17 016 001
P 04/25/2003 18:00:00-18:15:00 6.05 012 001
P 04/25/2003 18:15:00-18:30:00 5.44 015 001
P 04/25/2003 18:30:00-18:45:00 5.67 015 001
P 04/25/2003 18:45:00-19:00:00 6.12 014 001
P 04/25/2003 19:00:00-19:15:00 6.16 014 001
P 04/25/2003 19:15:00-19:30:00 5.33 016 001
P 04/25/2003 19:30:00-19:45:00 4.83 012 001
H 04/25/2003 14:00:00-15:00:00 4.50 014 001
H 04/25/2003 15:00:00-16:00:00 5.73 014 001
H 04/25/2003 17:00:00-18:00:00 5.89 022 001
H 04/25/2003 18:00:00-19:00:00 5.84 015 001
H 04/25/2003 19:00:00-20:00:00 5.46 016 001
But I still want to show the missing timestamps as:
P 04/25/2003 16:00:00-16:15:00 0.00 000 000
P 04/25/2003 16:15:00-16:30:00 0.00 000 000
P 04/25/2003 16:30:00-16:45:00 0.00 000 000
P 04/25/2003 16:45:00-17:00:00 0.00 000 000
H 04/25/2003 16:00:00-17:00:00 0.00 000 000
Shall appreciate help on this.
Thanks
Andy | [reply] [d/l] [select] |