dukea2006 has asked for the wisdom of the Perl Monks concerning the following question:

Monks,
I'm hoping you all can point me in the right direction here.
I'm trying to put together a script that will provide me with some performance metrics from some App Server log files. Among other things, I need to be able to calculate the number of requests and the average request time for each 5-minute interval.

I have App Server request logs that are in the following tab-delimited format and contain approximately 10k requests per log file.
Note: these aren't all the columns in the log files, but I'm sure you get the gist of what the data looks like:

RequestID userLoginID server port pageTitle pageURL dateAdded requestTime(ms)
123 456 app1 80 foo1 foo2 2011-12-14 13:01:00.127 250
124 457 app2 80 foo1 foo2 2011-12-14 13:05:00.128 247
125 458 app2 80 foo1 foo2 2011-12-14 13:10:00.105 247
126 459 app3 80 foo1 foo2 2011-12-14 13:11:01.125 435

Given the data above, the results I want to calculate are:
13:00 - 13:05 = 2 requests, 248.5 ms average request time
13:05 - 13:10 = 2 requests, 341 ms average request time

Any suggestions as to the best way to tackle this? I have some code (below) that calculates some of the other metrics I need, but I haven't written anything for the 5-minute calculations since I'm at a loss as to where to start.
Any suggestions would be greatly appreciated.

#!/usr/bin/perl
use warnings;
use strict;

# open and read file into the array
my $file = shift @ARGV;
open(my $fh, "<", $file) or die "Can't open '$file': $!";
my @data = <$fh>;
close($fh);
chomp @data;

# sort array by dateAdded column ("YYYY-MM-DD HH:MM:SS.mmm" sorts correctly as a string)
my @sortedDateAdded = sort { (split /\t/, $a)[6] cmp (split /\t/, $b)[6] } @data;

####################################
# Init Response Time Variables
my $totalReqs   = 0;
my $lt600ms     = 0;
my $gt600ms     = 0;
my $lt1sec      = 0;
my $bt1and5sec  = 0;
my $bt5and10sec = 0;
my $gt10sec     = 0;

# Hashes used as sets of unique values
my %uniqueNames;
my %uniquePageURL;
###################################

print "\nStarting Analysis ... \n\n";

# Read through the sorted data one request at a time
foreach my $reqMetrics (@sortedDateAdded) {
    my ($userLoginPageRequestID, $userLoginID, $server, $port, $pageTitle,
        $pageURL, $dateAdded, $requestTime, $userID, $ipAddress, $browser,
        $browserVersion, $platform, $isloggedIn, $cfid, $sessionCanceled,
        $sessionTimeOut, $manualLogout, $name, $city, $state, $zip,
        $bhTimeStamp) = split /\t/, $reqMetrics;

    # skip the header row and any line without a numeric requestTime
    next unless defined $requestTime && $requestTime =~ /^\d+(?:\.\d+)?$/;

    # Calculate unique values
    $uniqueNames{$name}++       if defined $name;
    $uniquePageURL{$pageURL}++  if defined $pageURL;

    # zero-time requests are left out of the timing statistics (and the total)
    next if $requestTime == 0;
    $totalReqs++;

    # Calculate Response Times
    if ($requestTime < 600) { $lt600ms++ } else { $gt600ms++ }

    if    ($requestTime <= 1000)  { $lt1sec++ }
    elsif ($requestTime <= 5000)  { $bt1and5sec++ }
    elsif ($requestTime <= 10000) { $bt5and10sec++ }
    else                          { $gt10sec++ }
}

# Percentages are computed once, after the loop, when $totalReqs is final
my $pct = sub { $totalReqs ? 100 * $_[0] / $totalReqs : 0 };

print "\n";
print "Total Requests Received: $totalReqs\n";
print "Total Unique Users: ", scalar keys %uniqueNames, "\n";
print "Total Unique Page Views: ", scalar keys %uniquePageURL, "\n";
print "\n";
print "REQUEST TIME           (ACTUAL)  (Percentage)\n";
print "==============================================\n";
printf "Requests < 600ms     %8d  %10.2f\n", $lt600ms,     $pct->($lt600ms);
printf "Requests >= 600ms    %8d  %10.2f\n", $gt600ms,     $pct->($gt600ms);
print "==============================================\n";
printf "Requests <= 1sec     %8d  %10.2f\n", $lt1sec,      $pct->($lt1sec);
printf "Requests ~ 1-5sec    %8d  %10.2f\n", $bt1and5sec,  $pct->($bt1and5sec);
printf "Requests ~ 5-10sec   %8d  %10.2f\n", $bt5and10sec, $pct->($bt5and10sec);
printf "Requests > 10sec     %8d  %10.2f\n", $gt10sec,     $pct->($gt10sec);
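One data-driven alternative to the chain of if statements in the script is to describe the ranges once in a table and look each request time up in it. This is a sketch only, assuming the 1 / 5 / 10 second boundaries used in the script; `@ranges`, `classify` and `tally` are hypothetical names, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical range table: [label, exclusive lower bound, inclusive upper bound].
# An undef bound means "no limit" on that side.
my @ranges = (
    [ '<= 1sec', undef, 1000  ],
    [ '1-5sec',  1000,  5000  ],
    [ '5-10sec', 5000,  10000 ],
    [ '> 10sec', 10000, undef ],
);

# Return the label of the first range that contains $ms.
sub classify {
    my ($ms) = @_;
    for my $r (@ranges) {
        my ($label, $lo, $hi) = @$r;
        next if defined $lo && $ms <= $lo;
        next if defined $hi && $ms > $hi;
        return $label;
    }
    return;    # not reached: the first and last ranges are open-ended
}

# Count each request time into its range, then turn counts into percentages.
sub tally {
    my @times = @_;
    my %count = map { ($_->[0] => 0) } @ranges;
    $count{ classify($_) }++ for @times;

    my $total = @times;
    my %pct;
    for my $label (keys %count) {
        $pct{$label} = $total ? 100 * $count{$label} / $total : 0;
    }
    return (\%count, \%pct);
}

my ($count, $pct) = tally(250, 247, 247, 435, 1200, 12000);
printf "%-8s %3d %6.2f%%\n", $_->[0], $count->{ $_->[0] }, $pct->{ $_->[0] }
    for @ranges;
```

Adding a new band then means adding one row to `@ranges` rather than another if block and another pair of variables.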

Re: How to calculate values based on date / time stamp?
by NetWallah (Canon) on Jan 29, 2012 at 20:33 UTC
    If you want to graph and/or store the results, you can avoid some wheel reinvention by using tools specifically designed to gather periodic, increasing data and graph it. The basic tool is rrdtool.

    Once you understand how that works, you may want to implement a package that makes it easy to configure and display the info via HTTP. Take a look at Cacti for this purpose. It may appear a little intimidating at first, but once implemented, it is beautiful (unfortunately, it is written in PHP, not Perl, but it is still very good).

                "Battle not with trolls, lest ye become a troll; and if you gaze into the Internet, the Internet gazes also into you."
            -Friedrich Nietzsche: A Dynamic Translation

Re: How to calculate values based on date / time stamp?
by JavaFan (Canon) on Jan 29, 2012 at 19:59 UTC
    Any suggestions as to the best way to tackle this? I have some code written to calculate some of the other metrics that I need (below) but I haven't written anything for the 5 min calculations since I'm at a loss as to where I should start.
    So am I! Looking at the table, I'd see 1 request in the period 13:00 - 13:05, 1 request in the period 13:05 - 13:10, and 2 requests in the period 13:10 - 13:15. I've no idea which calculation you use to get a 2-2 split.
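
JavaFan's count can be checked by flooring each timestamp's minute to the start of its 5-minute window and counting per window. A minimal sketch, assuming the "YYYY-MM-DD HH:MM:SS.mmm" format of the dateAdded column; `bucket_start` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Floor a "YYYY-MM-DD HH:MM:SS.mmm" timestamp to the start of its 5-minute
# window, e.g. "2011-12-14 13:07:42.001" -> "2011-12-14 13:05".
sub bucket_start {
    my ($stamp) = @_;
    my ($date, $hh, $mm) = $stamp =~ /^(\d{4}-\d\d-\d\d) (\d\d):(\d\d)/
        or return;
    return sprintf '%s %s:%02d', $date, $hh, $mm - $mm % 5;
}

# dateAdded values from the sample rows in the question
my @stamps = (
    '2011-12-14 13:01:00.127',
    '2011-12-14 13:05:00.128',
    '2011-12-14 13:10:00.105',
    '2011-12-14 13:11:01.125',
);

my %count;
$count{ bucket_start($_) }++ for @stamps;
print "$_ => $count{$_}\n" for sort keys %count;
```

This prints one request for 13:00, one for 13:05 and two for 13:10, i.e. the 1/1/2 split, when each window is treated as half-open (13:05:00.128 falls in 13:05 - 13:10).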
Re: How to calculate values based on date / time stamp?
by InfiniteSilence (Curate) on Jan 30, 2012 at 01:47 UTC

    Couple of pieces of advice:

    • Stop parsing delimited strings by hand. You can use Text::CSV even for something with a different separator; as the perldoc says, "The module accepts either strings or files as input and can utilize any user-specified characters as delimiters, separators, and escapes so it is perhaps better called ASV (anything separated values) rather than just CSV."
    • Try perldoc -q switch instead of those if statements.
    • For your question, provided you have an ordered set of times, I would quantize them to multiples of five and then aggregate them in a hash. For instance:
      linux~> perl -e 'my @times = qw|12 14 21 27 30|; sub round { my $n = shift; $n++ while $n % 5; return $n } print round($_), qq|\n| for @times;'
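
Applying the quantize-and-hash idea to the original question, here is a sketch that keeps a running count and sum per 5-minute window and divides at the end. It assumes dateAdded is field 6 and requestTime(ms) is field 7 (0-based) of each tab-separated line, as in the sample; `five_minute_stats` is a hypothetical name. Unlike the one-liner, which rounds up, it floors the minute (`int($mm / 5) * 5`) so each key is the start of its window.

```perl
use strict;
use warnings;

# Aggregate request count and average requestTime per 5-minute window.
# Assumed layout: field 6 = dateAdded, field 7 = requestTime(ms), tab-separated.
sub five_minute_stats {
    my @lines = @_;
    my %agg;    # window start => { n => request count, sum => total ms }
    for my $line (@lines) {
        chomp $line;
        my ($stamp, $ms) = (split /\t/, $line)[6, 7];
        # skips the header row and lines without a numeric requestTime
        next unless defined $ms && $ms =~ /^\d+(?:\.\d+)?$/;
        my ($day, $hh, $mm) = $stamp =~ /^(\S+) (\d\d):(\d\d)/ or next;
        my $key = sprintf '%s %s:%02d', $day, $hh, int($mm / 5) * 5;
        $agg{$key}{n}++;
        $agg{$key}{sum} += $ms;
    }

    # Turn the running sums into averages.
    my %stats;
    for my $key (keys %agg) {
        $stats{$key} = {
            count => $agg{$key}{n},
            avg   => $agg{$key}{sum} / $agg{$key}{n},
        };
    }
    return \%stats;
}

# The sample rows from the question, rebuilt as tab-separated lines.
my @log = map { join("\t", @$_) } (
    [ qw(RequestID userLoginID server port pageTitle pageURL dateAdded), 'requestTime(ms)' ],
    [ 123, 456, 'app1', 80, 'foo1', 'foo2', '2011-12-14 13:01:00.127', 250 ],
    [ 124, 457, 'app2', 80, 'foo1', 'foo2', '2011-12-14 13:05:00.128', 247 ],
    [ 125, 458, 'app2', 80, 'foo1', 'foo2', '2011-12-14 13:10:00.105', 247 ],
    [ 126, 459, 'app3', 80, 'foo1', 'foo2', '2011-12-14 13:11:01.125', 435 ],
);

my $stats = five_minute_stats(@log);
for my $window (sort keys %$stats) {
    printf "%s  %d requests, %.1f ms avg\n",
        $window, $stats->{$window}{count}, $stats->{$window}{avg};
}
```

On the sample rows this reports 1 request (250 ms) for 13:00, 1 request (247 ms) for 13:05 and 2 requests (341 ms average) for 13:10. For a real log you could pass the lines read from the file, e.g. `five_minute_stats(<$fh>)`.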

    Celebrate Intellectual Diversity