roadtest has asked for the wisdom of the Perl Monks concerning the following question:

Here is what I ended up doing. I like CountZero's solution below; it is much more convenient. Cheers,
use warnings;
use strict;

my ($ReadOp, $WriteOp);

while (<DATA>) {
    # Skip comment/header lines and blank lines. (The original test,
    # $. == 1, would drop the first record when there is no header.)
    next if /^#/ or !/\S/;
    chomp;
    my @output = split /,\s?/;
    if ( $output[2] eq "W" ) {
        $WriteOp->{$output[1]}{$output[4]} += $output[3];
    }
    elsif ( $output[2] eq "R" ) {
        $ReadOp->{$output[1]}{$output[4]} += $output[3];
    }
}

print "=======\n";
print "WriteOperation\n";
print "=======\n";
for my $ip (keys %{$WriteOp}) {
    my $files = $WriteOp->{$ip};
    for my $file (keys %{$files}) {
        print "$ip => $file => $files->{$file}\n";
    }
}

print "\n\n\n";
print "=======\n";
print "ReadOperation \n";
print "=======\n";
for my $ip (keys %{$ReadOp}) {
    my $files = $ReadOp->{$ip};
    for my $file (keys %{$files}) {
        print "$ip => $file => $files->{$file}\n";
    }
}

__DATA__
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4964,/export/file1.log
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4964,/export/file2.log
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4963,/export/file2.log
Sun Aug 21 12:08:21 2011,172.22.112.141, W, 9292,/export/home/another.file
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4964,/export/file2.log
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 2493,/export/file1.log
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 2355,/export/another.log
Sun Aug 21 12:08:21 2011,172.22.32.28, R, 25,/export/another.log
Sun Aug 21 12:08:21 2011,172.20.220.38, W, 3699,/export/file2.log
Sun Aug 21 12:08:21 2011,172.20.220.146, W, 1996,<?>
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 2776,/export/file1.log
Sun Aug 21 12:08:21 2011,172.22.32.28, R, 26,/export/file1.log

==============================

Hello gurus,

I have some raw data, shown below. How can I aggregate it into a report, so that we can see the total data transferred from each IP to each file? I am considering using references to parse the original data. What data structure should I use? Is there a better way to achieve this?

Thanks in advance,

====expected report format=======

IP              Write/Read  TotalTransferData  FileName
172.22.32.28    W           6736               /export/file1.log
172.22.220.38   W           6737               /export/file2.log

===raw data================================

#
##DATE IP Write/Read TransferData FileName
#
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4964,/export/file1.log
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4964,/export/file2.log
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4963,/export/file2.log
Sun Aug 21 12:08:21 2011,172.22.112.141, W, 9292,1102,/export/home/another.file
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4964,/export/file2.log
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 2493,/export/file1.log
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 2355,/export/another.log
Sun Aug 21 12:08:21 2011,172.20.220.38, W, 3699,/export/file2.log
Sun Aug 21 12:08:21 2011,172.20.220.146, W, 1996,<?>
Sun Aug 21 12:08:21 2011,172.22.32.28, W, 2776,/export/file1.log

Replies are listed 'Best First'.
Re: aggregate data
by NetWallah (Canon) on Aug 25, 2011 at 04:50 UTC
    This looks like a job for a database.

    I'd recommend using something simple, like SQLite, unless you are already familiar with, and have easy access to, something bigger/better.

    Just load the data into a table, and let SQL do the aggregating for you.
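
    A rough sketch of what that could look like, assuming the DBI and DBD::SQLite modules are installed. The table name `transfers` and its column names are made up for illustration, and only three of the sample lines are loaded:

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database; table and column names are hypothetical.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 } );
$dbh->do('CREATE TABLE transfers (ip TEXT, rw TEXT, bytes INTEGER, file TEXT)');

# A few lines taken from the raw data in the question.
my @lines = (
    'Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4964,/export/file1.log',
    'Sun Aug 21 12:08:21 2011,172.22.32.28, W, 2493,/export/file1.log',
    'Sun Aug 21 12:08:21 2011,172.20.220.38, W, 3699,/export/file2.log',
);

my $insert = $dbh->prepare(
    'INSERT INTO transfers (ip, rw, bytes, file) VALUES (?, ?, ?, ?)');
for my $line (@lines) {
    # The date field is not needed for the report, so it is discarded.
    my ( undef, $ip, $rw, $bytes, $file ) = split /,\s*/, $line;
    $insert->execute( $ip, $rw, $bytes, $file );
}

# Let SQL do the aggregating: one row per (ip, rw, file) with the summed bytes.
my $rows = $dbh->selectall_arrayref(
    'SELECT ip, rw, SUM(bytes), file FROM transfers
      GROUP BY ip, rw, file ORDER BY ip, file'
);
print join( "\t", @$_ ), "\n" for @$rows;
```

    Using bind placeholders in the INSERT keeps odd file names such as `<?>` from needing any quoting by hand.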

                "XML is like violence: if it doesn't solve your problem, use more."

Re: aggregate data
by duyet (Friar) on Aug 25, 2011 at 08:15 UTC

    You can use a database to process the data, but if you want to do it in Perl, below is a sketch for processing the log data (assuming the raw data is saved in a file):

    use strict;
    use warnings;

    # Assumption: the log file name is given as the first argument.
    my $infile = shift @ARGV or die "usage: $0 logfile\n";
    open my $in, '<', $infile or die "cannot read $infile: $!";

    my $data;    # $data->{ip}{file}{rw} = total
    while ( my $line = <$in> ) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;    # skip comments and empty lines
        my @tmp = split /,\s*/, $line;             # split on ',' and trim leading blanks
        $data->{ $tmp[1] }{ $tmp[4] }{ $tmp[2] } += $tmp[3];
    }
    close $in;

    print "IP\tWrite/Read\tTotal\tFileName\n\n";
    for my $ip ( sort keys %$data ) {
        for my $file ( sort keys %{ $data->{$ip} } ) {
            for my $rw ( sort keys %{ $data->{$ip}{$file} } ) {
                my $total = $data->{$ip}{$file}{$rw};
                print "$ip\t$rw\t$total\t$file\n";
            }
        }
    }

    This assumes the two numbers on the 'another.file' line are a typo; otherwise you need to do some more processing of that line in the while loop.
Re: aggregate data
by CountZero (Bishop) on Aug 25, 2011 at 19:45 UTC
    Easy.

    use Modern::Perl;
    use Data::Dump qw/dump/;

    my %database;

    while (<DATA>) {
        next if /^#/;
        chomp;
        my (undef, $ip, $rw, $amount, $file) = split /,\s?/;
        $database{$ip}{$rw}{$file} += $amount;
    }

    say dump(\%database);

    __DATA__
    #
    ##DATE IP Write/Read TransferData FileName
    #
    Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4964,/export/file1.log
    Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4964,/export/file2.log
    Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4963,/export/file2.log
    Sun Aug 21 12:08:21 2011,172.22.112.141, W, 9292,/export/home/another.file
    Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4964,/export/file2.log
    Sun Aug 21 12:08:21 2011,172.22.32.28, W, 2493,/export/file1.log
    Sun Aug 21 12:08:21 2011,172.22.32.28, W, 2355,/export/another.log
    Sun Aug 21 12:08:21 2011,172.20.220.38, W, 3699,/export/file2.log
    Sun Aug 21 12:08:21 2011,172.20.220.146, W, 1996,<?>
    Sun Aug 21 12:08:21 2011,172.22.32.28, W, 2776,/export/file1.log
    output:
    {
      "172.20.220.146" => { W => { "<?>" => 1996 } },
      "172.20.220.38"  => { W => { "/export/file2.log" => 3699 } },
      "172.22.112.141" => { W => { "/export/home/another.file" => 9292 } },
      "172.22.32.28"   => {
                            W => {
                              "/export/another.log" => 2355,
                              "/export/file1.log" => 10233,
                              "/export/file2.log" => 14891,
                            },
                          },
    }
    Caution: this script does not perform any checking whether the data is in a valid format! For instance, your 4th data-line is in a "wrong" format which I assume is just a typo and I "corrected" it by hand.
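
    One way to add such a check, as a minimal sketch: match each line against a pattern that demands exactly five comma-separated fields and set aside anything else. The helper name `parse_line` and the exact pattern are assumptions for illustration, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (ip, rw, amount, file) for a well-formed,
# already chomped line, or an empty list if the shape is not as expected.
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ /^[^,]+,\s*(\d{1,3}(?:\.\d{1,3}){3}),\s*([RW]),\s*(\d+),\s*([^,]+)$/;
    return @fields;
}

my ( %database, @rejected );
for my $line (
    'Sun Aug 21 12:08:21 2011,172.22.32.28, W, 4964,/export/file1.log',
    'Sun Aug 21 12:08:21 2011,172.22.112.141, W, 9292,1102,/export/home/another.file',
) {
    my @fields = parse_line($line);
    unless (@fields) {
        push @rejected, $line;    # keep bad lines for later inspection
        next;
    }
    my ( $ip, $rw, $amount, $file ) = @fields;
    $database{$ip}{$rw}{$file} += $amount;
}
warn "skipped malformed line: $_\n" for @rejected;
```

    Collecting the rejected lines instead of dying on the first one lets the report still be produced, while the odd records can be looked at separately.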

    Printing the results in a pretty format is left as an exercise for the reader.
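
    For anyone who wants a starting point on that exercise, here is a minimal sketch that walks the same three-level hash with sprintf. The sub name `format_report` and the column widths are arbitrary choices; only core Perl is assumed, and the sample hash holds a subset of the totals shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the {ip}{rw}{file} => total structure into
# a list of fixed-width report lines, sorted for stable output.
sub format_report {
    my ($db) = @_;
    my $fmt   = "%-16s %-10s %17s  %s";
    my @lines = ( sprintf( $fmt, 'IP', 'Write/Read', 'TotalTransferData', 'FileName' ) );
    for my $ip ( sort keys %$db ) {
        for my $rw ( sort keys %{ $db->{$ip} } ) {
            for my $file ( sort keys %{ $db->{$ip}{$rw} } ) {
                push @lines, sprintf( $fmt, $ip, $rw, $db->{$ip}{$rw}{$file}, $file );
            }
        }
    }
    return @lines;
}

my %database = (
    '172.20.220.38' => { W => { '/export/file2.log' => 3699 } },
    '172.22.32.28'  => {
        W => { '/export/file1.log' => 10233, '/export/file2.log' => 14891 },
    },
);
print "$_\n" for format_report( \%database );
```

    Sorting the keys at every level is optional, but it makes the report order repeatable from run to run, which plain `keys` does not guarantee.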

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Thanks, I posted my solution. Your solution is neat. I should always remember to use dump.

      cheers,

Re: aggregate data
by Anonymous Monk on Aug 25, 2011 at 04:06 UTC
    Sun Aug 21 12:08:21 2011,172.22.112.141, W, 9292,1102,/export/home/another.file

    Which of these 2 numbers represents the write (right :-) amount?