in reply to Re^2: Parse CSV lines and process for each second
in thread Parse CSV lines and process for each second

You are trying too hard to use pushes into arrays when what you really want to do is set up what I've called in the past a "keyed abacus." Use your date/time information as a key into a hash of hashes, such that each element in the hash is a hash holding {min, max} values. Iterate through your data, creating a key into your hash by concatenating the date and time; if that key already exists, update its min and max elements with the new value, and if not, create it.

Other than writing the code out for you longhand, I'm not sure how much better I can explain it.

#!/usr/bin/perl -w
use strict;

my $table = {};

while ( my $line = <DATA> ) {
    chomp $line;
    my ( $name, $date, $time, $value ) = split /\s+/, $line;

    # Key the "abacus" on date and time concatenated.
    my $key = $date . "-" . $time;

    # First sighting of this second: seed min high and max low.
    if ( !defined( $table->{$key} ) ) {
        $table->{$key} = { name => '', min => 99999999, max => 0 };
    }
    $table->{$key}->{min} = $value if $value < $table->{$key}->{min};
    $table->{$key}->{max} = $value if $value > $table->{$key}->{max};
}

foreach my $key ( sort keys %$table ) {
    my ( $date, $time ) = split "-", $key;
    printf "Date: %s Time: %s max = %d min = %d\n",
        $date, $time, $table->{$key}->{max}, $table->{$key}->{min};
}

exit(0);

__END__
fred 9/1/2011 15:00:00 50
mary 9/1/2011 15:00:00 0
john 9/1/2011 15:00:00 16

You should see an output something like:

Date: 9/1/2011 Time: 15:00:00 max = 50 min = 0

By the way...this looks like the sort of assignment I would give to my students for homework when I used to teach Perl.


Peter L. Berghold -- Unix Professional
Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg

Replies are listed 'Best First'.
Re^4: Parse CSV lines and process for each second
by capriguy84 (Novice) on Sep 07, 2011 at 21:16 UTC

    Thanks for writing up the long script and giving me a new idea. I think I understand now: you are using date-time as the key, and checking min & max before feeding values into the hash.

    If I choose this route, I have to work on other columns (5 & 6) to calculate a weighted mean and get the first & last value of each second. Along with that, I need to find the value with the most occurrences in column 6. I was able to derive all of the above operations using arrays, which is why I was reluctant to change the scheme.

    Btw, this is not a college assignment; I am working on some finance data to plot graphs.

    Anyone else have other ideas?

      I think the idea you have been given is just fine for your original problem. But if you want to do other things as well, as you now indicate, you should still stick with the hash. The point is what the hash contains. Each hash value can be a reference to an array. This is frequently referred to as a "Hash of Arrays" or HoA. That way, you can keep as many statistics as you want in the array, with one array for each hash key. There's plenty of documentation on this data structure (see perldsc, the Perl Data Structures Cookbook).
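      As a concrete illustration (not your actual column layout; price, size, and code below are hypothetical stand-ins for columns 4-6), a minimal HoA sketch might push each row onto an array keyed by the second, then make one pass over each array to get the first and last values, a size-weighted mean, and the most frequent column-6 code:

      #!/usr/bin/perl
      use strict;
      use warnings;

      my %rows;    # HoA: "date time" => [ [price, size, code], ... ]

      while ( my $line = <DATA> ) {
          chomp $line;
          my ( $name, $date, $time, $price, $size, $code ) = split /\s+/, $line;
          push @{ $rows{"$date $time"} }, [ $price, $size, $code ];
      }

      for my $key ( sort keys %rows ) {
          my @r = @{ $rows{$key} };
          my ( $sum, $weight, %count ) = ( 0, 0 );
          for my $row (@r) {
              $sum    += $row->[0] * $row->[1];    # price * size
              $weight += $row->[1];
              $count{ $row->[2] }++;               # tally column-6 codes
          }
          # Most frequent column-6 value for this second.
          my ($mode) = sort { $count{$b} <=> $count{$a} } keys %count;
          printf "%s first=%s last=%s wmean=%.2f mode=%s\n",
              $key, $r[0][0], $r[-1][0], $sum / $weight, $mode;
      }

      __DATA__
      fred 9/1/2011 15:00:00 50 10 B
      mary 9/1/2011 15:00:00 20 30 B
      john 9/1/2011 15:00:01 16 5 S

      Computing the statistics after collecting the rows (rather than on the fly) keeps all the per-second logic in one place, at the cost of holding each second's rows in memory.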

      Regards,

      John Davies