in reply to Efficient use of memory

You are overcomplicating things a lot. Some advice:

Use meaningful names for your vars and subroutines. %HoA, %hash_keys, $pointer, parse_hash and create_hash are very bad names because they say nothing about the data they hold or about what the code does.

Don't use global variables (%hash_keys, %AoH) to pass data to or get data from subroutines.
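A minimal sketch of the alternative (the name count_by_key and the data are made up): pass the input as arguments and return the result, so the subroutine touches no globals and the caller decides where the data goes:

```perl
use strict;
use warnings;

# Takes a list of [key, value] pairs as arguments and returns
# the totals, instead of reading/writing a package global.
sub count_by_key {
    my @pairs = @_;
    my %count;
    $count{ $_->[0] } += $_->[1] for @pairs;
    return %count;
}

my %totals = count_by_key([ A => 2 ], [ B => 3 ], [ A => 5 ]);
print "$_: $totals{$_}\n" for sort keys %totals;
```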

Try to simplify your structures; deep object trees lead to hard-to-understand code. Also think about the proper type to use in every case. For instance, it makes no sense to use a hash to store a list of ordered values.
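A small made-up illustration of that last point: an ordered series fits naturally in an array, while a hash keyed by position only adds work, because you have to sort the keys numerically just to get the order back:

```perl
use strict;
use warnings;

# Awkward: a hash keyed by position needs a numeric sort to recover order
my %by_pos = (0 => 'Jan', 1 => 'Feb', 2 => 'Mar');
my @months_from_hash = map { $by_pos{$_} } sort { $a <=> $b } keys %by_pos;

# Natural: an array keeps the order for free
my @months = ('Jan', 'Feb', 'Mar');

print "@months\n";
```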

And finally, modules are good only when they simplify your problem; to calculate the mean and deviation you don't really need a module!

    use strict;
    use warnings;
    no warnings 'uninitialized';

    my (%count, %sum, %sum2, %min, %max, @key);
    while (<>) {
        my @val = split /,/;
        if ($val[0] =~ /^Year/) {
            @key = @val;
        }
        else {
            @key == @val or die "number of values and keys don't match";
            for my $i (0..$#key) {
                my $key = $key[$i];
                next if $key =~ /Year|Time/;
                my $val = $val[$i];
                $count{$key}++;
                $sum{$key}  += $val;
                $sum2{$key} += $val * $val;
                $min{$key} = $val if not defined $min{$key} or $val < $min{$key};
                $max{$key} = $val if not defined $max{$key} or $val > $max{$key};
            }
        }
    }
    for my $key (sort keys %count) {
        my $count = $count{$key};
        my $sum   = $sum{$key};
        my $mean  = $sum / $count;
        my $deviation = sqrt($sum2{$key} / $count - $mean * $mean);
        printf("key: %s, mean: %f, deviation: %f, min: %f, max: %f\n",
               $key, $mean, $deviation, $min{$key}, $max{$key});
    }

oh, and consider using Text::xSV or Text::CSV_XS for parsing CSV files.
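A quick sketch (sample data invented) of why those modules earn their keep: a naive split /,/ breaks as soon as a quoted field contains a comma:

```perl
use strict;
use warnings;

# A quoted field containing a comma -- valid CSV, three logical fields
my $line = 'name,"last, first",age';

# Naive split cuts inside the quotes: we get 4 pieces, not 3
my @fields = split /,/, $line;
print scalar(@fields), " fields\n";

# With Text::CSV_XS (not loaded here) the equivalent would be roughly:
#   my $csv = Text::CSV_XS->new({ binary => 1 });
#   $csv->parse($line);
#   my @ok = $csv->fields;   # the 3 fields, quotes handled
```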

Re^2: Efficient use of memory
by ivancho (Hermit) on Jun 04, 2005 at 18:15 UTC
    ummm... so he shouldn't use modules to calculate various statistics of data vectors, because that's simple, but he should use a module to split on a comma for him? That doesn't make sense to me...

    Sorry, I don't mean to bicker. In my opinion modules are useful whenever they make even a simple but repetitive task nice and short, and more so because they reduce the chance of an error... and occasionally I do get tired of writing the same code to get mean, variance, min, etc.

      but he should use a module to split on a comma for him..

      Parsing CSV files is not as simple as splitting on a comma: records can contain quoted data with embedded commas, or span multiple lines.

        I couldn't agree more. Honestly, I find both Text:: modules you suggested pretty nifty, and I would gladly use them; they will certainly prevent bizarre errors if the data specs change in the future.

        My point was that it doesn't make sense to use modules to take care of technicalities like parsing, but then refuse to use them in the actually functional parts of the code. Using a simple module for a simple task is good, IMO: we know exactly what is happening behind the curtains, but we're spared writing the details... lazy.