in reply to Tabulating Data Across Multiple Large Files

I'm still unclear on some things.
  1. Is there actually anything special about the timestep field?
  2. Does it actually make any difference which rows are in which files?
In the solution below, I'm assuming the answer to both questions is "no".
use strict;
use warnings;

my @fields = qw( 1 Case Iter Fusion Type Tanks AFVs ADAs IFVs
                 UAVS Unknown Total Latency Decoys FalseNeg FalsePos );
my @key_fields  = @fields[0,1,3,4];
my @data_fields = @fields[2,5..15];

my %n; # key = comma-joined key field values; val = row count
my %r; # key = comma-joined key field values;
       # val = hashref: key = field, val = sum (and later, average)

while (<>) # read all files, in sequential order (not in parallel)
{
    chomp;
    my %rec;
    @rec{@fields} = split /,/;
    my $key = join ",", @rec{@key_fields};
    $n{$key}++;
    for my $f ( @data_fields )
    {
        $r{$key}{$f} += $rec{$f};
    }
}

# now each $r{$key}{$f} is the sum of that column for that key;
# convert them to averages.
for my $key ( keys %r )
{
    for my $f ( sort keys %{ $r{$key} } )
    {
        $r{$key}{$f} /= $n{$key};
    }
}

# now you can convert the results to normal-looking records:
my @averages; # one record per unique key-vector value.
for ( keys %r )
{
    my %rec;
    @rec{ @key_fields } = split /,/;               # recover key field values from the key string
    @rec{ keys %{ $r{$_} } } = values %{ $r{$_} }; # copy in the averaged data fields
    push @averages, \%rec;
}
# now @averages is an array of records that look exactly like
# the data rows you read in, except that the data column values
# are averages, and the key field value vectors are unique.
(Caution: Untested.)
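
If you then want the averaged records back out in the same CSV shape as the input, a minimal sketch along these lines should do it (my addition, equally untested), assuming every field in @fields is present in each record, which the loops above guarantee:

for my $rec ( @averages )
{
    # print one CSV row per record, in the original column order
    print join(",", @{$rec}{@fields}), "\n";
}

You'd run the whole thing as, say, perl avg.pl part1.csv part2.csv > averages.csv (file names are placeholders); the while (<>) loop reads every file named on the command line in sequence.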

jdporter
The 6th Rule of Perl Club is -- There is no Rule #6.