Done. I used the manual calculation of the stddev on slices of the original array containing the interleaved data.
It is slower than the whole-set stddev calculation you wrote, but the script now runs in 9 seconds instead of the original 60 (that is, about 3 seconds of actual calculation instead of 54).
Thank you.
I am posting the whole code, even though I am already satisfied. If you have time to read it, maybe I can learn something more.
The script takes an input file on the command line. That file lists the actual source data files, each with a weight.
The header of one input data file is simply copied to the output; the values that follow (4 per line) are split, multiplied by the weight factor, and interleaved into the data array (reading is slow, so I tried to put some of the work into this step).
In the last part I write the output in the same layout as the input files, with 4 results per line after the header. In between is the code I asked you about.
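For reference, a config file might look like this (filenames and weights invented for illustration; the two columns are tab-separated, and the .vtk extension is optional because the script strips and re-adds it):

tally_front.vtk	0.5
tally_back.vtk	1.0
tally_side	2.0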
#!/usr/bin/perl -w
# (C) Olaf Marzocchi, 2009
# Calculates the std dev of different tallies, the size must be the same
# It assumes a standard order of the values: after LOOKUP_TABLE always the data
require 5.004;
use strict;
use List::Util qw[ sum ];
# use Data::Dump qw[ pp ];
use POSIX qw(strtod setlocale LC_NUMERIC);
setlocale(LC_NUMERIC, "en_US");
$\ = "\n";
$, = "\t";
if (@ARGV < 1) {
print "Usage: $0 config_file [output_file]";
print " Config file must contain on each line the relative path
+to one ";
print " .vtk file and the weight of each one, separated with tab
+ulation.";
exit;
}
print "Processing $ARGV[0]...";
# Formats the filenames in order to always get the extension
my %inputs;
my $output_name = "";
my $weights = 0;
open(INPUTS, '<', $ARGV[0]) or die "Error opening $ARGV[0]: $!";
while(<INPUTS>) {
next if /^#/;
s|\.vtk\t|\t|;
# skip lines that do not look like "filename<TAB>weight"
next unless /^([^\t]+)\t+([0-9eE+.,-]+)/;
$weights += strtod($2);
$inputs{$1 . '.vtk'} = strtod($2);
}
close(INPUTS);
$output_name = $ARGV[0] . ".vtk";
$output_name =~ s/\.txt//g;
$output_name = $ARGV[1] if (@ARGV > 1);
# Normalises the weights so they sum to 1 and checks that every
# input file actually exists.
foreach(keys(%inputs)) {
$inputs{$_} /= $weights;
next if (-e $_);
print "$_ is not available.";
exit;
}
# The header of the first file is also copied to the output.
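# (A legacy-VTK header of this kind typically carries a line like
# "POINT_DATA 1000000" giving the number of values, and ends with a
# "LOOKUP_TABLE default" line, after which the raw data begin.)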
my @input_data;
my @output_data;
my $header_lines = 0;
my $elements;
my $filename = (keys(%inputs))[0];  # any input file will do for the header
open(INPUT, '<', $filename) or die "Error opening $filename: $!";
print "Using $filename for headers.\n";
while (<INPUT>) {
chomp;
push(@output_data, $_);
last if (/lookup_table/i);
$elements = $1 if (/point_data +(\d+)/i);
$header_lines++;
}
close(INPUT);
# Saves the actual data of the different input files into a single array,
# with the values already interleaved.
my $files = keys(%inputs);
my $current_file = 0;
foreach $filename (sort(keys(%inputs))) {
print $filename, $inputs{$filename};
open(INPUT, '<', $filename) or die "Error opening $filename: $!";
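# Skip the header, up to and including the LOOKUP_TABLE line.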
my $i = 0;
while (<INPUT>) {
last if ($i == $header_lines);
$i++;
}
my $j = 0;
while (<INPUT>) {
chomp;
my @records = split;
foreach (@records) {
# multiplies by the weight and puts in the array
$input_data[$files * $j + $current_file] = $_ * $inputs{$filename};
$j++;
}
}
close(INPUT);
$current_file++;
}
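# @input_data now holds, for every data point, $files consecutive
# weighted values (one from each input file).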
my @ReturnData;
for (my $i = 0; $i < @input_data; $i += $files) {
push @ReturnData, sprintf '%.5E', coffOfVar( @input_data[$i .. $i + $files - 1] );
}
# File generation: write the results four per line, as in the input files.
for (my $i = 0; $i < @ReturnData; $i += 4) {
if (@ReturnData - $i < 4) {
push(@output_data, join(' ', @ReturnData[$i .. $#ReturnData]));
last;
}
push(@output_data, join(' ', @ReturnData[$i .. $i + 3]));
}
$, = " ";
$\ = "\n";
print $output_name;
open(OUTPUT, '>', $output_name) or die "Error opening output: $!";
foreach (@output_data) {
print OUTPUT $_;
}
close(OUTPUT);
exit(0);
# Coefficient of variation (relative std dev) of the values passed in:
# population std dev divided by the mean. Relies on the global $files
# for the number of values.
sub coffOfVar {
my $mean = sum( @_ ) / $files;
return 0 unless $mean;
my @dif2s = map { ( $_ - $mean )**2 } @_;
my $stddev = sum( @dif2s ) / $files;
return $stddev**0.5 / $mean;
}
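To run it (script and file names here are only examples): perl combine_cov.pl runs.txt combined.vtk. The second argument is optional; without it the output name is derived from the config filename (runs.txt becomes runs.vtk).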