in reply to Re: Poor performances with threads
in thread Poor performances with threads

In the first version of the script I used hashes of arrays to iterate through the different input files and data, then I thought that I didn't need the hashes at all and I switched to a single 1D arra containing interleaved data from the different inputs. 20% faster.

I really don't know why I kept the Statistics module... for such a simple computation. I will now apply your suggestions and I will study them a bit...

Thank you very much.

Replies are listed 'Best First'.
Re^3: Poor performances with threads
by olafmar (Novice) on Oct 26, 2009 at 12:18 UTC
    Done, I used the manual calculation of the stddev on slices of the original array containing the interleaved data. It takes more than calculating the stddev of the whole set that you wrote, but it is still 9 seconds against the original 60 seconds (meaning 3 seconds of actual calculation against 54).

    Thank you.

    I post the whole code, even if I am already satisfied. If you have time to read it, maybe I can learn something more.
    The script requires an input on the command line. In that input file are listed the actual starting data files, each one with a weight.
    The header of one input data file is simply copied to the output, the following values (4 per line) split, multiplied by the weight factor and interleaved in the data array (reading is slow, so I tried to put some work in this step).
    In the last part I recreate the input file, with 4 results per line after the headers. In between there is the code I asked you about.

    #!/usr/bin/perl -w # (C) Olaf Marzocchi, 2009 # Calculates the std dev of different tallies, the size must be the sa +me # It assumes a standard order of the values: after LOOKUP_TABLE always + the data require 5.004; use strict; use List::Util qw[ sum ]; # use Data::Dump qw[ pp ]; use POSIX qw(strtod setlocale LC_NUMERIC); setlocale(LC_NUMERIC, "en_US"); $\ = "\n"; $, = "\t"; if (@ARGV < 1) { print "Usage: $0 config_file [output_file]"; print " Config file must contain on each line the relative path +to one "; print " .vtk file and the weight of each one, separated with tab +ulation."; exit; } print "Processing $ARGV[0]..."; # Formats the filenames in order to always get the extension my %inputs; my $output_name = ""; my $weights = 0; open(INPUTS, '<', $ARGV[0]) or die "Error opening $ARGV[0]: $!"; while(<INPUTS>) { next if /^#/; s|\.vtk\t|\t|; /^([^\t]+)\t+([0-9eE+-.,]+)/; $weights += strtod($2); $inputs{$1 . '.vtk'} = strtod($2); } close(INPUTS); $output_name = $ARGV[0] . ".vtk"; $output_name =~ s/\.txt//g; $output_name = $ARGV[1] if (@ARGV > 1); # Sum of the weights normalised to 1 foreach(keys(%inputs)) { $inputs{$_} /= $weights; next unless (-e $inputs{$_}); print "$inputs{$_} is not available.\n"; exit; } # The header of the first file is also copied to the output. my @input_data; my @output_data; my $header_lines = 0; my $elements; my $filename = each (%inputs); open(INPUT, '<', $filename) or die "Error opening $filename: $!"; print "Using $filename for headers.\n"; while (<INPUT>) { chomp; push(@output_data, $_); last if (/lookup_table/i); $elements = $1 if (/point_data +(\d+)/i); $header_lines++; } close(INPUT); # Saves the array of actual data of the different input files into an +array with # values already interleaved. my $files = keys(%inputs); my $current_file = 0; foreach $filename (sort(keys(%inputs))) { print $filename, $inputs{$filename}; open(INPUT, '<', $filename) or die "Error opening $filename: $!"; my $i = 0; while (<INPUT>) { last if ($i == $header_lines); $i++; } my $j = 0; while (<INPUT>) { chomp; my @records = split; foreach (@records) { # multiplies by the weight and puts in the array $input_data[$files * $j + $current_file] = $_ * $inputs{$f +ilename}; $j++; } } close(INPUT); $current_file++; } my @ReturnData; for (my $i = 0; $i < @input_data; $i += $files) { push @ReturnData, sprintf '%.5E', coffOfVar( @input_data[$i .. $i ++ $files - 1] ); } # file generation for (my $i = 0; $i < @ReturnData; $i += 4) { if (@ReturnData - $i < 4) { push(@output_data, join(' ', @ReturnData[$i .. $#ReturnData])) +; last; } push(@output_data, join(' ', @ReturnData[$i .. $i + 3])); } $, = " "; $\ = "\n"; print $output_name; open(OUTPUT, '>', $output_name) or die "Error opening output: $!"; foreach (@output_data) { print OUTPUT $_; } close(OUTPUT); exit(0); sub coffOfVar { my $mean = sum( @_ ) / $files; return 0 unless $mean; my @dif2s = map{ ( $_ - $mean )**2 } @_; my $stddev = sum( @dif2s ) / $files; return $stddev**0.5 / $mean }