Angharad has asked for the wisdom of the Perl Monks concerning the following question:

Hi there
I have a text file that looks like the following
0 48.23 17.90
48.23 0 49.58
17.90 49.58 0
59.62 52.04 65.80
62.20 56.02 68.82
35.37 37.87 36.52
27.33 50.73 31.85
etc ...
What I would like to do is take these values and calculate the average. The calculation is obviously trivial, but I'm struggling somewhat with extracting the values due to the format. Take into account that the actual files I am working on have many more than three columns per row, and that these files all have a different number of columns (so one file may have 50 columns per row, another 35 columns per row, and so on).
This is what I have come up with so far
#!/usr/bin/perl
use strict;

my @line;

# shall we get the file then?
my $inputfile = $ARGV[0];

# guess we need to open it too
open(INPUT, "$inputfile") || die "Error: Can't open $inputfile for reading: $!\n";

# lets place contents of file into array
my @filecontents = <INPUT>;

# we now want a for loop to access the array
for(my $i = 0; $i < @filecontents; $i++) {
    # place each line in file
    # into another array
    @line = split(/\,/, $filecontents[$i]);
}
My problem is I'm not quite sure how to go from here in order to extract one value at a time from each row so I can add all values together when calculating my average.
Any help/suggestions much appreciated.

Re: extracting and using values from a matrix file
by friedo (Prior) on Aug 31, 2006 at 16:36 UTC
    You're assigning the fields of each line to @line when you split, clobbering whatever was there from the previous iteration of the loop. For a matrix, the best structure is probably an array of arrays. See perldsc for information on how to go about it. I'd modify your loop to something like this:

    my @matrix;
    foreach my $line(@filecontents) {
        chomp $line;
        push @matrix, [ split /\s+/, $line ];
    }

    That will give you a structure that looks like this (per Data::Dumper):

    $VAR1 = [
              [ '0', '48.23', '17.90' ],
              [ '48.23', '0', '49.58' ],
              [ '17.90', '49.58', '0' ],
              [ '59.62', '52.04', '65.80' ],
              [ '62.20', '56.02', '68.82' ],
              [ '35.37', '37.87', '36.52' ],
              [ '27.33', '50.73', '31.85' ]
            ];

    Update: Oops, should have paid closer attention. Looks like you want to split on whitespace, not commas. I fixed the split and data dump.

    There are also lots of CPAN modules for doing matrix math which you may want to look at. Check out Math::MatrixReal.
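
    Once the rows sit in an array of arrays like the one above, the overall average comes from walking every row and every value. The following is a minimal sketch, not friedo's code: the helper name matrix_average is hypothetical, and the data is just the first three rows of the sample from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): mean of every value
# in an array of arrays; returns undef for an empty matrix.
sub matrix_average {
    my ($matrix) = @_;
    my ( $sum, $count ) = ( 0, 0 );
    for my $row (@$matrix) {
        for my $value (@$row) {
            $sum += $value;
            $count++;
        }
    }
    return $count ? $sum / $count : undef;
}

# First three rows of the sample data from the question.
my @matrix = (
    [ 0,     48.23, 17.90 ],
    [ 48.23, 0,     49.58 ],
    [ 17.90, 49.58, 0     ],
);

printf "Average: %.3f\n", matrix_average( \@matrix );
```

    Because the helper counts values as it goes, it should also cope with rows of different lengths.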

Re: extracting and using values from a matrix file
by shmem (Chancellor) on Aug 31, 2006 at 20:18 UTC
    This post says essentially the same as the previous, just ++$verbose ;-)

    Your code with one comment:

    for(my $i = 0; $i < @filecontents; $i++) {
        # place each line in file
        # into another array
        #
        @line = split(/\,/, $filecontents[$i]);
        # XXX           ^------ do you really mean comma?

    In the following, I'll assume the fields are separated by blanks. No need to backslash a comma, BTW.

    Ok, what shall we do with @line? Assigning to it on each iteration overwrites the previous content.

    We need a multidimensional array - a matrix. References (see perlreftut) are handy for that.

        # [] constructs an anonymous array, so we use that.
        # several ways to do it - choose any (not all :-)

        my $anon_array = [];    # gimme an anonymous array and
                                # assign it to $anon_array
        @$anon_array = @line;   # assign to the anonymous array
        push(@matrix, $anon_array);

        # shortcut:
        $anon_array = [ @line ];
        push(@matrix, $anon_array);

        # compacting further: split returns a list, use that
        # to populate an anonymous array on the fly, push that
        # (the SCALAR value which is an array reference) onto
        # the array @matrix
        push(@matrix, [split(/\s+/, $filecontents[$i])]);
    }

    But why keep everything in memory more than once (one array holding the lines, another holding the broken-up structure)? Back to the file reading.

    We can save many a malloc() and brk(), and some typing, by just saying

    my @matrix;
    while (<INPUT>) {
        chomp;    # you forgot that - remove trailing newline char
        push(@matrix, [split]);
    }

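    because while(<INPUT>) assigns to $_, and split with no arguments splits $_ on runs of whitespace (as if written split ' ', $_), skipping any leading whitespace. That behaves like the pattern /\s+/, so the trailing newline counts as whitespace too, which makes the chomp above harmless but optional.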
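
    To see how a bare split treats surrounding whitespace, here is a minimal sketch comparing it with an explicit /\s+/ pattern. The sample line and the variable names are illustrative assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Sample line with leading blanks, a tab, and a trailing newline.
my $line = "  0 48.23\t17.90\n";

# Bare split (same as split ' ', $_) skips leading whitespace
# and treats the trailing newline as whitespace too.
my @awk_style;
for ($line) { @awk_style = split }

# An explicit /\s+/ keeps an empty leading field when the line
# starts with whitespace.
my @regex_style = split /\s+/, $line;

print scalar(@awk_style),   " fields from bare split\n";
print scalar(@regex_style), " fields from split /\\s+/\n";
```

    For a file of numbers, the bare form is usually the safer choice, since an empty leading field would otherwise be counted as a value.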

    If you use the <> magic, perl opens each file in @ARGV for you in order, and reads those; using it, your code boils down to

    #!/usr/bin/perl
    use strict;

    my @matrix;
    while (<>) {
        chomp;    # you forgot that - remove trailing newline char
        push(@matrix, [split]);
    }

    If you want to read just one file, you can truncate @ARGV by assigning to $#ARGV (not quite the array length, but the array's last index):

    $#ARGV = 0; # @ARGV now holds only one element (the first at index 0)

    How do you get your data back? The following prints the value in the 4th column of the 7th line (indices start at 0; again, see perlreftut):

    print $matrix[6]->[3],"\n";
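
    Going beyond a single element, the same dereferencing lets you loop over whole rows. This is a minimal sketch under the assumption that @matrix was filled as above; the helper row_average and the two sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Two rows of the sample data, stored as an array of array references.
my @matrix = (
    [ 59.62, 52.04, 65.80 ],
    [ 62.20, 56.02, 68.82 ],
);

# The arrow between adjacent subscripts is optional.
my $explicit = $matrix[1]->[2];
my $implicit = $matrix[1][2];

# Hypothetical helper: average of one row given its array reference.
sub row_average {
    my ($row) = @_;
    return undef unless @$row;
    my $sum = 0;
    $sum += $_ for @$row;
    return $sum / @$row;
}

for my $i ( 0 .. $#matrix ) {
    printf "row %d: %d values, average %.2f\n",
        $i + 1, scalar @{ $matrix[$i] }, row_average( $matrix[$i] );
}
```

    Here @{ $matrix[$i] } dereferences one row, so scalar @{ $matrix[$i] } gives that row's column count even when files differ in width.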

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
    
Re: extracting and using values from a matrix file
by graff (Chancellor) on Sep 01, 2006 at 05:33 UTC
    "What I would like to do is take these values and calculate the average."

    Do you mean that you just want a single numeric value as output, which is merely the sum of all the input numbers, divided by the number of input values? If so, you don't need any sort of complicated data structure inside your perl script:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $sum = my $count = 0;
    while (<>) {
        for my $n ( split ) {
            $sum += $n;
            $count++;
        }
    }
    if ( $count ) {
        printf( "Average over %d values: %7.3f\n", $count, $sum/$count );
    }
    else {
        warn "No numeric data\n";
    }
    Note that the default operation for split is to return the list of whitespace separated tokens from $_; since newline character(s) at the end of each line of input are whitespace, split removes them.

    In case your matrix file happens to contain any non-numeric tokens (e.g. words), those will be treated as zero when adding values to $sum. (update: but they will be counted in $count, increasing the divisor for the average)
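
    If non-numeric tokens should be left out of both the sum and the count, one option is to test each token first. The sketch below is an assumption about what you might want, not part of the script above: it uses looks_like_number from the core Scalar::Util module, and the helper name numeric_average and the sample lines are hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: average of the numeric tokens in a list of lines.
# Tokens that do not look like numbers are skipped entirely.
sub numeric_average {
    my @lines = @_;
    my ( $sum, $count ) = ( 0, 0 );
    for my $line (@lines) {
        for my $token ( split ' ', $line ) {
            next unless looks_like_number($token);
            $sum += $token;
            $count++;
        }
    }
    return $count ? $sum / $count : undef;
}

my $avg = numeric_average( "0 48.23 n/a\n", "17.90 missing 49.58\n" );
printf "Average of numeric tokens: %.3f\n", $avg if defined $avg;
```

    Skipping such tokens also avoids the "isn't numeric" warnings that use warnings would otherwise emit for them. Note that looks_like_number accepts forms such as 1e5 as well as plain decimals.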

    The "diamond" operator in the while loop condition allows you to run the script in either of two ways (supposing the script were stored as "getavg"):

    # pipe data from some other process:
    some_matrix_program | getavg

    # or read data from some file:
    getavg matrix.file
    In the latter case, if you gave two or more matrix file names on the one command line, you'd get a single average over all files combined.

    If your needs are more complicated than getting a single global average value, you'll need to explain them better. There's likely to be an easy solution.

Re: extracting and using values from a matrix file
by dokkeldepper (Friar) on Sep 01, 2006 at 10:57 UTC
    use Math::MatrixReal;

    #update:
    Math::MatrixReal->new_from_string(<<'MATRIX');
    [ 0 48.23 17.90]
    [ 48.23 0 49.58]
    [ 17.90 49.58 0]
    [ 59.62 52.04 65.80]
    [ 62.20 56.02 68.82]
    [ 35.37 37.87 36.52]
    [ 27.33 50.73 31.85]
    MATRIX

    Then all your stuff reduces to finding the appropriate indices to iterate over. For the more exotic iterations:
    use Algorithm::Combinatorics;
    For the column sums, remember that with the matrix X and the vector 1 of ones of length rows(X), X'1 gives the sums for every column.
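
    The same column sums can be had without any matrix module by accumulating per column index. This is a plain-Perl sketch rather than the Math::MatrixReal route: the helper column_sums is hypothetical, and the per-column means assume every row has the same number of columns.

```perl
use strict;
use warnings;

# Hypothetical helper: per-column sums of an array of arrays,
# i.e. the entries of X'1 computed with an explicit loop.
sub column_sums {
    my ($matrix) = @_;
    my @sums;
    for my $row (@$matrix) {
        $sums[$_] += $row->[$_] for 0 .. $#$row;
    }
    return @sums;
}

# First three rows of the sample data from the question.
my @matrix = (
    [ 0,     48.23, 17.90 ],
    [ 48.23, 0,     49.58 ],
    [ 17.90, 49.58, 0     ],
);

my @sums = column_sums( \@matrix );

# Assumes a rectangular matrix: each column has one value per row.
my @means = map { $_ / @matrix } @sums;

printf "column %d: sum %.2f, mean %.3f\n", $_ + 1, $sums[$_], $means[$_]
    for 0 .. $#sums;
```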