Re^2: averages from multiple files

When posting response code, it's important to demonstrate good form to those seeking wisdom. In the above code, you have a few elements which, while functional, are probably bad habits.

You use a two-argument open. The three-argument form makes intent more obvious and keeps your script from misbehaving if your file name contains special characters. This is particularly pertinent in a web context or when piping. See open.
You do not test your opens for success. If you don't do this, you may get weird failures far from the source of the problem since the open fails silently.
You declare all of your lexical variables at the head of the file, rather than keeping them as tightly scoped as possible. This can result in data leaking between parts of your script and spooky action-at-a-distance errors. You have essentially created global variables, and hamstrung some of the great power that strict offers you.
When split is invoked with one argument, it is applied to the special variable $_, so you can omit that from your split arguments. If you are implicitly populating $_, which you are, you should leverage that power consistently. Additionally, although the OP used /\s/ in that split, the default split behavior (splitting on runs of whitespace, much like /\s+/, and skipping leading whitespace) is probably more appropriate - manually edited tab-delimited files often end up no longer tab-delimited. Therefore, that line should probably just be my ($key, $val) = split;.
Since you are using Indirect Filehandles, there is no need to explicitly close the file. The file will be automatically closed once the filehandle goes out of scope - one of the great qualities of indirect filehandles in the first place.
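To illustrate the split point above, here is a minimal sketch of how the two forms differ. The input line is a made-up example (a tab followed by a stray space), not data from the original question:

```perl
use strict;
use warnings;

# Hypothetical input line: two fields separated by a tab plus a stray space.
my $line = "42\t 3.5\n";

# Splitting on a single whitespace character yields an empty middle field.
my @single = split /\s/, $line;

# A bare split works on $_ and treats any run of whitespace as one separator,
# ignoring leading whitespace and the trailing newline.
my @bare;
for ($line) {
    @bare = split;
}

print scalar(@single), " fields vs ", scalar(@bare), " fields\n";
```

With /\s/, a two-element list assignment such as my ($key, $val) would pick up the empty field as $val, which is why the bare form is the safer default for hand-edited files.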

All of the above points are addressed in the following rewrite of your code:

use strict;
use warnings;

my %data;

for my $file ('file1.txt', 'file2.txt', 'file3.txt',
           'file4.txt' , 'file5.txt') {
    open(my $handle, '<', $file) or die "Open fail on $file: $!\n";
    while (<$handle>) {
        chomp;
        my ($key, $val) = split;
        $data{$key}{'count'}++;
        $data{$key}{'sum'} += $val;
    }
}

open(my $handle, '>', 'average.txt') or die "Open fail on average.txt: $!\n";
for my $key (sort { $a <=> $b } keys %data) {
    $data{$key} = $data{$key}{'sum'} / $data{$key}{'count'};
    print $handle "$key\t$data{$key}\n";
}

Other changes I might include would be not quoting key arguments for hashes and the use of qw for quoting the file list (see qw/STRING/).
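As a rough sketch of those two stylistic changes, assuming the same file names as above and a few made-up key/value pairs in place of the parsed file lines:

```perl
use strict;
use warnings;

# qw builds the same list of file names without quotes or commas.
my @files = qw(file1.txt file2.txt file3.txt file4.txt file5.txt);

my %data;
# Hypothetical key/value pairs standing in for the parsed file lines.
for my $pair ([1, 10], [1, 20], [2, 5]) {
    my ($key, $val) = @$pair;
    # Simple identifiers inside hash braces are quoted automatically,
    # so {count} means the same as {'count'}.
    $data{$key}{count}++;
    $data{$key}{sum} += $val;
}

# Print each key with its average, sorted numerically by key.
printf "%s\t%s\n", $_, $data{$_}{sum} / $data{$_}{count}
    for sort { $a <=> $b } keys %data;
```

Neither change alters behavior; both are only a matter of reducing visual noise.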


Re^3: averages from multiple files by Taylorswift13 (Novice) on Nov 25, 2011 at 17:49 UTC
Hi all, thank you for taking the time to reply to me in depth. I understand I am a new user and should do a bit more reading. kennethk, your code is working well; thank you for your description.
Re^3: averages from multiple files by TJPride (Pilgrim) on Nov 25, 2011 at 19:41 UTC
1. Since the file names in this case are being hard-coded, a three-argument open is unnecessary.
2. A better point, but also unnecessary for this example.
3. If you say so. I personally find it rather messy having "my" spread all through the code at random points.
4. Good point on the argument. Regarding what he's splitting on, that's up to him, not me. I'm not going to bother trying to anticipate all the things that could possibly go wrong with his input data.
5. Yes, but if this is part of a larger program, there may be other things running for some time after this finishes, and the last part of the file buffer may or may not actually write to the file until the close is declared. I prefer to close at the first opportunity just to build good habits. I've actually run into problems not closing things immediately in the past.
Re^4: averages from multiple files by Marshall (Canon) on Nov 27, 2011 at 03:58 UTC
I like the way that kennethk did this quite a bit. From reading your post, I'm not sure that the points about "my" and file handles are completely clear.

You clearly see that a re-open on the same file handle closes the previous file. In the first foreach loop, for reading, my $handle is already closed before he gets to the point of opening that handle for writing! This is a good thing. In your code that is not true. To do the same thing, you would need an explicit close() after the reading loop. I would have gone further and used a name like "my $infile" in the first loop and a different name, like "my $outfile", for the output file.

Whether or not to close a file when the program is going to end shortly (a very short time, some milliseconds) is a minor quibble. One main reason to close is that file handles are limited resources and closing a file handle frees resources - there is even a limit to the maximum number of files that the entire system can have open at one time, large as that limit is. However, short of an OS fault (crash), leaving a file handle open for a very short time causes no harm. If the program is going to run for a significant time after you are finished with a file, I would close it. Making these judgements is part of the art of programming.

The rule that I use is that the level of the program that opens a file is responsible for closing it. Open it in a loop, close it in that loop. Open it in a sub, close it in that sub. Early in my career, I saw one program run for 5 days and then fail because an open for the output file failed! So handling these situations can definitely have impacts!

I am an absolute fan of the use of "my" to scope variables to the smallest area possible. kennethk's code just has %data and the last use of my $handle at the file-level scope. You have lots; $key, for example, is "re-used".

A big advantage of using "my" is that you can "cut-n-paste" code into new programs without having to worry about "spooky unforeseen action at a distance".
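A sketch of how the separate handle names and the "whoever opens it closes it" rule might look together. This is only an illustration under stated assumptions: the scratch directory, the seed data, and the $seed handle are hypothetical additions so the example runs on its own, and File::Temp is used purely for that purpose:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch files so the sketch runs on its own.
my $dir      = tempdir(CLEANUP => 1);
my $in_name  = "$dir/file1.txt";
my $out_name = "$dir/average.txt";

# Create a small input file. The close is checked because buffered
# writes can fail at close time.
open(my $seed, '>', $in_name) or die "Open fail on $in_name: $!\n";
print $seed "1\t10\n1\t20\n2\t5\n";
close($seed) or die "Close fail on $in_name: $!\n";

my %data;
open(my $infile, '<', $in_name) or die "Open fail on $in_name: $!\n";
while (<$infile>) {
    my ($key, $val) = split;
    $data{$key}{count}++;
    $data{$key}{sum} += $val;
}
# Closed at the same level that opened it.
close($infile);

open(my $outfile, '>', $out_name) or die "Open fail on $out_name: $!\n";
for my $key (sort { $a <=> $b } keys %data) {
    print $outfile "$key\t", $data{$key}{sum} / $data{$key}{count}, "\n";
}
# Closing here flushes the buffer, and the return value reports write errors.
close($outfile) or die "Close fail on $out_name: $!\n";
```

Checking the return value of close on the output handle also addresses the buffering concern raised earlier in the thread, since a failed final flush is reported there.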