averages from multiple files

Taylorswift13 has asked for the wisdom of the Perl Monks concerning the following question:

Re: averages from multiple files by ww (Archbishop) on Nov 25, 2011 at 15:15 UTC
There are many ways to determine whether your code is "correct", but you'll need at least two approaches if you're going to roll your own tests for a project of this character. For a general test of your syntax, put `use strict;` and `use warnings;` at the top of your script (or, with Perl 5.10 and above, `use Modern::Perl;`, a CPAN module that turns both on) and run a command line like `perl -c progname.pl`. Lo and behold, Perl and the pragmas will tell you what's wrong.

Testing the logic is also fairly easy (at a very rudimentary level): run your script on a small subset of each of your data files, and look at the results to see whether they match what you expect. Rinse, repeat. Then modify one or more of those subsets so that it no longer matches your model (format). Does the script still work? If so, you're done; if not, you still have some error-catching code to write. Of course, with anything at all complex, you'll be better served by learning about testing, a topic discussed here rather frequently.
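To make the "learn about testing" advice a little more concrete, here is a minimal sketch using Test::More, which ships with Perl. The `average` sub is hypothetical (nothing like it exists in the original code); the point is that the small, hand-checked cases you would otherwise eyeball after each run can be written down once and rerun with `prove` or plain `perl`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper under test: returns the arithmetic mean of its
# arguments, or undef when called with an empty list.
sub average {
    my @values = @_;
    return undef unless @values;    # nothing to average
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;          # @values in numeric context is its length
}

# Small, hand-checked cases: the "small subset" idea, written as tests.
is( average(1, 2, 3),  2,     'mean of 1, 2, 3' );
is( average(2.5, 3.5), 3,     'mean of two decimals' );
is( average(),         undef, 'empty list gives undef' );

done_testing();
```

Once these pass, a malformed-input case (for example, a line with no value) is the natural next test to add.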
Re^2: averages from multiple files by Taylorswift13 (Novice) on Nov 25, 2011 at 15:33 UTC
Hi, thank you for your reply. I modified the code and applied it:

```perl
#!/usr/bin/perl
use Modern::Perl;
open FILE1, "<", "/Users/ts/file1;
open FILE2, "<", "/Users/ts/file2";
open FILE3, "<", "/Users/ts/file3";
for $file1 {
    chomp;
    @file1 = ($index, $value) = spilt /\s/;
}
for $file2 {
    chomp;
    @file2 = ($index, $value) = spilt /\s/;
}
for $file3 {
    chomp;
    @file3 = ($index, $value) = spilt /\s/;
}
open (OUTFILE, ">", "/Users/ts/average.txt") or die(Output file error");
print avg(@file1, @file2, @file3);
```

The output dosen't like the $file and @file comments. I think I am close but not sure how to proceed.
Re^3: averages from multiple files by ww (Archbishop) on Nov 25, 2011 at 21:04 UTC
Did you read the perldoc (at your command prompt, `perldoc -f avg`) for the function `avg` to learn how to use it correctly? No? You didn't bother... or couldn't read it? Well, if the latter, that's because there is no such function, nor is there a (CPAN) module of that name to be `use`d (which you didn't do, in any case). As a wise one once said, 'You can't just make s#t up and expect the computer to understand.'

And surely your attempt to run that code produced more than some (unspecified -- bad; direct, verbatim quotes -- good!) hints that it (sic) "dosen't like the $file and @file comments." ... perhaps something like this:

```
String found where operator expected at eraseme.pl line 6, near "open FILE3, ""
  (Might be a runaway multi-line "" string starting on line 5)
        (Missing semicolon on previous line?)
String found where operator expected at eraseme.pl line 24, near "open (OUTFILE, ""
  (Might be a runaway multi-line "" string starting on line 6)
        (Missing semicolon on previous line?)
String found where operator expected at eraseme.pl line 24, near "txt") or die(Output file error""
syntax error at eraseme.pl line 6, near "open FILE3, ""
Global symbol "@file1" requires explicit package name at eraseme.pl line 26.
Global symbol "@file2" requires explicit package name at eraseme.pl line 26.
Global symbol "@file3" requires explicit package name at eraseme.pl line 26.
eraseme.pl had compilation errors.
```

And you'll notice the check gave up before it ever complained about the real nonsense on line 26 (the call to a nonexistent `avg`), and without mentioning a problem with the nonexistent `spilt`. So here are a few more pointers toward code "correctness": Learn how to read error and warning messages; that is, to understand them. Make sure you know how the functions you use actually work. To reach that point, take some babysteps first: in this case, you might want to work out some code to calculate an average, given a set of three $vars instantiated with decimal values.
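A first babystep along those lines might be the following sketch. The three values are made up, and there is deliberately no file I/O yet; check it with `perl -c` first, then run it.

```perl
use strict;
use warnings;

# Three made-up sample values, hard-coded so there is no file I/O yet.
my ($x, $y, $z) = (1.5, 2.25, 3.75);

# Arithmetic mean: the sum divided by how many values there are.
my $mean = ($x + $y + $z) / 3;

# prints: Mean of 1.5, 2.25, 3.75 is 2.5000
printf "Mean of %s, %s, %s is %.4f\n", $x, $y, $z, $mean;
```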
Test with `perl -c scriptname.pl`, and when that comes up clean, see whether what you've written actually works correctly. Then write a script or two that opens files and reads from them into scalars, arrays and/or hashes; next, re-read the doc for `spilt` (or `split`, as the case may be). Now combine them, and see whether you did so correctly (i.e., have error- and warning-free, working code). Learn the basics of the debugger. That's a tool that will help you a lot. But also read a few posts here about using `print` statements as a debugging tool.

Some will view these comments as "harsh." Perhaps, but programming is a harsh art requiring the discipline to learn your tools... and the willingness to learn how to learn. Unlike the proverbial horseshoes and hand grenades, "close" is NOT good enough. If you can avoid reading a nonexistent intent to offend into this post, you'll discover motifs and methods that will ease your progress toward mastery.
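For the "read a file into a hash" exercise, combined with `print`-style debugging, one possible sketch follows. It assumes lines of the form `index value` (as in the OP's split) and reads from an in-memory filehandle (`open` on a reference to a string) so it runs without real files; in practice you would open a path instead. Data::Dumper is a core module.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample data in the assumed "index value" format. An in-memory
# filehandle stands in for a real file so the sketch runs anywhere;
# replace \$sample with a real path in practice.
my $sample = "1 10.5\n2 20\n3 30.25\n";
open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";

my %value_for;    # index => value
while (my $line = <$fh>) {
    chomp $line;
    my ($index, $value) = split ' ', $line;
    # print-style debugging: show each parsed pair on STDERR
    warn "parsed index=$index value=$value\n";
    $value_for{$index} = $value;
}
close($fh);

# Dump the whole structure to confirm it looks the way you expect.
$Data::Dumper::Sortkeys = 1;
print Dumper(\%value_for);
```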
Re: averages from multiple files by kennethk (Abbot) on Nov 25, 2011 at 15:44 UTC
Based upon the highly problematic code you have posted here and in other nodes you have written to date, I think you are trying to run before you can walk. I would suggest getting a book to help you get on your feet: perhaps reading the free Beginning Perl or getting the more current 6th edition of Learning Perl. Break your process into its component steps:

1. Import data from 3 files. While you are opening the input files (but you should probably read perlopentut), you never read the files. See I/O Operators in perlop.
2. Average the data. You are not doing this. Perhaps `sum` from List::Util would be useful for you here.
3. Output the result. Again, you open an output file, but never print a result.
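Step 2 might look like this minimal sketch using `sum` from List::Util (a core module). The `mean` sub and the sample numbers are hypothetical; steps 1 and 3 are stubbed out so the averaging can be checked on its own.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: mean of a list of numbers, using List::Util's sum.
sub mean {
    my @nums = @_;
    return undef unless @nums;    # avoid dividing by zero
    return sum(@nums) / @nums;    # @nums in numeric context is its length
}

# Step 1 (import) is stubbed with literal values here; in the real
# script these would come from reading the three input files.
my @values = (10, 20, 30, 40);

# Step 2: average.
my $avg = mean(@values);

# Step 3: output. Printing to STDOUT keeps the sketch self-contained;
# a three-argument open on an output file would send it there instead.
print "Average: $avg\n";
```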
Re: averages from multiple files by aaron_baugher (Curate) on Nov 25, 2011 at 15:53 UTC
Well, the first step in determining whether a program is correct is to run it and see what happens. Since you were wise enough to use strict (part of Modern::Perl), yours will generate a syntax error. Fix that, run it again, fix the next one, and repeat until the program runs. Then you can start to worry about whether it generates the right results.

Just looking at it, the first thing that jumps out (other than s/spilt/split/) is that assigning a block to an array doesn't do anything like what you think it'll do. Saying `my @file1 = { some_code_here }` will (try to) take the results of `some_code_here`, create a hash from them, and assign a reference to that hash to the first element of @file1.

Since you want to go through the files line by line, look into the while loop for that. Start a while loop like `while(<FILE1>){` and put your line-processing stuff inside that. Once you get that working, figure out how to wrap your while loop inside a for loop, so you can put your input filenames in an array and not have to hardcode a separate while loop for each one.

Aaron B.
My Woefully Neglected Blog, where I occasionally mention Perl.
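The structure Aaron describes (a for loop over filenames wrapping a per-file while loop) might look like this sketch. It assumes each line holds an index and a value separated by whitespace, as in the OP's split; the `read_values` sub and the temporary demo files are hypothetical stand-ins for the OP's three input files. It uses lexical filehandles rather than barewords like `FILE1`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read every "index value" line from each named file and return all the
# values in one list. The outer for loop walks the filenames; the inner
# while loop walks the lines of the current file.
sub read_values {
    my @filenames = @_;
    my @values;
    for my $file (@filenames) {
        open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
        while (my $line = <$fh>) {
            chomp $line;
            my ($index, $value) = split ' ', $line;
            next unless defined $value;    # skip blank or short lines
            push @values, $value;
        }
        close($fh);
    }
    return @values;
}

# Demo: write three small temporary files (made-up data) so the sketch
# runs without the real input files.
my @files;
for my $contents ("1 2\n2 4\n", "1 6\n", "1 8\n2 10\n") {
    my ($tmp_fh, $name) = tempfile(UNLINK => 1);
    print {$tmp_fh} $contents;
    close($tmp_fh);
    push @files, $name;
}

my @all = read_values(@files);
print scalar(@all), " values read: @all\n";
```

Adding a fourth input file then means adding one name to `@files`, not another copy of the loop.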
Re: averages from multiple files by TJPride (Pilgrim) on Nov 25, 2011 at 16:23 UTC
Doing this is very simple, but you haven't explained the problem very well. How are the values averaged, exactly? Does each file have the same series of indexes, and you want the values counted and averaged and then the indexes / averages put into the output file? If a key doesn't appear in a file, is the count still incremented?

```perl
use strict;
use warnings;

my ($file, $handle, $key, $val, %data);

for $file ('file1.txt', 'file2.txt', 'file3.txt', 'file4.txt', 'file5.txt') {
    open($handle, $file);
    while (<$handle>) {
        chomp;
        ($key, $val) = split /\s/, $_;
        $data{$key}{'count'}++;
        $data{$key}{'sum'} += $val;
    }
}

open($handle, '>average.txt');
for $key (sort { $a <=> $b } keys %data) {
    $data{$key} = $data{$key}{'sum'} / $data{$key}{'count'};
    print $handle "$key\t$data{$key}\n";
}
close($handle);
```
Re^2: averages from multiple files by kennethk (Abbot) on Nov 25, 2011 at 17:06 UTC
When posting response code, it's important to demonstrate good form to those seeking wisdom. In the above code, you have a few elements which, while functional, are probably bad habits:

1. You use a two-argument open. The three-argument form makes intent more obvious and keeps your script from misbehaving if your file name contains special characters. This is particularly pertinent in a web context or when piping. See open.
2. You do not test your opens for success. If you don't, you may get weird failures far from the source of the problem, since the open fails silently.
3. You declare all of your lexical variables at the head of the file, rather than keeping them as tightly scoped as possible. This can result in data leaking between parts of your script and spooky-action-at-a-distance errors. You have essentially created global variables, and hamstrung some of the great power that strict offers you.
4. When split is invoked with one argument (just the pattern), it is applied to the special variable `$_`, so you can omit that from your split arguments. If you are implicitly populating `$_`, which you are, you should leverage that power consistently. Additionally, although the OP used `/\s/` in that split, the default split pattern (`' '`, which behaves like `/\s+/` but also ignores leading whitespace) is probably more appropriate: manually edited tab-delimited files often end up no longer tab-delimited. Therefore, that line should probably just be `my ($key, $val) = split;`.
5. Since you are using indirect filehandles, there is no need to explicitly close the file. The file will be closed automatically once the filehandle goes out of scope, one of the great qualities of indirect filehandles in the first place.
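The difference described in point 4 is easy to see on a hand-edited line. This sketch splits one made-up line (a tab replaced by two spaces, plus leading spaces) both ways; a bare `split` is equivalent to `split ' ', $_`.

```perl
use strict;
use warnings;

# A made-up line where a tab became two spaces during hand editing.
my $line = "  7  42.5";

# Every single whitespace character is a separator, so runs of spaces
# produce empty fields: ('', '', '7', '', '42.5')
my @single = split /\s/, $line;

# Bare split uses the special ' ' pattern on $_: leading whitespace is
# ignored and runs of whitespace count as one separator: ('7', '42.5')
my @default;
for ($line) {          # aliases $_ to $line
    @default = split;
}

print scalar(@single),  " fields with /\\s/: (", join('|', @single),  ")\n";
print scalar(@default), " fields with ' ':  (", join('|', @default), ")\n";
```

With `/\s/`, the first two fields here would be assigned to `($key, $val)` as two empty strings, which is exactly the kind of silent failure point 4 warns about.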
All of the above points are addressed in the following rewrite of your code:

```perl
use strict;
use warnings;

my %data;
for my $file ('file1.txt', 'file2.txt', 'file3.txt', 'file4.txt', 'file5.txt') {
    open(my $handle, '<', $file) or die "Open fail on $file: $!\n";
    while (<$handle>) {
        chomp;
        my ($key, $val) = split;
        $data{$key}{'count'}++;
        $data{$key}{'sum'} += $val;
    }
}

open(my $handle, '>', 'average.txt') or die "Open fail on average.txt: $!\n";
for my $key (sort { $a <=> $b } keys %data) {
    $data{$key} = $data{$key}{'sum'} / $data{$key}{'count'};
    print $handle "$key\t$data{$key}\n";
}
```

Other changes I might include would be not quoting key arguments for hashes and using qw for quoting the file list (see qw/STRING/ in perlop).
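Those two extra suggestions might look like the following sketch. It wraps the rewrite's loop in a hypothetical `averages_by_key` sub, uses unquoted hash keys and `qw()`, skips lines without a value, and stores the averages in a separate hash rather than overwriting `%data` (a design choice of this sketch, not of the post). It only writes average.txt if all five input files exist in the current directory.

```perl
use strict;
use warnings;

# Per-key averages across several "key value" files. Same logic as the
# rewrite, but with unquoted hash keys ({count}, {sum}) and the averages
# collected in a separate hash instead of overwriting %data.
sub averages_by_key {
    my @files = @_;
    my %data;
    for my $file (@files) {
        open(my $handle, '<', $file) or die "Open fail on $file: $!\n";
        while (<$handle>) {
            my ($key, $val) = split;     # bare split: ' ' pattern on $_
            next unless defined $val;    # skip blank or short lines
            $data{$key}{count}++;
            $data{$key}{sum} += $val;
        }
    }
    my %avg;
    for my $key (keys %data) {
        $avg{$key} = $data{$key}{sum} / $data{$key}{count};
    }
    return \%avg;
}

# qw() quotes the file list without commas or quote marks.
my @files = qw(file1.txt file2.txt file3.txt file4.txt file5.txt);

if (my @missing = grep { !-e $_ } @files) {
    warn "Skipping demo; missing input: @missing\n";
}
else {
    my $avg = averages_by_key(@files);
    open(my $out, '>', 'average.txt') or die "Open fail on average.txt: $!\n";
    print {$out} "$_\t$avg->{$_}\n" for sort { $a <=> $b } keys %$avg;
}
```

The `chomp` is dropped here because splitting on `' '` already discards the trailing newline.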
Re^3: averages from multiple files by Taylorswift13 (Novice) on Nov 25, 2011 at 17:49 UTC
Hi all, thank you for taking the time to reply to me in depth. I understand I am a new user and should do a bit more reading. kennethk, your code is working well; thank you for your description.
Re^3: averages from multiple files by TJPride (Pilgrim) on Nov 25, 2011 at 19:41 UTC
1. Since the file names in this case are hard-coded, a three-argument open is unnecessary.
2. A better point, but also unnecessary for this example.
3. If you say so. I personally find it rather messy having "my" spread all through the code at random points.
4. Good point on the argument. Regarding what he's splitting on, that's up to him, not me. I'm not going to bother trying to anticipate all the things that could possibly go wrong with his input data.
5. Yes, but if this is part of a larger program, there may be other things running for some time after this finishes, and the last part of the file buffer may not actually be written to the file until the handle is closed. I prefer to close at the first opportunity just to build good habits. I've actually run into problems from not closing things immediately in the past.
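Point 5 and the earlier note about automatic closing can be reconciled: closing explicitly as soon as you are done is also the only way to check whether the buffered data actually made it to disk. A minimal sketch, where the `write_lines` sub is hypothetical and not from either post:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write each line to $path and close the handle explicitly, checking the
# result. close() flushes Perl's output buffer, so a failed write (for
# example, a full disk) may only be reported here; a handle that simply
# goes out of scope is closed without anyone checking for that error.
sub write_lines {
    my ($path, @lines) = @_;
    open(my $out, '>', $path) or die "Cannot open $path: $!\n";
    print {$out} "$_\n" for @lines;
    close($out) or die "Cannot close $path: $!\n";
    return scalar @lines;
}

# Demo: write two tab-separated lines to a temporary file.
my (undef, $path) = tempfile(UNLINK => 1);
my $count = write_lines($path, "1\t3", "2\t15");
print "Wrote $count lines to $path\n";
```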
Re^4: averages from multiple files by Marshall (Canon) on Nov 27, 2011 at 03:58 UTC