in reply to reading files in @ARGV doesn't return expected output

Hi fasoli,

a few comments on your script (some of which have already been made by other monks).

In your two nested for loops, you're opening files using $input1 and $input2 file handles, but you're never using these file handles. So this is useless. Even if it were not useless (i.e. if you had some code to read from them), it would still not work properly, because opening a file handle with the same name would close the one you've opened before.

With your nested for loops, you're using the second set of data 3 times (once for each iteration over the first set of data). That's certainly not what you want, and it's why you don't get your expected result when printing @ARGV. You most probably need separate loops.
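To make the difference concrete, here is a minimal sketch. It assumes the original loops pushed file names onto an array inside both levels of nesting; the file names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical file names, only to illustrate the loop structure.
my (@nested, @separate);

# Nested: the inner loop runs once per outer iteration,
# so the second set is added three times.
for my $i (1..3) {
    push @nested, "molec1-cluster$i.out";
    for my $j (1..2) {
        push @nested, "molec2-cluster$j.out";
    }
}

# Separate: each set is added exactly once.
push @separate, "molec1-cluster$_.out" for 1..3;
push @separate, "molec2-cluster$_.out" for 1..2;

print scalar(@nested),   " entries with nested loops\n";
print scalar(@separate), " entries with separate loops\n";
```

The nested version ends up with 9 entries, the separate version with 5.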

You don't need to escape the slashes in your path.

Even though it is possible to do so, you should probably not use the while ($line = <>) { syntax because it confuses matters. In particular, $. will not be properly reset, so that your $nextUnless conditional will work properly only for the first file.
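If you do want to keep the diamond operator, the usual idiom is to close ARGV at the end of each file, which resets $. for the next one. The sketch below uses two throwaway temp files (not your real data) and a hypothetical $skip variable in place of $nextUnless.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two small throwaway files, each with two header lines and one data line.
my @tmp;
for my $n (1..2) {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} "# title\n# title\ndata $n\n";
    close $fh;
    push @tmp, $name;
}

my $skip = 2;   # header lines to skip in every file
my @kept;

local @ARGV = @tmp;
while (my $line = <>) {
    chomp $line;
    push @kept, $line if $. > $skip;
}
continue {
    close ARGV if eof;   # resets $. before the next file starts
}

print "$_\n" for @kept;
```

Without the close ARGV line, the header lines of the second file would be numbered 4 and 5 and would slip past the $. > $skip test.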

The @columns array will be clobbered with new values each time through the while loop. You may not care if you only want to print it out in the next line, but since you're using @columns in the final for loop, I strongly suspect this is wrong. I am not sure, however, what you need there, since you haven't shown what your data looks like.
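A small sketch of the difference between overwriting an array and pushing a reference per line, using made-up rows rather than your files:

```perl
use strict;
use warnings;

my @lines = ("1 2 3", "4 5 6", "7 8 9");   # made-up sample rows

my (@columns, @rows);
for my $line (@lines) {
    @columns = split ' ', $line;        # overwritten on every pass
    push @rows, [ split ' ', $line ];   # one array ref per line is kept
}

print "columns: @columns\n";            # only the last line survives
print "rows kept: ", scalar(@rows), "\n";
print "middle element: $rows[1][1]\n";
```

After the loop, @columns holds only the last line, while @rows holds all three and can be indexed as $rows[$a][$b].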

The Data::Dumper module may help you check the content of your data structures.
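For instance, a minimal sketch with a made-up array of arrays standing in for the parsed rows:

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up rows standing in for the parsed file contents.
my @list = ([1, 2, 3], [4, 5, 6]);

local $Data::Dumper::Indent = 1;   # more compact nesting; optional
my $dump = Dumper(\@list);
print $dump;
```

Passing a reference (\@list) rather than the array itself keeps the whole structure in a single $VAR1.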

This is an attempt at correcting the first part of your program (the part managing files). I can't help you with the second part without having seen the data (and the expected result).

use strict;
use warnings;
use Data::Dumper;

my $molec1 = "molec1";
my $molec2 = "molec2";
my $input1;
my $input2;
my $path = "/store/group/comparisons";
my @files;
my $line;
my @columns;
my $nextUnless = 2;   # nr of lines to skip
my $CountLines = 0;   # total nr of lines in all files

for my $i (1..3) {
    push @files, "$path/File-${molec1}-cluster${i}.out";
}
for my $j (1..2) {
    push @files, "$path/File-${molec2}-cluster${j}.out";
}
print "@files \n";   # for testing; are correct files printed?

## now split and print
my @list;
for my $file (@files) {
    open my $FH, "<", $file or die "Cannot open $file $!";
    while ($line = <$FH>) {
        $CountLines += 1;
        next unless $. > $nextUnless;
        chomp $line;
        $line =~ s/^\s+|\s+$//g;
        push @list, [split /\s+/, $line];
        @columns = split /\s+/, $line;   # this is most probably wrong, maybe you need to push a reference, as you did in the previous line
    }
    close $FH;
}
print Dumper \@list;   # check that the content of @list is what you expect
# ...

In brief, I haven't tested that (not possible without any data), and this is not the end of it (you need to fix the thing about @columns in the while loop and probably to change the last for loop), but this should get you much closer to what you need.

Please, PLEASE, show a sample of your data (and expected result).

Update: I had forgotten to change @ARGV to @files in the second ($j) for loop. Fixed now.

Re^2: reading files in @ARGV doesn't return expected output
by fasoli (Beadle) on Jun 27, 2017 at 10:17 UTC

    Thank you so much for the effort, I haven't had a chance to look at your reply properly yet as I'm burning up with fever so probably not the best time to try and concentrate on scripting.

    My data files are plain text files that contain text and numbers stored in matrices. They look like this:

    #title line - (skipped it with $nextUnless)
    #title line - (skipped it with $nextUnless)
    1 2 3
    4 5 6
    7 8 9

    They're not necessarily 5 lines long; this is just an example. The actual matrices are much bigger. I think my biggest file is an 85x85 matrix.

    What I want to do is perform some mathematical calculations on the combination of matrices $i and $j. Get the averages of those matrices, their deviation etc. I haven't included this bit of the code yet but (fingers crossed) it works.

    So I want to calculate the deviation between all matrices from file $i and files $j. The end result would be i) another matrix but this time with the average values or ii) a single value, for the deviation. But like I said this bit of the code isn't shown here as I tried to keep it to a minimum - if I fail in opening and splitting them in columns then I won't be able to move on to the next bit anyway.

    At this stage I was trying to print them for testing only.

    Hopefully this clears things up a bit and I honestly hope you guys will continue giving me feedback as I'm beyond stuck. Now off to read all answers from the top and hopefully understand what I've been messing up.

    This website is an immensely helpful resource, thank you everyone. I sure wish one day I'll know enough to help others with their scripting problems :D

      If all you wanted to do in your last nested for loop is to print out the data, then you don't need it, since this line:
      print Dumper \@list; # check that the content of @list is what you expect
      which I have put at the very end of my suggested code will output your @list data structure in a nice and clear format.

      So you could try to run my code and see if you get what you want.

        Sorry, I should have said that in the last nested for loop I am *for now* only printing for testing. This is where all the maths will be, which is all written on a $list[$a][$b] basis; I'm just printing here for testing. The rest of the code works. It's just that I now need to make it work for a combination of multiple files, rather than on couples of files like I did so far. That's why Dumper \@list isn't suitable.

        Basically what I want to know is: is there a way to open those files with two for loops like so

        for ($i; $i<=4; $i++) {
            for ($j; $j<=4; $j++) {
                #etc
                while <lines_of_both_$i_and_$j_files> {
                    #etc

        This was why I tried to use while <>: extensive googling told me that using @ARGV and the diamond operator was the most efficient way to open multiple files and read them line by line with while. I have a perfectly working script that does all the maths I need but unfortunately it's only doing it for while <$line>. How do I tell perl that I want this done for while <lines_of_both_$i_and_$j_files>? I guess that's what my question boils down to.

        I'm sorry, I'm really struggling with this, getting really stressed and frustrated that I'm constantly buggering it up and can't even explain properly. I am very grateful for all contributions here and I'm reading and studying all of them, though not understanding everything. Quite disheartening, as I've been "coding" on and off for a couple of years now and so expected to have learnt more.
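One possible way to approach the question above, offered as a sketch rather than a definitive answer: read each file into its own matrix once, then loop over the (i, j) pairs of matrices instead of trying to read two files inside one while loop. The helpers read_matrix and make_file are hypothetical names introduced here, the temp files stand in for the real cluster files, and the subtraction is only a placeholder for the real maths.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# read_matrix() is a hypothetical helper: it skips $skip header lines and
# returns the remaining rows as a reference to an array of array refs.
sub read_matrix {
    my ($file, $skip) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @matrix;
    while (my $line = <$fh>) {
        next unless $. > $skip;
        push @matrix, [ split ' ', $line ];
    }
    close $fh;
    return \@matrix;
}

# make_file() writes a throwaway stand-in for a real cluster file.
sub make_file {
    my ($text) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} "# title\n# title\n$text";
    close $fh;
    return $name;
}

my @files_i = map { make_file("$_ 2\n3 4\n") } 1 .. 3;
my @files_j = map { make_file("$_ 0\n0 1\n") } 1 .. 2;

# Read every file once ...
my @mat_i = map { read_matrix($_, 2) } @files_i;
my @mat_j = map { read_matrix($_, 2) } @files_j;

# ... then visit every (i, j) combination of matrices.
my @diffs;
for my $i (0 .. $#mat_i) {
    for my $j (0 .. $#mat_j) {
        my $d = $mat_i[$i][0][0] - $mat_j[$j][0][0];   # placeholder for the real maths
        push @diffs, $d;
        printf "i=%d j=%d top-left difference: %d\n", $i + 1, $j + 1, $d;
    }
}
```

Because each matrix is held in memory as $mat_i[$i] or $mat_j[$j], code written on a $list[$a][$b] basis could be pointed at $mat_i[$i][$a][$b] and $mat_j[$j][$a][$b] inside the inner loop.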

      As an aside, fasoli, you said,

      ... perform some mathematical calculations on the combination of matrices ... I haven't included this bit of the code yet but (fingers crossed) it works.

      When Marshall posted his example matrix_transpose, I was reminded that I wanted to point out: instead of crossing your fingers that your (or Marshall's) roll-your-own-code truly works, there are plenty of modules and families of modules that will do the matrix math and have been fully tested across edge cases. Math::MatrixReal, Math::GSL, and PDL::Matrix are three such well-tested Matrix modules. It is probably worth your time to try out one or more of those -- their math has been checked thoroughly over the years, and they are likely to run faster (Benchmark), too.

        Probably the best-performing one would be the LAPACK-wrapper, PDL::LinearAlgebra - it has nice wrappers for LAPACK, or if you prefer, you can access the raw LAPACK functions.
Re^2: reading files in @ARGV doesn't return expected output
by fasoli (Beadle) on Jun 27, 2017 at 12:01 UTC
    Sorry just reading this more carefully now after I ran it. Am I correct in understanding that it only loops through my $i files? The $j files are pushed in @ARGV and are just left there. My problem all along was how to read both $i and $j files and split them using the while $line loop, so that I can move on to the maths later on. Your code seems to only deal with the $i files and loop through them, so how will I be able to build on this to get the $i-$j deviations later on if the $j files are completely ignored? I'm sorry, it's probably there and I'm just not getting it aren't I?
      Sorry, I forgot to change it in the second for loop, which should be:
      for my $j (1..2) { push @files, "$path/File-${molec2}-cluster${j}.out"; }