in reply to Re: reading files in @ARGV doesn't return expected output
in thread reading files in @ARGV doesn't return expected output

Thank you so much for the effort. I haven't had a chance to look at your reply properly yet, as I'm burning up with a fever, so it's probably not the best time to try to concentrate on scripting.

My data files are plain text files that contain text and numbers stored in matrices. They look like this:

#title line - (skipped it with $nextUnless)
#title line - (skipped it with $nextUnless)
1 2 3
4 5 6
7 8 9

They're not necessarily 5 lines long; this is just an example. The actual matrices are much bigger: I think my biggest file is an 85x85 matrix.

What I want to do is perform some mathematical calculations on the combination of matrices $i and $j. Get the averages of those matrices, their deviation etc. I haven't included this bit of the code yet but (fingers crossed) it works.

So I want to calculate the deviation between all matrices from file $i and files $j. The end result would be i) another matrix but this time with the average values or ii) a single value, for the deviation. But like I said this bit of the code isn't shown here as I tried to keep it to a minimum - if I fail in opening and splitting them in columns then I won't be able to move on to the next bit anyway.
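As a rough illustration of the two kinds of result described above, here is a minimal sketch in core Perl. The names `average_matrix` and `rms_deviation`, the sample data, and the choice of root-mean-square as the "deviation" are all assumptions made for the example; the actual calculation in the script may well differ.

```perl
use strict;
use warnings;

# Two small stand-in matrices (hypothetical data, same dimensions).
my @m1 = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);
my @m2 = ([3, 2, 1], [6, 5, 4], [9, 8, 7]);

# Element-wise average: returns a reference to a new matrix of the same shape.
sub average_matrix {
    my ($x, $y) = @_;
    my @avg;
    for my $r (0 .. $#$x) {
        for my $c (0 .. $#{ $x->[$r] }) {
            $avg[$r][$c] = ($x->[$r][$c] + $y->[$r][$c]) / 2;
        }
    }
    return \@avg;
}

# Root-mean-square deviation: one number summarising how far apart
# the two matrices are (an assumed definition of "deviation").
sub rms_deviation {
    my ($x, $y) = @_;
    my ($sum_sq, $n) = (0, 0);
    for my $r (0 .. $#$x) {
        for my $c (0 .. $#{ $x->[$r] }) {
            $sum_sq += ($x->[$r][$c] - $y->[$r][$c]) ** 2;
            $n++;
        }
    }
    return $n ? sqrt($sum_sq / $n) : 0;
}

my $avg = average_matrix(\@m1, \@m2);
print join(' ', @$_), "\n" for @$avg;
printf "RMS deviation: %.4f\n", rms_deviation(\@m1, \@m2);
```

Both functions take array references, so they work unchanged on any pair of matrices held in a larger structure such as `$list[$i]` and `$list[$j]`.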

At this stage I was trying to print them for testing only.

Hopefully this clears things up a bit and I honestly hope you guys will continue giving me feedback as I'm beyond stuck. Now off to read all answers from the top and hopefully understand what I've been messing up.

This website is an immensely helpful resource, thank you everyone. I sure wish one day I'll know enough to help others with their scripting problems :D

Re^3: reading files in @ARGV doesn't return expected output
by Laurent_R (Canon) on Jun 27, 2017 at 10:55 UTC
    If all you wanted to do in your last nested for loop is to print out the data, then you don't need it, since this line:
    print Dumper \@list; # check that the content of @list is what you expect
    which I have put at the very end of my suggested code will output your @list data structure in a nice and clear format.

    So you could try to run my code and see if you get what you want.
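For reference, a minimal sketch of what inspecting a nested structure with Data::Dumper looks like. The contents of `@list` here are hypothetical stand-ins (two 2x2 matrices), not the real data:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for @list: two small matrices stored as arrays of rows.
my @list = (
    [ [1, 2], [3, 4] ],
    [ [5, 6], [7, 8] ],
);

$Data::Dumper::Indent   = 1;  # more compact nesting than the default
$Data::Dumper::Sortkeys = 1;  # stable key order if hashes are added later

my $dump = Dumper \@list;
print $dump;

# Individual elements remain addressable as $list[$file][$row][$col].
print "list[1][0][1] = $list[1][0][1]\n";
```

The dump is only a debugging aid; the same `@list` can afterwards be fed to whatever calculation is needed.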

      Sorry, I should have said that in the last nested for loop I am *for now* only printing for testing. This is where all the maths will go, all of it written on a $list[$a][$b] basis. The rest of the code works; it's just that I now need to make it work for combinations of multiple files, rather than for pairs of files as I have done so far. That's why Dumper \@list isn't suitable.

      Basically what I want to know is: is there a way to open those files with two for loops like so

      for ($i; $i<=4; $i++) {
        for ($j; $j<=4; $j++) {
          #etc
          while <lines_of_both_$i_and_$j_files> {
            #etc

      This was why I tried to use while <>: extensive googling told me that using @ARGV and the diamond operator was the most efficient way to open multiple files and read them line by line with while. I have a perfectly working script that does all the maths I need but unfortunately it's only doing it for while <$line>. How do I tell perl that I want this done for while <lines_of_both_$i_and_$j_files>? I guess that's what my question boils down to.

      I'm sorry, I'm really struggling with this. I'm getting really stressed and frustrated that I'm constantly buggering it up and can't even explain it properly. I am very grateful for all contributions here and I'm reading and studying all of them, although I don't understand everything. It's quite disheartening, as I've been "coding" on and off for a couple of years now, so I expected to have learnt more.

        With multiple files then maybe what you want is a 3 dimensional array ?

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $molec1 = "molec1";
        my $molec2 = "molec2";
        my $path   = "/store/group/comparisons";
        my @matrix;

        # build the list of input files for both molecules
        my @files;
        for my $i (1..3){
          push @files, $path."/File-$molec1-cluster$i.out"
        }
        for my $j (1..2){
          push @files, $path."/File-$molec2-cluster$j.out"
        }

        # read each file once into its own 2D array, collected in @all
        my @all = ();
        for my $filename (@files){
          open my $fh,'<',$filename or die "Could not open $filename: $!";
          my @matrix = ();
          while (my $line = <$fh>){
            next if ($line =~ /^#/);   # skip title lines
            chomp $line;
            push @matrix,[split /\s+/,$line];
          }
          close $fh;
          push @all,\@matrix;
        }

        # print every matrix as $all[file][row][column]
        my $matrices = scalar @all;
        for my $i (0..$matrices-1){
          print $files[$i]."\n";
          my $rows = scalar @{$all[$i]};
          for my $r (0..$rows-1){
            my $cols = scalar @{$all[$i][$r]};
            for my $c (0..$cols-1){
              print "$all[$i][$r][$c] "
            }
            print "\n";
          }
          print "\n";
        }
        poj
        Basically what I want to know is: is there a way to open those files with two for loops like so
        for ($i; $i<=4; $i++) { for ($j; $j<=4; $j++) { #etc
        Yes, you can do that, but that would be very inefficient and that's most probably not what you should do, because it would mean opening the second series of files a number of times, and there is nothing in what you've described that would make this necessary.

        With the code that I provided in my first post (including the small corrections to the @ARGV array, which I had forgotten to remove in the second for loop), you should be able to read all the files.

        If, on the other hand, you want to combine in some way files from your first set with files from your second set, then it is more complicated, but you still don't want to read the same files many times over. The bottom line, though, is that nothing in what you have said so far points in this direction.
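One way to combine files from the two sets without re-reading anything is to load every file once into memory and then run the nested $i/$j loops over the in-memory matrices. The sketch below is only an illustration under stated assumptions: `read_matrix` and `load_set` are hypothetical helper names, the input is supplied as in-memory strings so the example runs without the real data files, and the sum of absolute differences is a placeholder for the real maths.

```perl
use strict;
use warnings;

# Parse one matrix from an open filehandle, skipping '#' title lines.
sub read_matrix {
    my ($fh) = @_;
    my @rows;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;       # title line
        next unless $line =~ /\S/;   # blank line
        push @rows, [ split ' ', $line ];
    }
    return \@rows;
}

# Read each input exactly once and return a list of matrix references.
# In-memory strings stand in for real files here (an assumption).
sub load_set {
    my @matrices;
    for my $text (@_) {
        open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
        push @matrices, read_matrix($fh);
        close $fh;
    }
    return @matrices;
}

my @set1 = load_set("#title\n1 2\n3 4\n", "#title\n5 6\n7 8\n");
my @set2 = load_set("#title\n1 0\n0 1\n");

# Loop over all (i, j) combinations of the matrices already in memory.
my @results;
for my $i (0 .. $#set1) {
    for my $j (0 .. $#set2) {
        my ($m1, $m2) = ($set1[$i], $set2[$j]);
        my $sum = 0;   # placeholder maths: sum of absolute differences
        for my $r (0 .. $#$m1) {
            for my $c (0 .. $#{ $m1->[$r] }) {
                $sum += abs($m1->[$r][$c] - $m2->[$r][$c]);
            }
        }
        push @results, [ $i, $j, $sum ];
        print "set1[$i] vs set2[$j]: $sum\n";
    }
}
```

With real data, the strings passed to `load_set` would be replaced by filenames and the open call by an ordinary `open my $fh, '<', $filename`; the pairing loops stay the same.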

Re^3: reading files in @ARGV doesn't return expected output
by pryrt (Abbot) on Jun 27, 2017 at 21:42 UTC

    As an aside, fasoli, you said,

    ... perform some mathematical calculations on the combination of matrices ... I haven't included this bit of the code yet but (fingers crossed) it works.

    When Marshall posted his example matrix_transpose, I was reminded that I wanted to point out: instead of crossing your fingers that your (or Marshall's) roll-your-own-code truly works, there are plenty of modules and families of modules that will do the matrix math and have been fully tested across edge cases. Math::MatrixReal, Math::GSL, and PDL::Matrix are three such well-tested Matrix modules. It is probably worth your time to try out one or more of those -- their math has been checked thoroughly over the years, and they are likely to run faster (Benchmark), too.

      Probably the best-performing one would be PDL::LinearAlgebra: it provides convenient wrappers for LAPACK or, if you prefer, access to the raw LAPACK functions.