remluvr has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone.
Is it possible to read two files in parallel and grab values from them?
I mean, let's say I have two input files. The first is like this:

alligator-n attri aggressive-j 0.0521953848067504
alligator-n attri aquatic-j 0.0428300684553727
alligator-n attri big-j 0.125268137962678
alligator-n attri carnivorous-j 0.0218253043999405
lizard-n hyper creature-n 0.618765795370872
lizard-n hyper reptile-n 0.466342631493931
lizard-n hyper vertebrate-n 0.249202543226053
lizard-n mero blood-n 0.397326263768692
dishwasher-n random-v prevent-v 0.00856940857204332
dishwasher-n random-v qualify-v 0.00675098458337709
dishwasher-n random-v top-v 0.0114499058531553
dishwasher-n random-v visit-v 0.0134718844088632
freezer-n attri big-j 0.0865285153310031
freezer-n attri clean-j 0.078801694198009
and the second like this:
alligator-n attri aggressive-j 0.345621780
alligator-n attri aquatic-j 0.46634263149393
alligator-n attri big-j 0.125268137962678
alligator-n attri carnivorous-j 0.0218253043999405
lizard-n hyper creature-n 0.618765795370872
lizard-n hyper reptile-n 0.466342631493931
lizard-n hyper vertebrate-n 0.0428300684553727
lizard-n mero blood-n
dishwasher-n random-v prevent-v 0.00856940857204332
dishwasher-n random-v qualify-v 0.00675098458337709
dishwasher-n random-v top-v 0.125268137962678
dishwasher-n random-v visit-v 0.0134718844088632
freezer-n attri big-j0.078801694198009
freezer-n attri clean-j 0.397326263768692

That is, they are exactly the same except for the last value.
How can I read them so that I can compare the last value on each line of the first file with the value on the same line of the second file? For each line I have to apply the following formula: value_from_the_first * (1 - value_from_the_second).
My output should look like the input, except for the last value, which should be computed with the formula above.
Thanks a lot.
Giulia

Replies are listed 'Best First'.
Re: Read and analyze two file in parallel
by roboticus (Chancellor) on Mar 14, 2012 at 21:51 UTC

    remluvr:

    Assuming the data is in the same order, it should be as easy as something like this (untested):

    use strict;
    use warnings;
    use autodie;

    open my $IF1, '<', 'File.in1';
    open my $IF2, '<', 'File.in2';
    open my $OF,  '>', 'File.out';

    while (1) {
        my @Rec1 = split /\s+/, <$IF1>;
        my @Rec2 = split /\s+/, <$IF2>;
        $Rec1[-1] += $Rec2[-1];
        print $OF join("\t", @Rec1), "\n";
        last if eof($IF1) and eof($IF2);
    }

    Of course, you'll have to add checking to verify that the records are compatible. If you sort the files beforehand, then in the event of a mismatch, you should be able to simply re-read the file containing the "smaller" (in value) string.
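    A minimal sketch of such checking, assuming each record has exactly four whitespace-separated fields (three key fields and a value) and that both files are in the same order. The helper name combine_lines and the sample values are made up for illustration, and the formula is the one from the question rather than the addition used above. Records with a missing or fused value, which the second sample in the question appears to contain, would be reported as malformed.

```perl
use strict;
use warnings;

# Hypothetical helper: combine one line from each file, or die on a mismatch.
sub combine_lines {
    my ($line1, $line2) = @_;
    my @rec1 = split ' ', $line1;
    my @rec2 = split ' ', $line2;

    # Both records need the three key fields plus a value.
    die "Malformed record: '$line1' / '$line2'\n"
        unless @rec1 == 4 and @rec2 == 4;

    # The first three fields must agree, or the files are out of step.
    for my $i (0 .. 2) {
        die "Key mismatch: '$line1' vs '$line2'\n"
            unless $rec1[$i] eq $rec2[$i];
    }

    # The formula from the question: first * (1 - second).
    $rec1[3] = $rec1[3] * (1 - $rec2[3]);
    return join "\t", @rec1;
}

# Demo on in-memory filehandles with made-up values.
my $data1 = "alligator-n attri big-j 0.5\nlizard-n hyper reptile-n 0.25\n";
my $data2 = "alligator-n attri big-j 0.5\nlizard-n hyper reptile-n 0.5\n";
open my $fh1, '<', \$data1 or die "Cannot open first input: $!";
open my $fh2, '<', \$data2 or die "Cannot open second input: $!";

while (defined(my $line1 = <$fh1>)) {
    my $line2 = <$fh2>;
    die "Second file is shorter than the first\n" unless defined $line2;
    chomp($line1, $line2);
    print combine_lines($line1, $line2), "\n";
}
die "Second file is longer than the first\n" unless eof($fh2);
```

    Dying on the first mismatch is only one possible policy; skipping or logging the bad record would work just as well if the data is known to be noisy.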

    Update: As CountZero mentions, stopping the loop might be a good idea. So I added the last statement.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Interesting, but how do you break from your while(1) loop?

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics

      Thanks! In the meantime I tried something and it worked. I'm posting it here in case it is helpful to someone (or in case you have suggestions for me). Here it is:

      # Read one line from the handle and return its tab-separated fields.
      sub read_file_line {
          my $fh = shift;
          if ($fh and my $line = <$fh>) {
              chomp $line;
              return [ split(/\t/, $line) ];
          }
          return;
      }

      #sub compute {
      #    # do something with the 2 values
      #}

      open(my $f1, '<', $input)   or die "Cannot open $input: $!";
      open(my $f2, '<', $input_1) or die "Cannot open $input_1: $!";
      my $pair1 = read_file_line($f1);
      my $pair2 = read_file_line($f2);
      my $value;
      open OUT, '>', $file or die "Cannot open $file: $!";
      while ($pair1 and $pair2) {
          $value = $pair1->[3] * (1 - $pair2->[3]);
          # Print the current record before reading the next pair.
          print OUT $pair1->[0]."\t".$pair1->[1]."\t".$pair1->[2]."\t".$value."\n";
          $pair1 = read_file_line($f1);
          $pair2 = read_file_line($f2);
      }
      close($f1);
      close($f2);
      close OUT;

      Thanks,
      Giulia

        Some suggestions:
        That's a bit awkward, because you call read_file_line() inside and outside your while loop, so maintenance-wise it would be easy to change one and forget to change the other. You could combine both calls into your while loop condition:

        while (my $pair1 = read_file_line($f1) and my $pair2 = read_file_line($f2)) {

        Also, since you only use $value inside the loop, there's no need to declare it outside. Just declare it with my when you assign to it inside the loop. Declaring it outside the loop means it'll retain its value between loops, which shouldn't hurt in this case, but can make odd things happen if the assignment to it is ever conditional.

        In your print statement, there's no need to concatenate all those things. Just put your variables and \t characters inside one set of quotes.
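        Put together, the three suggestions might look like the sketch below. It assumes tab-separated input with the value in the fourth field, as in the posted code; the wrapper combine_files and the in-memory sample data are hypothetical and exist only to make the example self-contained.

```perl
use strict;
use warnings;

# Read one line from a handle and return its fields, or nothing at end of file.
sub read_file_line {
    my $fh = shift;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return [ split /\t/, $line ];
}

# Hypothetical wrapper: read both handles in lockstep and write to $out.
sub combine_files {
    my ($f1, $f2, $out) = @_;

    # Both reads happen in the loop condition, so there is only one call site each.
    while (my $pair1 = read_file_line($f1)
           and my $pair2 = read_file_line($f2)) {
        # $value is declared where it is assigned, so it cannot leak between loops.
        my $value = $pair1->[3] * (1 - $pair2->[3]);

        # One interpolated string instead of a chain of concatenations.
        print {$out} "$pair1->[0]\t$pair1->[1]\t$pair1->[2]\t$value\n";
    }
    return;
}

# Demo on in-memory filehandles with made-up values.
my $in1 = "alligator-n\tattri\tbig-j\t0.5\n";
my $in2 = "alligator-n\tattri\tbig-j\t0.5\n";
my $result = '';
open my $f1,  '<', \$in1    or die "Cannot open first input: $!";
open my $f2,  '<', \$in2    or die "Cannot open second input: $!";
open my $out, '>', \$result or die "Cannot open output: $!";
combine_files($f1, $f2, $out);
close $out;
print $result;
```

        The loop stops as soon as either file runs out, so a length mismatch between the two files passes silently here.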

        Aaron B.
        My Woefully Neglected Blog, where I occasionally mention Perl.

      Update: As CountZero mentions, stopping the loop might be a good idea. So I added the last statement.

      while(not( grep eof, $in1, $in2 )){
      ...
      }
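        Correction: grep { eof($_) } $in1, $in2. A bare eof does not look at the handles in the list; it tests the last file read, and eof() with empty parentheses tests the files in @ARGV.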
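        One way to write the loop condition so that each handle is passed to eof explicitly is a grep block. This is a sketch, not taken from the replies above: the helper name none_at_eof and the in-memory sample data are hypothetical. As far as I can tell, \&eof would refer to a user-defined sub named eof rather than the builtin, so the block form seems the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: true while none of the given handles is at end of file.
sub none_at_eof {
    my @handles = @_;
    # grep in scalar context counts the handles that are at EOF.
    my $at_eof = grep { eof($_) } @handles;
    return $at_eof == 0;
}

# Demo on in-memory filehandles of different lengths (made-up data).
my ($text1, $text2) = ("a 1\nb 2\nc 3\n", "a 4\nb 5\n");
open my $in1, '<', \$text1 or die "Cannot open first input: $!";
open my $in2, '<', \$text2 or die "Cannot open second input: $!";

my $rounds = 0;
while (none_at_eof($in1, $in2)) {
    my $line1 = <$in1>;
    my $line2 = <$in2>;
    $rounds++;
}
print "Read $rounds line pairs\n";
```

        The loop ends when the shorter input is exhausted, so any extra lines in the longer file are left unread.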