lightoverhead has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am puzzled about how to reset the <> operator in a while loop. For example, I have two levels of loops like this:
while (<FILE1>) {
    chomp;
    my @array1 = split;
    my $a1 = $array1[1];
    my $a2 = $array1[2];
    ......
    while (<FILE2>) {    # wrong, cannot iterate all the lines
        my @array2 = split;
        my $b1 = $array2[0];
        my $b2 = $array2[1];
        if ($a1 eq $b1) { ....... }
        else { ...... }
    }
}
The problem with the above code is that it cannot iterate over all the lines of file2. By the time the outer while loop reaches its second line, the <> operator of the inner while loop is already at the end of file2. Of course I could use a "for" loop instead of the second while loop. But is there a way to reset the position of the <> operator so that it re-iterates file2 for every line of file1 (the first while loop)? Thank you.

Replies are listed 'Best First'.
Re: how to reset the input operator in a while loop?
by GrandFather (Saint) on Sep 30, 2008 at 05:49 UTC

    Probably you are trying to do the wrong thing. You should almost never need to reread a file in a nested loop like that. Most likely what you need to do is read file2 into a hash then use hash lookups to check for matches in file1. Something like:

    use strict;
    use warnings;

    my $data1 = <<DATA;
    1 a
    2 b
    3 c
    DATA

    my $data2 = <<DATA;
    4 d
    1 x
    DATA

    my %data2Hash;

    open my $infile2, '<', \$data2;
    while (<$infile2>) {
        chomp;
        my ($key, $tail) = split ' ', $_, 2;
        $data2Hash{$key} = $tail;
    }
    close $infile2;

    open my $infile1, '<', \$data1;
    while (<$infile1>) {
        chomp;
        my ($key, $tail) = split ' ', $_, 2;
        if (exists $data2Hash{$key}) {
            print "Matched $key: $data2Hash{$key}, $tail\n";
        }
    }

    Prints:

    Matched 1: x, a

    Perl reduces RSI - it saves typing
      Thank you for your answer. I know this is not the right way to do it; sorry for the confusion. In fact, what I was trying to do is to compare each line of file1 with each line of file2, not just match them against each other. I had two ways to do it: first, open and close file2 for each line of file1, so that I can iterate over all the lines of file2 every time; second, build an array to store the items of file1 or file2 and then iterate over that. Using a hash should be fine too. But either way (opening/closing the file, or using an array/hash) has its own shortcoming: opening/closing the file will be slower (right?), and an array/hash will consume memory. These two files are very, very large. I recall reading somewhere that the position of the <> operator can be reset, so that every time one round of iteration is done it can be set back for another round. I am not sure whether such a method exists, but if it does, I could re-iterate the file without opening/closing files or building arrays/hashes. Thanks.

        The time to open and close the files is completely insignificant compared to the time to read them, especially if the files are large. You really don't want to do it that way!

        Tell us about the bigger picture. It is almost certain that there is a better way of achieving what you want to do than reparsing one of the files once for each line of another file. It may be that you need to sort the files first, or use a database, or extract key information, but whatever the technique, it will be much faster than the scheme you are currently considering.
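
        For instance, if both files were pre-sorted on the key column, a merge pass along the following lines reads each file only once. This is only a rough sketch, not tested against your data: the file names are placeholders, and it assumes each key appears at most once per file.

        use strict;
        use warnings;

        # Sketch only: merge two files that are already sorted on their first column,
        # e.g. with the system sort utility. File names are placeholders.
        open my $fh1, '<', 'file1.sorted' or die "file1.sorted: $!";
        open my $fh2, '<', 'file2.sorted' or die "file2.sorted: $!";

        my $line1 = <$fh1>;
        my $line2 = <$fh2>;
        while (defined $line1 && defined $line2) {
            my ($key1) = split ' ', $line1;
            my ($key2) = split ' ', $line2;
            if ($key1 lt $key2) {
                $line1 = <$fh1>;            # key present only in file1
            }
            elsif ($key1 gt $key2) {
                $line2 = <$fh2>;            # key present only in file2
            }
            else {
                print "matched key $key1\n"; # keys agree; compare the rest here
                $line1 = <$fh1>;
                $line2 = <$fh2>;
            }
        }

        Handling duplicate keys or richer output needs a little more bookkeeping, but the principle stays the same: one sequential pass over each file, no matter how large they are.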


        Perl reduces RSI - it saves typing
        what I was trying to do is to compare each line of file1 with each line of file2, not just match them against each other.

        That is not clear. How is "compare" different from "match"? What do each of those terms really mean, for your purposes? Show a couple examples of data from each file, and what sort of output you want with regard to those examples.

        Might there be duplicate lines within a given file? Do you need to keep track of the particular positions in one or both files when there is a "match" (or some particular result of "comparison"), or will it be enough just to list the data that matches/compares? Do you need to preserve or enforce a particular ordering in your output?

        If the files are "very huge", then it will be very important to be very clear about what you are really trying to accomplish; having the wrong task in mind, and/or using the wrong approach, can waste a "very huge" amount of time.

Re: how to reset the input operator in a while loop?
by JavaFan (Canon) on Sep 30, 2008 at 07:09 UTC
    You mean, you want to seek back to the beginning of the file? Assuming the file is seekable (if it's not, you cannot do what you want), you'd use the function seek to seek back to the beginning. perldoc -f seek will give you the details.
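
    For instance, a minimal sketch along the lines of the code in the question (file names and column indices are placeholders):

    use strict;
    use warnings;

    open my $fh1, '<', 'file1.txt' or die "file1.txt: $!";
    open my $fh2, '<', 'file2.txt' or die "file2.txt: $!";

    while (my $line1 = <$fh1>) {
        chomp $line1;
        my $a1 = (split ' ', $line1)[1];

        seek $fh2, 0, 0;                 # rewind file2 before each inner pass
        while (my $line2 = <$fh2>) {
            chomp $line2;
            my $b1 = (split ' ', $line2)[0];
            if (defined $a1 && defined $b1 && $a1 eq $b1) {
                # the lines share the key
            }
            else {
                # the lines differ
            }
        }
    }

    Bear in mind that this still reads file2 once for every line of file1, so for large files the hash or sorted-merge approaches suggested above will be far faster.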
      Assuming the file is seekable (if it's not, you cannot do what you want) …
      Why wouldn't GrandFather's approach of closing and re-opening the file work even if the filehandle is non-seekable?

      UPDATE: Thanks to JavaFan for a very polite answer to a very silly question.

        Because non-seekable usually means the data can be read only once. Examples of non-seekable handles are (named) pipes, STDIN (usually), and network sockets.
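
        If in doubt, you can test for it: seek returns false on a handle that cannot be rewound. A small sketch (the piped command is just an arbitrary example on a Unix-ish system):

        use strict;
        use warnings;

        # seek() fails on a handle that cannot be rewound, such as a pipe.
        open my $pipe, '-|', 'ls' or die "cannot run ls: $!";
        my $first_line = <$pipe>;
        if (seek $pipe, 0, 0) {
            print "handle is seekable\n";
        }
        else {
            print "handle is not seekable: $!\n";   # typically "Illegal seek"
        }
        close $pipe;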