in reply to how to reset the input operator in while loop?

Probably you are trying to do the wrong thing. You should almost never need to reread a file in a nested loop like that. Most likely what you need to do is read file2 into a hash, then use hash lookups to check for matches in file1. Something like:

use strict;
use warnings;

my $data1 = <<DATA;
1 a
2 b
3 c
DATA

my $data2 = <<DATA;
4 d
1 x
DATA

my %data2Hash;

# Build a lookup hash from file2: first column is the key,
# the rest of the line is the value.
open my $infile2, '<', \$data2;
while (<$infile2>) {
    chomp;
    my ($key, $tail) = split ' ', $_, 2;
    $data2Hash{$key} = $tail;
}
close $infile2;

# Scan file1 once, checking each key against the hash.
open my $infile1, '<', \$data1;
while (<$infile1>) {
    chomp;
    my ($key, $tail) = split ' ', $_, 2;
    if (exists $data2Hash{$key}) {
        print "Matched $key: $data2Hash{$key}, $tail\n";
    }
}

Prints:

Matched 1: x, a

Perl reduces RSI - it saves typing

Re^2: how to reset the input operator in while loop?
by lightoverhead (Pilgrim) on Sep 30, 2008 at 07:06 UTC
    Thank you for your answer. I know this is not the right way to do it; sorry for the confusion. In fact, what I was trying to do is compare each line of file1 with each line of file2, not just match them against each other.

    I had two ways to do it: first, open and close file2 for each line of file1, so that I can iterate over all the lines of file2 every time; second, build an array to store the items of file1 or file2, then iterate over that. Using a hash should be fine too. But either way has its own shortcoming: opening/closing the file will be slower (right?), while an array/hash will consume memory, and these two files are very, very huge.

    I recall reading somewhere that we could reset the position of the <> operator, so that every time one round of iteration is done, it could be set back to start another round. I am not sure whether such a method exists, but if it does, I would be able to re-iterate over the file without opening/closing files or building arrays/hashes. Thanks.
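    (For what it's worth, the mechanism being half-remembered here does exist: a filehandle open on a real file can be rewound with Perl's built-in seek instead of being closed and reopened. A minimal sketch of that nested-loop approach, assuming plain on-disk files named file1 and file2; note that seek only works on seekable handles, not on pipes or sockets:)

        use strict;
        use warnings;

        open my $fh2, '<', 'file2' or die "Cannot open file2: $!";

        open my $fh1, '<', 'file1' or die "Cannot open file1: $!";
        while (my $line1 = <$fh1>) {
            while (my $line2 = <$fh2>) {
                # compare $line1 with $line2 here
            }
            seek $fh2, 0, 0;    # rewind file2 to the start for the next pass
        }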

      The time to open and close the files is completely insignificant compared to the time to read them, especially if the files are large. You really don't want to do it that way!

      Tell us about the bigger picture. It is almost certain that there is a better way of achieving what you want to do than reparsing one of the files once for each line of another file. It may be that you need to sort the files first, or use a database, or extract key information, but whatever the technique, it will be much faster than the scheme you are currently considering.
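      (To illustrate the sort-first route: once both files are sorted on the key, a single merge pass compares them while reading each file exactly once. A sketch, assuming hypothetical pre-sorted files file1.sorted and file2.sorted keyed on their first column, and ignoring duplicate keys for brevity:)

          use strict;
          use warnings;

          # Assumes both files are pre-sorted on column 1
          # (e.g. with `sort -k1,1 file1 > file1.sorted`).
          open my $fh1, '<', 'file1.sorted' or die "Cannot open file1.sorted: $!";
          open my $fh2, '<', 'file2.sorted' or die "Cannot open file2.sorted: $!";

          my $line1 = <$fh1>;
          my $line2 = <$fh2>;

          while (defined $line1 && defined $line2) {
              my ($key1) = split ' ', $line1;
              my ($key2) = split ' ', $line2;

              if ($key1 lt $key2) {
                  $line1 = <$fh1>;            # file1 is behind: advance it
              }
              elsif ($key1 gt $key2) {
                  $line2 = <$fh2>;            # file2 is behind: advance it
              }
              else {
                  print "Matched $key1\n";    # keys agree: report, advance both
                  $line1 = <$fh1>;
                  $line2 = <$fh2>;
              }
          }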


      Perl reduces RSI - it saves typing
        The time to open and close the files is completely insignificant compared to the time to read them, especially if the files are large.

        On the other hand, if the files are small and you happen to open and close them lots of times, the cost of opening and closing becomes very significant compared to having memory-resident data that you simply reuse within your loops. I would not consider the time taken to open and close files "completely insignificant".
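        (A sketch of that memory-resident approach for a small file2, assuming it fits comfortably in memory:)

            use strict;
            use warnings;

            # Slurp the small file into memory once...
            open my $fh2, '<', 'file2' or die "Cannot open file2: $!";
            my @file2_lines = <$fh2>;
            close $fh2;

            # ...then reuse the in-memory copy on every pass over file1.
            open my $fh1, '<', 'file1' or die "Cannot open file1: $!";
            while (my $line1 = <$fh1>) {
                for my $line2 (@file2_lines) {
                    # compare $line1 with $line2 here
                }
            }
            close $fh1;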

      what I was trying to do is compare each line of file1 with each line of file2, not just match them against each other.

      That is not clear. How is "compare" different from "match"? What does each of those terms really mean, for your purposes? Show a couple of examples of data from each file, and the output you want for those examples.

      Might there be duplicate lines within a given file? Do you need to keep track of the particular positions in one or both files when there is a "match" (or some particular result of "comparison"), or will it be enough just to list the data that matches/compares? Do you need to preserve or enforce a particular ordering in your output?

      If the files are "very huge", then it will be very important to be very clear about what you are really trying to accomplish; having the wrong task in mind, and/or using the wrong approach, can waste a "very huge" amount of time.