Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Masters, I want to read all lines in all input files more than once. What I don't know is what the statement reset should be in cases like the following. Does anyone know?
while (<>) { pass1; }
reset;
while (<>) { pass2; }
Thanks in advance!
Isak <isak@hypergene.com>

Replies are listed 'Best First'.
Re: Using <> in multiple passes
by fundflow (Chaplain) on Oct 25, 2000 at 18:03 UTC
    merlyn's answer is only good for small files (i.e. ones that fit in the memory you have available for this process).

    If the file can be big, then you should favor multiple passes as you requested originally. To do this, use:

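    seek(FILEHANDLE, 0, SEEK_SET)
    between the passes to go back to the beginning. (The arguments are the position first, then the whence constant; SEEK_SET is exported by use Fcntl qw(:seek).)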
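
    A minimal sketch of that rewind for a single seekable file. The helper name two_passes and its callback arguments are made up for illustration and are not part of the original answer:

```perl
use strict;
use warnings;
use Fcntl qw(:seek);    # exports SEEK_SET, SEEK_CUR, SEEK_END

# Hypothetical helper: run two passes over one seekable file.
sub two_passes {
    my ($path, $pass1, $pass2) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";

    # First pass: hand every line to the first callback.
    while (my $line = <$fh>) { $pass1->($line) }

    # Rewind. seek() takes (FILEHANDLE, POSITION, WHENCE).
    seek($fh, 0, SEEK_SET) or die "Can't rewind $path: $!\n";

    # Second pass: the same lines again, from the top.
    while (my $line = <$fh>) { $pass2->($line) }

    close($fh);
    return;
}
```

    Note that this opens one named file rather than using <>, so it only covers the single-file case.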
        Yes, of course.

        Just out of curiosity, I tried the following:

        #!/usr/bin/perl -w
        while (<>) { print }
        seek(STDIN, 0, 0) or die;
        print "-----PASS 2-----\n";
        while (<>) { print }

        and ran it with:

        > ./t.pl < t.pl
        and it did read the file twice! How come?
        (I was expecting an error like 'non-seekable file' or so.)

        When doing

        >cat t.pl | perl t.pl
        it died where it should have (i.e. in the seek()).
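
        A likely explanation: with "< t.pl" the shell opens the file itself, so STDIN is an ordinary seekable file, whereas a pipe cannot be rewound. The second <> then reads STDIN again because @ARGV is still empty. A small sketch for checking which case applies; the helper name handle_kind is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: classify a filehandle by what it is attached to.
sub handle_kind {
    my ($fh) = @_;
    return 'pipe' if -p $fh;    # pipe or FIFO: seek() fails
    return 'file' if -f $fh;    # regular file: seek() works
    return 'other';             # terminal, socket, and so on
}
```

        Calling handle_kind(\*STDIN) at the top of t.pl should report 'file' for the redirect and 'pipe' for the cat version.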
Re: Using <> in multiple passes
by merlyn (Sage) on Oct 25, 2000 at 17:24 UTC
(tye)Re: Using <> in multiple passes
by tye (Sage) on Oct 25, 2000 at 20:39 UTC

    merlyn's works if you can fit all of the data into (virtual) memory (and is the fastest unless you swap too much). fundflow's works if reading a single file that is seekable. Albannach's works if reading from more than one seekable file specified on the command line (and you are sure files won't be renamed, for example, until after the program finishes). None of them works for all cases. Each of them is an acceptable solution for a large set of problems. So you'll need to decide what types of problems you plan to solve.

    If you are pretty sure that you won't have to deal with really large files, then cache the lines in an array. When doing operations that require two passes, it is very common to only deal with one file at a time and require that the file be seekable. So using seek() (and dying if that fails) is often a very good choice.
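
    The array-caching option might look roughly like this. It is a sketch only: the helper name run_passes_cached and the callback interface are assumptions for illustration, not code from this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: slurp every line once, then run each pass over the copy.
sub run_passes_cached {
    my ($fh, @passes) = @_;
    my @lines = <$fh>;              # the whole input is held in memory
    for my $pass (@passes) {
        $pass->($_) for @lines;     # each pass walks the same cached array
    }
    return scalar @lines;           # number of lines read
}
```

    For the original question, a plain "my @lines = <>;" would play the role of the slurp, followed by one loop over @lines per pass.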

    In the very rare case where you need to do two passes over multiple files, some of which might be very large and some of which may not be seekable, I'd do something like this:

    use IO::File;

    my $cache = IO::File->new_tmpfile()
        or die "Can't create temporary file: $!\n";
    print $cache $_
        or die "Can't append to temporary file: $!\n"
        while defined( $_ = <> );
    seek( $cache, 0, 0 )
        or die "Can't rewind temporary file: $!\n";
    while( <$cache> ) {
        ...
    }
    seek( $cache, 0, 0 )
        or die "Can't rewind temporary file: $!\n";
    while( <$cache> ) {
        ...
    }

    Of course, this doesn't work if you don't have enough temporary file space. But that puts the problem where it belongs: in the hands of the person trying to deal with such huge files who should arrange to have enough temporary space.

            - tye (but my friends call me "Tye")
Enlightened.
by Anonymous Monk on Oct 26, 2000 at 11:55 UTC
    Thank you very much for your answers. I've learnt a lot. Right now I'll try Albannach's method; it satisfies the requirements I have for the moment. See you!

    Be well!
    Isak

RE: Using <> in multiple passes
by Albannach (Monsignor) on Oct 25, 2000 at 18:43 UTC
    How about something like:
    #!/usr/local/bin/perl -w
    use strict;

    my @args = @ARGV;
    print "-----PASS 1-----\n";
    while (<>) {
        print;
    }
    @ARGV = @args;
    print "-----PASS 2-----\n";
    while (<>) {
        print;
    }
    The second use of <> should just repeat the list of input files.

    Update
    tye (below) just reminded me of another weakness with this - the files could have changed between passes! Not good... unless you want to check for changes in the passes?
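
    One way to check for changes, as a sketch: record a fingerprint of each file before pass 1 and compare it before pass 2. The helper name file_signature is made up for illustration, and the check is only a heuristic (an edit that keeps the size and lands within the same mtime tick would slip through):

```perl
use strict;
use warnings;

# Hypothetical helper: a cheap fingerprint of a file's identity and state.
sub file_signature {
    my ($path) = @_;
    my @st = stat($path) or die "Can't stat $path: $!\n";
    # stat fields: 0 = device, 1 = inode, 7 = size in bytes, 9 = mtime
    return join(':', @st[0, 1, 7, 9]);
}
```

    Before the first loop you could build a hash of signatures keyed by the names in @args, then die just before the second loop if any signature differs.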

RE: Using <> in multiple passes
by princepawn (Parson) on Oct 25, 2000 at 18:28 UTC
    Or, more succinctly:
    @everything = <>;
    map { pass2 } map { pass1 } @everything;

    Note well the order of the pass calls.

      But as I told you in the chatterbox, that's less efficient, and not even particularly elegant. And it seemed that the original poster wanted the same data to both passes, so you'd have to be very careful to pass $_ through each of your passes correctly.

      Oh well.

      -- Randal L. Schwartz, Perl hacker