in reply to Using <> in multiple passes

merlyn's approach works if you can fit all of the data into (virtual) memory, and it is the fastest unless you swap too much. fundflow's works if you are reading a single seekable file. Albannach's works if you are reading from more than one seekable file named on the command line (and you are sure the files won't be, for example, renamed before the program finishes). None of them works for all cases, but each of them is an acceptable solution for a large set of problems. So you'll need to decide what types of problems you plan to solve.
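For the reopen-the-named-files case, here is a minimal sketch of the general idea (not necessarily Albannach's actual code). It saves the file names and refills @ARGV before the second pass, relying on the documented behavior that once <> has returned end-of-file, the next <> starts over on whatever is in @ARGV. The helper name two_passes_over_argv and its callback arguments are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: run $first on every line of the named files,
# then run $second on every line again. Assumes the files can be
# reopened and are not renamed or replaced between the two passes.
sub two_passes_over_argv {
    my ( $first, $second, @files ) = @_;
    # With an empty @ARGV, <> reads STDIN, which can't be re-read.
    die "Need at least one file name\n" unless @files;

    local @ARGV = @files;
    while ( my $line = <> ) { $first->($line) }

    @ARGV = @files;    # refill so the next <> starts over from the first file
    while ( my $line = <> ) { $second->($line) }
}
```

From a main program this would be called as `two_passes_over_argv( $first, $second, @ARGV )`; it refuses to run when no files are named, since that would mean reading STDIN twice.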

If you are pretty sure that you won't have to deal with really large files, then cache the lines in an array. When doing operations that require two passes, it is very common to deal with only one file at a time and to require that the file be seekable. So using seek() (and dying if that fails) is often a very good choice.
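Those two simpler cases might be sketched as small helpers like the following. The names two_passes_in_memory and two_passes_with_seek and the callback-style interface are hypothetical, chosen only for illustration; the point is the array cache in one and the checked seek() in the other.

```perl
use strict;
use warnings;

# Hypothetical helper for small inputs: cache every line in an array,
# then walk the array twice. Works on pipes and STDIN too, but the
# whole input must fit in memory.
sub two_passes_in_memory {
    my ( $fh, $first, $second ) = @_;
    my @lines = <$fh>;
    $first->($_)  for @lines;
    $second->($_) for @lines;
}

# Hypothetical helper for a single seekable file: read it, rewind it
# with seek(), and read it again. seek() fails on non-seekable input
# such as a pipe, so die rather than silently skip the second pass.
sub two_passes_with_seek {
    my ( $fh, $first, $second ) = @_;
    while ( my $line = <$fh> ) { $first->($line) }
    seek( $fh, 0, 0 ) or die "Can't rewind input (not seekable?): $!\n";
    while ( my $line = <$fh> ) { $second->($line) }
}
```

For example, `two_passes_with_seek( \*STDIN, $first, $second )` works when STDIN is redirected from a file but dies when STDIN is a pipe, which is exactly the failure you want to hear about.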

In the very rare case where you need to do two passes over multiple files, some of which might be very large and some of which may not be seekable, I'd do something like this:

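```perl
use strict;
use warnings;
use IO::File;

# Copy everything from <> (the files named in @ARGV, or STDIN)
# into an anonymous temporary file, which is always seekable.
my $cache= IO::File->new_tmpfile()
    or die "Can't create temporary file: $!\n";
print $cache $_
    or die "Can't append to temporary file: $!\n"
    while defined( $_= <> );

# First pass.
seek( $cache, 0, 0 ) or die "Can't rewind temporary file: $!\n";
while( <$cache> ) {
    # process $_ here
}

# Second pass.
seek( $cache, 0, 0 ) or die "Can't rewind temporary file: $!\n";
while( <$cache> ) {
    # process $_ here
}
```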

Of course, this doesn't work if you don't have enough temporary file space. But that puts the problem where it belongs: in the hands of the person trying to process such huge files, who should arrange to have enough temporary space.

        - tye (but my friends call me "Tye")