FireBird34 has asked for the wisdom of the Perl Monks concerning the following question:

Is there a more efficient way to read from a file without storing it all in memory? I'm reading from a huge file and doing some work on it. As it stands, it takes approximately 10 seconds just to read the file (that's about 10 seconds too long). To clarify what is needed, this is roughly what I need to do:

read from the file
manipulate the line
compare the line with external info
get the second line if info is not correct, or exit loop if it is correct.

This is some basic sample code:
open(FH, "/path/to/file") || die "Can't open file: $!";
my @file = <FH>;
close FH;
foreach my $file (@file){
    if($file eq $external_info){
        print "great\!";
        last;
    }
}
Thanks for any help

Replies are listed 'Best First'.
Re: Reading from file, not to memory
by steves (Curate) on Feb 20, 2003 at 03:10 UTC

    Ouch. You don't have to read all lines in at once first. This line does exactly that, because the angle bracket operator is evaluated in list context:

    my @file = <FH>;
    To read a line at a time do this instead:
    local *FH;
    open(FH, "/path/to/file") || die "Can't open file: $!";
    while (<FH>) {
        # line is in $_
        if ($_ eq $external_info) {
            print "great\!";
            last;
        }
    }
    close(FH);

      I got what I needed -- thanks for the help!
Re: Reading from file, not to memory
by Zaxo (Archbishop) on Feb 20, 2003 at 03:21 UTC

    You may be getting into swap by reading the whole file into an array. A while loop on the filehandle will read one line at a time into $_, keeping memory requirements to a minimum.

    open my $fh, '<', '/path/to/file' or die $!;
    while (<$fh>) {
        if ( $_ eq $external_info ) {
            print 'great!', $/;
            last;
        }
    }
    close $fh or die $!;
    You will need to chomp at the top of the while loop if $external_info does not have a newline at the end.
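    As a rough illustration of the chomp variant, here is a minimal sketch. The helper name file_has_line and its arguments are hypothetical, not something from this thread; it assumes the wanted string carries no trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns 1 if any line of
# $path equals $wanted, where $wanted has no trailing newline; else 0.
sub file_has_line {
    my ($path, $wanted) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my $found = 0;
    while (my $line = <$fh>) {
        chomp $line;              # strip the trailing newline before comparing
        if ($line eq $wanted) {
            $found = 1;
            last;                 # stop early; later lines are never read
        }
    }
    close $fh;
    return $found;
}
```

    Only one line is held in memory at a time, and the loop stops as soon as a match is found.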

    I converted to 3-arg open, and used a lexical file handle for that. You can convert back to suit previous versions of perl.

    After Compline,
    Zaxo

Re: Reading from file, not to memory
by rob_au (Abbot) on Feb 20, 2003 at 10:30 UTC
    Another option, which I'm surprised hasn't been mentioned yet, is the Tie::File module. This module, part of the standard 5.8.0 distribution, allows lines of a file to be manipulated via a tied array without requiring the entire file to be read into memory. Additionally, the amount of memory used for read caching and write buffering can be controlled by the memory argument to the tie constructor.

    Using this module, your code might look like:

    # Tie the array @file to /path/to/file using a maximum of 1MB of memory
    tie my @file, 'Tie::File', '/path/to/file', 'memory' => 1_000_000
        or die "Cannot open file - $!";
    foreach my $line ( @file ) {
        # ... Your code follows ...
    }

    This module is quite stable, despite the beta label given to it by its author, and works exceptionally well in a production environment (even under 5.005_03).

     

    perl -le 'print+unpack("N",pack("B32","00000000000000000000001000110010"))'

Re: Reading from file, not to memory
by Anonymous Monk on Feb 20, 2003 at 08:23 UTC
    You don't need the array; you can just use a while statement on the filehandle, e.g.
    while (<FH>) {
        if ($_ eq $external_info) {
            print "great\!";
            last;
        }
    }
Re: Reading from file, not to memory
by physi (Friar) on Feb 20, 2003 at 09:09 UTC
    You can set the INPUT_RECORD_SEPARATOR $/ to your $external_info.
    $/ = $external_info;
    open(FH, "/path/to/file") || die "Can't open file: $!";
    while (<FH>) {
        # if the record is shorter than the file, the separator was found
        if (length($_) != (stat FH)[7]) {
            print "great\n";
            last;
        }
    }
    The bad thing about this approach is that the whole file again goes into memory if $external_info is not in the file...
    So the other solutions might be a bit better ;-)
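    To make the behaviour of $/ concrete, here is a small sketch using an in-memory filehandle (available from Perl 5.8). The sample data and the helper name first_record are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the file contents.
my $data = "one\ntwo\nthree\n";

# Hypothetical helper: returns the first record of $text when
# records are terminated by $sep instead of by a newline.
sub first_record {
    my ($text, $sep) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory handle: $!";
    local $/ = $sep;          # change the record separator for this scope only
    my $record = <$fh>;       # reads up to and including $sep, or to EOF
    close $fh;
    return $record;
}
```

    When the separator occurs in the data, the read stops right after it; when it does not, the single read returns everything, which is the memory problem described above.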
    -----------------------------------
    --the good, the bad and the physi--
    -----------------------------------
    
Re: Reading from file, not to memory
by pfaut (Priest) on Feb 20, 2003 at 03:14 UTC

    When you reference a file handle in list context (by assigning to an array), you read the whole file into memory at once. You can instead read the file line by line by using the file handle in scalar context (by assigning to a scalar). Instead of this...

    open(FH, "/path/to/file") || die "Can't open file: $!";
    my @file = <FH>;
    close FH;

    ...try this...

    open(FH, "/path/to/file") || die "Can't open file: $!";
    foreach my $file (<FH>){
        if($file eq $external_info){
            print "great\!";
            last;
        }
    }
    close FH;
    --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';

      You have to be careful here, because you are doing exactly the same thing. The foreach statement also evaluates <FH> in list context, so it reads the whole file into a temporary list and then iterates over it. You have to use a while loop for this to work correctly.

      Try out the following bit of code to test this:

      use Benchmark;

      timethese(1, {
          'Trial1 While' => sub {
              open (FILE, "file2") or die "Can't open file: $!\n";
              while (<FILE>) {
                  last; # read one line and exit
              }
              close FILE;
          },
          'Trial2 Foreach' => sub {
              open (FILE, "file1") or die "Can't open file: $!\n";
              foreach my $line (<FILE>) {
                  last; # read one line and exit
              }
              close FILE;
          },
      });

      Make sure that file1 and file2 are identical (I used two files so that we know there is no caching going on), and that they are large text files. I got the following results with two 50 MB files:

      Benchmark: timing 1 iterations of Trial1 While, Trial2 Foreach...
      Trial1 While:  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
      Trial2 Foreach: 17 wallclock secs (10.77 usr +  1.56 sys = 12.33 CPU) @  0.08/s (n=1)
      Your code still loads the whole file first! The expression (<FH>) first generates the entire list, which foreach then reads!

      Take a look at these two test scripts. They show the file buffer position as you print each line:

      my $file = $0 ;
      open(FH, $file) ;
      foreach my $line (<FH>){
          my $tel = tell(FH) ;
          print "$tel>> $line" ;
      }
      close FH;
      OK way:
      my $file = $0 ;
      open(FH, $file) ;
      while ( my $line = <FH> ) {
          my $tel = tell(FH) ;
          print "$tel>> $line" ;
      }
      close FH;
      You can see that in the first version the position was always at the end of the file, because the whole file had already been loaded!

      Graciliano M. P.
      "The creativity is the expression of the liberty".

      Let this note stand as a testimony to the dangers of cut-n-paste or cargo culting. In my code, I always use while when reading from files (honest!) but in my laziness, I copied and pasted from the base node into my reply without looking the code over well enough.

      --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';