Mac has asked for the wisdom of the Perl Monks concerning the following question:

Hi all.

I have a question: which is quicker when reading through a large file... this one:

#!/usr/local/bin/perl
open (IN, "/text.txt");
while (<IN>){
    if ($_ =~ /hello/){
        # do what ever
    }
}
close IN;
---OR---
#!/usr/local/bin/perl
open (IN, "/text.txt");
@lines = <IN>;
close (IN);
foreach (@lines){
    if ($_ =~ /hello world/){
        # do what ever
    }
}

Just wondering, given that the file size is > 100 meg and it's entirely text.

Edit Masem 2001-08-20 - Edit title from "which is quicker?"

Re: Quickest way of reading in large files (while v. for)?
by CheeseLord (Deacon) on Aug 20, 2001 at 08:33 UTC

    I'd have to say the while version's quicker, simply because it's not going to hog tons of memory storing the entire file in an array as the foreach version will. In fact, just the other day, somebody posted some code that addressed this problem - after changing to a while loop, execution time was cut by over 40%.

    From the "I forgot to mention this" dept.: If your file is over 100 meg, not only will foreach be a lot slower, it may not even finish running, due to the memory issue I describe above. I highly recommend using a while loop for a file that large.
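
    For what it's worth, here's roughly what the while version looks like with error checking and a match counter bolted on (an untested sketch; the path and pattern are just the placeholders from the question):

    #!/usr/local/bin/perl
    use strict;

    # Die with the OS error if the file can't be opened.
    open (IN, "/text.txt") or die "can't open /text.txt: $!";

    # Only the current line lives in memory, so a 100 meg file is fine.
    my $matches = 0;
    while (<IN>){
        $matches++ if /hello/;
    }
    close IN;

    print "$matches matching lines\n";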

    [Update: Changed title to match root node]

    His Royal Cheeziness

Re: which is quicker?
by maverick (Curate) on Aug 20, 2001 at 08:33 UTC
    Check out the Benchmark module (perldoc Benchmark). Here's what it says:
    #!/usr/bin/perl
    use strict;
    use Benchmark;

    sub line_by_line {
        open (IN, "/usr/share/dict/words");
        while (<IN>){
            if ($_ =~ /hello world/){
                # do what ever
            }
        }
        close IN;
    }

    sub block {
        open (IN, "/usr/share/dict/words");
        my @lines = <IN>;
        close (IN);
        foreach (@lines){
            if ($_ =~ /hello world/){
                # do what ever
            }
        }
    }

    timethese(100, {
        'line'  => \&line_by_line,
        'block' => \&block
    });
    Which Yields
    Benchmark: timing 100 iterations of block, line...
         block: 32 wallclock secs (32.52 usr +  0.36 sys = 32.88 CPU) @  3.04/s (n=100)
          line: 19 wallclock secs (17.92 usr +  0.16 sys = 18.08 CPU) @  5.53/s (n=100)
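
    If you just want the relative speeds, Benchmark also provides cmpthese (it has to be imported explicitly), which runs the same timings and then prints a percentage comparison table. A minimal sketch along the same lines:

    #!/usr/bin/perl
    use strict;
    use Benchmark qw(cmpthese);

    sub line_by_line {
        open (IN, "/usr/share/dict/words") or die $!;
        while (<IN>){
            if ($_ =~ /hello world/){ }    # do what ever
        }
        close IN;
    }

    sub block {
        open (IN, "/usr/share/dict/words") or die $!;
        my @lines = <IN>;
        close (IN);
        foreach (@lines){
            if ($_ =~ /hello world/){ }    # do what ever
        }
    }

    # Same harness as timethese, but the output also shows how much faster
    # each entry is relative to the others.
    cmpthese(100, {
        'line'  => \&line_by_line,
        'block' => \&block
    });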

    /\/\averick
    perl -l -e "eval pack('h*','072796e6470272f2c5f2c5166756279636b672');"

    Updated:Fixed regexp to be the same in both subs. Used question's code verbatim. Thanks lemming.

Re: Quickest way of reading in large files?
by stefp (Vicar) on Aug 20, 2001 at 23:26 UTC
    Besides the cost of holding everything in memory at once, your code incurs the cost of unneeded string copies. There is no such thing as "copy-on-write strings" in Perl, where the string values would be shared until modified. BTW, it is not clear that the overhead of implementing "copy-on-write strings" in Perl would be less than the one imposed by the unneeded allocation and copying of strings.
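
    A quick way to see that cost (an untested sketch, borrowing the wordlist path from maverick's benchmark): add up the string data that gets copied into the array just so it can be looped over once.

    #!/usr/bin/perl
    use strict;

    open (IN, "/usr/share/dict/words") or die $!;
    my @lines = <IN>;      # every line is allocated and copied into its own scalar
    close (IN);

    # Just the copied string bytes; the real footprint is higher still once
    # per-scalar and per-array overhead is counted. The while-loop version
    # only ever holds the current line in $_.
    my $bytes = 0;
    $bytes += length for @lines;
    print scalar(@lines), " lines, $bytes bytes copied\n";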

    -- stefp

Re: Quickest way of reading in large files?
by RayRay459 (Pilgrim) on Aug 20, 2001 at 20:56 UTC
    I think the while loop is quicker. It reads the file without storing the whole thing in memory. :) Cheers, Ray