Mac has asked for the wisdom of the Perl Monks concerning the following question:

Hi all.

I have a question: which is quicker when reading through a large file... this one:

#!/usr/local/bin/perl
open (IN, "/text.txt");
while (<IN>){
    if ($_ =~ /hello/){
        # do what ever
    }
}
close IN;
---OR---
#!/usr/local/bin/perl
open (IN, "/text.txt");
@lines = <IN>;
close (IN);
foreach (@lines){
    if ($_ =~ /hello world/){
        # do what ever
    }
}

Just wondering, given that the file size is > 100 meg and it's entirely text.

Edit Masem 2001-08-20 - Edit title from "which is quicker?"

Re: Quickest way of reading in large files (while v. for)?
by CheeseLord (Deacon) on Aug 20, 2001 at 08:33 UTC

    I'd have to say the while version's quicker, simply because it's not going to hog tons of memory storing the entire file in an array as the foreach version will. In fact, just the other day, somebody posted some code that addressed this problem - after changing to a while loop, execution time was cut by over 40%.

    From the "I forgot to mention this" dept.: If your file is over 100 meg, not only will foreach be a lot slower, it may not even finish running, due to the memory issue I describe above. I highly recommend using a while loop for a file that large.
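
    For what it's worth, here's roughly what the while version looks like with error checking and a match counter bolted on (an untested sketch; the path and pattern are just the placeholders from the question):

    #!/usr/local/bin/perl
    use strict;

    # Die with the OS error if the file can't be opened.
    open (IN, "/text.txt") or die "can't open /text.txt: $!";

    # Only the current line lives in memory, so a 100 meg file is fine.
    my $matches = 0;
    while (<IN>){
        $matches++ if /hello/;
    }
    close IN;

    print "$matches matching lines\n";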

    [Update: Changed title to match root node]

    His Royal Cheeziness

Re: which is quicker?
by maverick (Curate) on Aug 20, 2001 at 08:33 UTC
    Check out the Benchmark module (perldoc Benchmark). Here's what it says:
    #!/usr/bin/perl
    use strict;
    use Benchmark;

    sub line_by_line {
        open (IN, "/usr/share/dict/words");
        while (<IN>){
            if ($_ =~ /hello world/){
                # do what ever
            }
        }
        close IN;
    }

    sub block {
        open (IN, "/usr/share/dict/words");
        my @lines = <IN>;
        close (IN);
        foreach (@lines){
            if ($_ =~ /hello world/){
                # do what ever
            }
        }
    }

    timethese(100, {
        'line'  => \&line_by_line,
        'block' => \&block
    });
    Which Yields
    Benchmark: timing 100 iterations of block, line...
         block: 32 wallclock secs (32.52 usr +  0.36 sys = 32.88 CPU) @  3.04/s (n=100)
          line: 19 wallclock secs (17.92 usr +  0.16 sys = 18.08 CPU) @  5.53/s (n=100)
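
    If you just want the relative speeds, Benchmark also provides cmpthese (it has to be imported explicitly), which runs the same timings and then prints a percentage comparison table. A minimal sketch along the same lines:

    #!/usr/bin/perl
    use strict;
    use Benchmark qw(cmpthese);

    sub line_by_line {
        open (IN, "/usr/share/dict/words") or die $!;
        while (<IN>){
            if ($_ =~ /hello world/){ }    # do what ever
        }
        close IN;
    }

    sub block {
        open (IN, "/usr/share/dict/words") or die $!;
        my @lines = <IN>;
        close (IN);
        foreach (@lines){
            if ($_ =~ /hello world/){ }    # do what ever
        }
    }

    # Same harness as timethese, but the output also shows how much faster
    # each entry is relative to the others.
    cmpthese(100, {
        'line'  => \&line_by_line,
        'block' => \&block
    });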

    /\/\averick
    perl -l -e "eval pack('h*','072796e6470272f2c5f2c5166756279636b672');"

    Updated:Fixed regexp to be the same in both subs. Used question's code verbatim. Thanks lemming.

Re: Quickest way of reading in large files?
by stefp (Vicar) on Aug 20, 2001 at 23:26 UTC
    Besides the cost of holding everything in memory at once, your code incurs the cost of unneeded string copies. There is no such thing as "copy-on-write strings" in Perl, where the string values would be shared until modified. BTW, it is not clear that the overhead of implementing "copy-on-write strings" in Perl would be less than the one imposed by the unneeded allocation and copying of strings.
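
    A quick way to see that cost (an untested sketch, borrowing the wordlist path from maverick's benchmark): add up the string data that gets copied into the array just so it can be looped over once.

    #!/usr/bin/perl
    use strict;

    open (IN, "/usr/share/dict/words") or die $!;
    my @lines = <IN>;      # every line is allocated and copied into its own scalar
    close (IN);

    # Just the copied string bytes; the real footprint is higher still once
    # per-scalar and per-array overhead is counted. The while-loop version
    # only ever holds the current line in $_.
    my $bytes = 0;
    $bytes += length for @lines;
    print scalar(@lines), " lines, $bytes bytes copied\n";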

    -- stefp

Re: Quickest way of reading in large files?
by RayRay459 (Pilgrim) on Aug 20, 2001 at 20:56 UTC
    I think the while loop is quicker. It reads the file without storing the whole thing in memory. :) Cheers, Ray