rootcho has asked for the wisdom of the Perl Monks concerning the following question:

Hi, how do you parse very large text files?
Currently I'm trying to do this for such a file.
If I use:

    open my $fh, ...;
    while (<$fh>) { ... }

this seems to try to read the whole file, because the script got killed on my 2GB system.
Using Tie::File seems a little better, but it still eats all the RAM, even if I specify a 2MB cache, disable deferred writes, and set read-only mode!

Should I write my own file reader that reads up to "\n" character by character and then discards it, or is there already a module that does that?
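
(For concreteness, a minimal sketch of the Tie::File setup being described; the filename and exact option spellings are guesses:)

    use strict;
    use warnings;
    use Fcntl 'O_RDONLY';
    use Tie::File;

    tie my @lines, 'Tie::File', 'big_file.txt',
        mode      => O_RDONLY,     # read-only, as described
        memory    => 2_000_000,    # 2MB cache limit
        autodefer => 0             # no automatic write-deferring
        or die "Cannot tie file: $!";

    for my $line (@lines) {        # records are fetched one at a time
        # process $line
    }

    untie @lines;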

Replies are listed 'Best First'.
Re: Parsing very big files GB
by FunkyMonk (Bishop) on Aug 30, 2007 at 23:08 UTC
    There's something else causing your problem, but you haven't shown us what it is. while (<$fh>) will read the file line by line, and not all at once.

    Are you putting your read lines into an array or a hash?
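
    A minimal sketch of a read loop that stays flat on memory, assuming the file really is newline-delimited (the filename is made up):

        use strict;
        use warnings;

        open my $fh, '<', 'big_file.txt' or die "Cannot open file: $!";
        while ( my $line = <$fh> ) {
            chomp $line;
            # work on $line here; as long as nothing is pushed onto an
            # array or stored in a hash that outlives the loop, memory
            # use stays roughly constant no matter how big the file is
        }
        close $fh;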

      That was what I thought too! To be sure I'm not leaking memory, for all hashes, arrays and objects I do "undef $var" after they are no longer needed.

        Show us more of the code. Sounds like some restructuring is in order. In particular, anything that you "undef" ought to be inside the while loop so that it is cleaned up when it goes out of scope at the end of the loop. Only variables that should retain content after the loop should be declared outside the loop.

        If you are not using strictures already I strongly recommend that you add use strict; use warnings; to your code.
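
        For instance, a sketch of that structure (the filename and the tab-separated layout are made up):

            use strict;
            use warnings;

            my $total = 0;                      # kept after the loop, so declared outside

            open my $fh, '<', 'big_file.txt' or die "Cannot open file: $!";
            while ( my $line = <$fh> ) {
                chomp $line;
                my @fields = split /\t/, $line; # lexical to one iteration: freed
                $total += $fields[0] || 0;      # automatically, no undef needed
            }
            close $fh;

            print "Total: $total\n";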


        DWIM is Perl's answer to Gödel
Re: Parsing very big files GB
by GrandFather (Saint) on Aug 30, 2007 at 23:12 UTC

    How long are the lines in the file? Unless they are extremely long what you have shown should be fine. Are you sure you are not leaking memory in the loop or accumulating data in the loop? As a sanity check you might like to try:

    open my $fh, ...;
    while (<$fh>) { }
    close $fh;

    and check that that runs correctly. Assuming it does, start adding back the contents of the while loop and see where the problem happens. If it doesn't run correctly (terminate cleanly) you need to rethink what constitutes a line and possibly come back for more advice.
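
    For example, something along these lines (filename made up) runs the empty loop and also reports whether the "lines" look sane:

        use strict;
        use warnings;

        open my $fh, '<', 'big_file.txt' or die "Cannot open file: $!";
        my ( $count, $longest ) = ( 0, 0 );
        while ( my $line = <$fh> ) {
            $count++;
            $longest = length $line if length $line > $longest;
        }
        close $fh;

        print "$count lines, longest is $longest bytes\n";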


    DWIM is Perl's answer to Gödel

      ooo - long lines or maybe a weird record separator
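
      If it is the record separator, $/ can be set to whatever actually ends a record; a sketch, with the separator chosen purely as an example:

          use strict;
          use warnings;

          open my $fh, '<', 'big_file.txt' or die "Cannot open file: $!";
          {
              local $/ = "\r";             # e.g. a file using bare carriage returns
              while ( my $record = <$fh> ) {
                  chomp $record;           # chomp strips whatever $/ is set to
                  # process $record
              }
          }
          close $fh;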


      I humbly seek wisdom.
Re: Parsing very big files GB
by f00li5h (Chaplain) on Aug 31, 2007 at 01:34 UTC

    My guess is that you're doing something like

    sub is_useful;

    open my $fh, '<', 'purr' or die "Sad kitty: $!";

    my @useful_things = ();
    while (<$fh>) {
        push @useful_things, $_ if is_useful($_);
    }

    So all that data is ending up stashed in memory, even though you're reading the file a line at a time.
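
    If each useful line can be handled the moment it is read, nothing needs to pile up; handle_it below is a placeholder for whatever that processing is:

        sub is_useful;
        sub handle_it;    # e.g. print a summary line, bump a counter, write elsewhere

        open my $fh, '<', 'purr' or die "Sad kitty: $!";
        while (<$fh>) {
            handle_it($_) if is_useful($_);   # acted on and discarded, so memory stays flat
        }
        close $fh;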

    @_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;

Re: Parsing very big files GB
by cengineer (Pilgrim) on Aug 31, 2007 at 13:23 UTC
    Check out Tie::File. From the description:
    The file is not loaded into memory, so this will work even for gigantic files.
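
    A minimal sketch of that on-demand behaviour (the filename and line index are just examples):

        use strict;
        use warnings;
        use Tie::File;

        tie my @lines, 'Tie::File', 'big_file.txt'
            or die "Cannot tie file: $!";

        print $lines[999_999], "\n";   # fetch a single line without slurping the file

        untie @lines;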