isabella423 has asked for the wisdom of the Perl Monks concerning the following question:

I would like to put a large text file into an array and then access elements close to the end of the array. When I use @myfile = <FILE>; I get an "Out of memory!" error message.

Tie::File successfully ties the file, I believe; however, the code dies with the same memory error when I try to access the last element or have the array's size returned. Note that it does not die when I access, say, the first element. For example, if @myfile is the tied file array:

print $myfile[0]; # this prints as expected!
$size = @myfile;  # this dies and gives the "Out of memory!" error

Any suggestions?

Replies are listed 'Best First'.
Re: Large text files into arrays, accessing final elements
by jwkrahn (Abbot) on Feb 07, 2010 at 01:29 UTC
Re: Large text files into arrays, accessing final elements
by desemondo (Hermit) on Feb 07, 2010 at 09:57 UTC
    Why do you want to have the entire file in an array? In almost all cases it is far better to process your file line by line.
    eg.
    while (my $line = <$fh>) {
        if ($line =~ $pattern) {    # line matches what you're after...
            # Do action on it here...
        }
    }
    When dealing with large files, it is often helpful to 'rewind' your filehandles if you are checking the last couple of lines for something and then need to go back a bit, or even back to the very beginning. If that's what you're after, check out tell and seek
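    As a minimal illustration of the tell/seek idea (the file name and contents below are invented purely so the snippet runs on its own):

```perl
use strict;
use warnings;

# Sketch of tell and seek; 'demo.txt' and its contents are made up
# here only so the example is self-contained.
my $file = 'demo.txt';
open my $out, '>', $file or die "Cannot write $file: $!";
print $out "line $_\n" for 1 .. 5;
close $out;

open my $fh, '<', $file or die "Cannot read $file: $!";
my $pos   = tell $fh;                         # remember the current byte offset
my $first = <$fh>;                            # read one line forward
seek $fh, $pos, 0 or die "Cannot seek: $!";   # jump back to that offset
my $again = <$fh>;                            # reads the very same line again
print "rewound OK\n" if $first eq $again;
```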

    Update:
    Since Tie::File does not load the entire file into memory, and you are still getting an out of memory error, maybe you can experiment with Tie::File's memory settings.
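    A hedged sketch of the memory setting mentioned above (the demo file here is a throwaway so the snippet runs standalone; with a real file you would tie it directly):

```perl
use strict;
use warnings;
use Tie::File;

# Demo file created only so this example is self-contained.
my $file = 'demo.txt';
open my $out, '>', $file or die "Cannot write $file: $!";
print $out "record $_\n" for 1 .. 5;
close $out;

# The memory option caps Tie::File's record cache in bytes
# (the default is about 2 MB).
tie my @lines, 'Tie::File', $file, memory => 20_000_000
    or die "Cannot tie $file: $!";

print "$lines[0]\n";         # fetched from disk on demand
print scalar(@lines), "\n";  # forces Tie::File to index every record
untie @lines;
```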
Re: Large text files into arrays, accessing final elements
by blakew (Monk) on Feb 07, 2010 at 16:58 UTC
    isabella423, this smells like an XY Problem. I would try an alternate approach, such as (as already suggested) using a while loop to read lines one at a time, pushing onto your array only those lines you actually care about. You could also try the tail program (available on Windows via Cygwin).
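    One way to sketch the "keep only the lines you care about" idea is a rolling buffer that holds just the last $N lines (the file name, contents, and $N below are invented so the example is self-contained):

```perl
use strict;
use warnings;

# Demo file created only so the example runs standalone.
my $file = 'demo.txt';
open my $out, '>', $file or die "Cannot write $file: $!";
print $out "line $_\n" for 1 .. 1000;
close $out;

my $N = 100;
my @tail;

open my $fh, '<', $file or die "Cannot read $file: $!";
while (my $line = <$fh>) {
    push @tail, $line;
    shift @tail if @tail > $N;    # drop lines we no longer need
}
close $fh;

print scalar(@tail), "\n";    # never more than $N lines in memory
print $tail[-1];              # the file's final line
```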
Re: Large text files into arrays, accessing final elements
by johngg (Canon) on Feb 07, 2010 at 23:50 UTC

    A different approach would be to use seek to position the file pointer some number of bytes from the end of your file, 10,000 say, and then use read to read those last 10,000 bytes of the file into a buffer held in a scalar variable. You could then open another filehandle on a reference to that scalar, which would allow you to read the last 100-200 lines of your file (depending on line lengths) into an array without having to read the whole file. Something like (not tested):

    open my $bigFH, q{<}, q{myBigFile} or die $!;
    seek $bigFH, -10000, 2;
    read $bigFH, my $last10k, 10000;
    open my $last10kFH, q{<}, \ $last10k or die $!;
    my @lastLines = <$last10kFH>;

    Be aware that the first line you read will most likely be a partial line, and this approach will not help if you need to know specific line numbers; for that, the while loop would be required.

    I hope this is helpful.

    Cheers,

    JohnGG

Re: Large text files into arrays, accessing final elements
by Not_a_Number (Prior) on Feb 07, 2010 at 19:09 UTC
    if @myfile is tied file array: (...) $size=@myfile; #this dies and gives "out of memory" error

    This issue seems to be directly addressed in the docs to Tie::File (in the caveats):

    Note that accessing the length of the array via $x = scalar @tied_file accesses all records and stores their offsets.

    Did you try, for example:

    print $myfile[-1];
      Did you try, for example: print $myfile[-1];

      That will suffer exactly the same problems. There is simply no way to know how many lines a file (with variable length lines) contains without reading the whole file.
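      If a line count is really what's needed, the flat-memory way is a single pass that reads and discards each line (the demo file below is invented so the snippet runs standalone):

```perl
use strict;
use warnings;

# Demo file created only so this example is self-contained.
my $file = 'demo.txt';
open my $out, '>', $file or die "Cannot write $file: $!";
print $out "line $_\n" for 1 .. 250;
close $out;

# Counting lines means reading the whole file, but doing it one line
# at a time keeps memory use flat regardless of file size.
open my $fh, '<', $file or die "Cannot read $file: $!";
my $count = 0;
$count++ while <$fh>;
close $fh;

print "$count lines\n";
```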

