MajinMalak has asked for the wisdom of the Perl Monks concerning the following question:

I am having an issue where I'm iterating through a text file, but I run out of memory when I read in some of the lines. I've had text files that contain 900 MB of data on a single line. Is there a way to read only part of a line from a text file, say the first 20 characters, or a way to skip a line if it's too large to read?

I wrote another quick script that writes out a modified text file containing only the lines I want (which are not too big to read), but it still runs into the out-of-memory issue.

Here is the code I wrote for making the modified text file. Hopefully the logic for either skipping the large lines or reading in only the first few characters can be applied to my larger script.

open (LOG, $file);
open (OUT, ">", $outf);
my $header = <LOG>;
print OUT $header;
while (<LOG>) {
    print OUT $_ if ($_ =~ /^\{\|\d{4}-\d{2}-\d{2}_\d{2}.\d{2}.\d{2}\|LOGIN/);
}
close LOG;
close OUT;

Replies are listed 'Best First'.
Re: Out of Memory - Line of TXT too large
by Corion (Patriarch) on Jan 02, 2014 at 12:43 UTC

    Depending on your needs, either use read to read a given number of bytes from a filehandle, or set $/ to the number of bytes you want to process for each call to readline (i.e. <>). That way, you don't need to read the whole line to operate on it.
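
    A minimal sketch of the read approach (the file name and 20-byte count are just for illustration): read pulls a fixed number of bytes from the handle regardless of where the newlines fall, so even a 900 MB line never has to fit in memory at once:

    ```perl
    use strict;
    use warnings;

    # Create a small stand-in for the real log (hypothetical name).
    open my $out, '>', 'demo.log' or die "open: $!";
    print $out 'A' x 1000, "\n";          # one very long line
    close $out;

    open my $fh, '<', 'demo.log' or die "open: $!";
    my $n = read($fh, my $chunk, 20);     # at most 20 bytes, newline or not
    die "read: $!" unless defined $n;
    print "first $n bytes: $chunk\n";
    close $fh;
    ```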

      Sorry I'm newish to Perl so I apologize if I'm asking simple questions.

      So all I would need to do is add $/ = 50; to my code and it should read in only the first 50 characters? So something like:

      open (LOG, $file);
      open (OUT, ">", $outf);
      $/ = 50;
      my $header = <LOG>;
      print OUT $header;
      while (<LOG>) {
          print OUT $_ if ($_ =~ /^\{\|\d{4}-\d{2}-\d{2}_\d{2}.\d{2}.\d{2}\|LOGIN/);
      }
      close LOG;
      close OUT;

        Sorry - I was unclear in my first reply. You need to set $/ to a reference to the number of bytes:

        $/ = \50;

        This will make each "line" exactly 50 bytes long, except for the last "line". Also see perlvar on $/.
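
        One way to apply this to the original problem (the demo file name and the 20-character cutoff are just for illustration): read the file in fixed 50-byte records and stitch together only the first 20 characters of each line, so no whole line is ever held in memory:

        ```perl
        use strict;
        use warnings;

        # Demo file: a short line, a very long line, another short line.
        open my $out, '>', 'demo.log' or die "open: $!";
        print $out "alpha\n", 'X' x 500, "\n", "omega\n";
        close $out;

        open my $fh, '<', 'demo.log' or die "open: $!";
        local $/ = \50;           # read fixed 50-byte records, never whole lines
        my $head = '';            # first bytes of the line currently in progress
        my @heads;                # first 20 characters of every line
        while (defined(my $rec = <$fh>)) {
            my @parts = split /\n/, $rec, -1;   # -1 keeps a trailing empty field
            for my $i (0 .. $#parts - 1) {      # every part but the last ends a line
                $head .= $parts[$i] if length($head) < 20;
                push @heads, substr($head, 0, 20);
                $head = '';
            }
            $head .= $parts[-1] if length($head) < 20;  # partial start of next line
        }
        push @heads, substr($head, 0, 20) if length $head;  # file without final newline
        close $fh;
        print "$_\n" for @heads;
        ```

        $head can never grow beyond about 70 bytes, because appends stop once 20 characters are collected and each record is at most 50 bytes.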

Re: Out of Memory - Line of TXT too large (mmap)
by oiskuu (Hermit) on Jan 02, 2014 at 15:45 UTC
    Scan a very big file, working on some portions? This is a good opportunity for mmap. For example:
    use File::Map ':all';

    my $file = shift;
    my $SEP  = "\n";    # line/chunk separator
    my ($s, $e);

    open(my $fh, '<', $file) || die "$!";
    map_handle(my $mmap, $fh, '<');

    sub process {}

    for ($e = 0;; $e += length($SEP)) {
        ($s, $e) = ($e, index($mmap, $SEP, $e));
        print "fragment [$s, $e)\n";
        process($e < 0 ? substr($mmap, $s) : substr($mmap, $s, $e-$s));
        last if $e < 0;
    }