MajinMalak has asked for the wisdom of the Perl Monks concerning the following question:

I am having an issue where I'm iterating through a text file, but I run out of memory when I read in some of the lines. I've had text files that contain 900 MB of data on a single line. Is there a way to read only part of a line from a text file, say the first 20 characters, or a way to skip a line if it's too large to read?

I wrote another quick script that writes out a modified text file containing only the lines I want (which are not too big to read), but it still runs into the out-of-memory issue.

Here is the code I wrote for making the modified text file. Hopefully the logic for either skipping the large lines or reading in only the first few characters can be applied to my larger script.

open (LOG, $file);
open (OUT, ">", $outf);
my $header = <LOG>;
print OUT $header;
while (<LOG>) {
    print OUT $_ if ($_ =~ /^\{\|\d{4}-\d{2}-\d{2}_\d{2}.\d{2}.\d{2}\|LOGIN/);
}
close LOG;
close OUT;

Replies are listed 'Best First'.
Re: Out of Memory - Line of TXT too large
by Corion (Patriarch) on Jan 02, 2014 at 12:43 UTC

    Depending on your needs, either use read to read a given number of bytes from a filehandle, or set $/ to the number of bytes you want to process for each call to readline (i.e. <>). That way, you don't need to read the whole line to operate on it.
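
    A minimal sketch of the read approach (the file name and 20-byte count are just for illustration): read pulls a fixed number of bytes from the handle regardless of where the newlines fall, so even a 900 MB line never has to fit in memory at once:

    ```perl
    use strict;
    use warnings;

    # Create a small stand-in for the real log (hypothetical name).
    open my $out, '>', 'demo.log' or die "open: $!";
    print $out 'A' x 1000, "\n";          # one very long line
    close $out;

    open my $fh, '<', 'demo.log' or die "open: $!";
    my $n = read($fh, my $chunk, 20);     # at most 20 bytes, newline or not
    die "read: $!" unless defined $n;
    print "first $n bytes: $chunk\n";
    close $fh;
    ```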

      Sorry I'm newish to Perl so I apologize if I'm asking simple questions.

      So all I would need to do is add $/ = 50; to my code and it should read in only the first 50 characters? So something like:

      open (LOG, $file);
      open (OUT, ">", $outf);
      $/ = 50;
      my $header = <LOG>;
      print OUT $header;
      while (<LOG>) {
          print OUT $_ if ($_ =~ /^\{\|\d{4}-\d{2}-\d{2}_\d{2}.\d{2}.\d{2}\|LOGIN/);
      }
      close LOG;
      close OUT;

        Sorry - I was unclear in my first reply. You need to set $/ to a reference to the number of bytes:

        $/ = \50;

        This will make each "line" exactly 50 bytes long, except for the last "line". Also see perlvar on $/.
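
        One way to apply this to the original problem (the demo file name and the 20-character cutoff are just for illustration): read the file in fixed 50-byte records and stitch together only the first 20 characters of each line, so no whole line is ever held in memory:

        ```perl
        use strict;
        use warnings;

        # Demo file: a short line, a very long line, another short line.
        open my $out, '>', 'demo.log' or die "open: $!";
        print $out "alpha\n", 'X' x 500, "\n", "omega\n";
        close $out;

        open my $fh, '<', 'demo.log' or die "open: $!";
        local $/ = \50;           # read fixed 50-byte records, never whole lines
        my $head = '';            # first bytes of the line currently in progress
        my @heads;                # first 20 characters of every line
        while (defined(my $rec = <$fh>)) {
            my @parts = split /\n/, $rec, -1;   # -1 keeps a trailing empty field
            for my $i (0 .. $#parts - 1) {      # every part but the last ends a line
                $head .= $parts[$i] if length($head) < 20;
                push @heads, substr($head, 0, 20);
                $head = '';
            }
            $head .= $parts[-1] if length($head) < 20;  # partial start of next line
        }
        push @heads, substr($head, 0, 20) if length $head;  # file without final newline
        close $fh;
        print "$_\n" for @heads;
        ```

        $head can never grow beyond about 70 bytes, because appends stop once 20 characters are collected and each record is at most 50 bytes.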

Re: Out of Memory - Line of TXT too large (mmap)
by oiskuu (Hermit) on Jan 02, 2014 at 15:45 UTC
    Scan a very big file, working on some portions? This is a good opportunity for mmap. For example:
    use File::Map ':all';

    my $file = shift;
    my $SEP  = "\n";    # line/chunk separator
    my ($s, $e);

    open(my $fh, '<', $file) || die "$!";
    map_handle(my $mmap, $fh, '<');

    sub process {}

    for ($e = 0;; $e += length($SEP)) {
        ($s, $e) = ($e, index($mmap, $SEP, $e));
        print "fragment [$s, $e)\n";
        process($e < 0 ? substr($mmap, $s) : substr($mmap, $s, $e-$s));
        last if $e < 0;
    }