dramguy has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I have a huge file (4.2GB) which contains entries similar to this:

Length_meas_C1C2 0.22
0.00 00.000 .090
Length_meas_C1C2 0.18
0.00 00.000 .090
0.00 00.000 .090
Length_meas_C1C2 0.18
Length_meas_C1C2 0.18
0.00 00.000 .090
Length_meas_C1C2 0.18

I am trying to parse this file and count the number of
times a certain layer number is found. Here is the
code I have...this works fine for smaller files, however
I am getting the out of memory error for the actual
data files. Can anyone offer any suggestions on how to
avoid this issue? Thanks in advance!

#!/opt/perl/5.8.7-32bit/bin/perl

my $inFile = $ARGV[0];
my %CALHASH;
my @CALARR;

if(!$ARGV[0]) {exit;}

open(FH, "<$inFile");
foreach $line (<FH>) {
    chomp($line);   #remove newline from end of line
    if($line =~ /(\w+_meas_.*)/) {
        ##($layer, $enc) = split(' ', $1);
        $layer = $1;
        #print "$layer, $enc\n";
        if (exists $CALHASH{$layer}) {
            $CALHASH{$layer}{'freq'}++;
        } else {
            $CALHASH{$layer}{'freq'} = 1;
        }
    }
}

foreach $key (keys %CALHASH) {
    print "$CALHASH{$key}{'freq'} $key\n";
}

Replies are listed 'Best First'.
Re: Out of memory!!??
by naikonta (Curate) on May 25, 2007 at 15:12 UTC
    foreach $line (<FH>) {
    Basically, this code tries to read all file content at once. That's what <> does when evaluated in list context. Try to read the file line by line. Saying,
    while (<FH>) {
    is the same as
    while (defined($_ = <FH>)) {
    In the code above, <FH> is evaluated in scalar context and the <> operator returns one line at a time until it reaches end of file. See perlop for more detail.
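Applied to the OP's loop, a minimal self-contained sketch (reading from an in-memory handle here to stand in for the real 4.2GB file):

```perl
use strict;
use warnings;

# Sample data standing in for the huge input file
my $data = "Length_meas_C1C2 0.22\n0.00 00.000 .090\n"
         . "Length_meas_C1C2 0.18\nLength_meas_C1C2 0.18\n";
open my $fh, '<', \$data or die $!;

my %freq;
while (my $line = <$fh>) {   # scalar context: one line per iteration
    chomp $line;
    $freq{$1}++ if $line =~ /(\w+_meas_.*)/;
}
print "$freq{$_} $_\n" for sort keys %freq;
close $fh;
```

Only the current line and the counter hash are ever in memory, so file size no longer matters.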

    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

      Thanks!! This did the trick.
      I'd still like to play around with the DBI package to see if speed is improved.
      Thanks again.
Re: Out of memory!!??
by salva (Canon) on May 25, 2007 at 15:25 UTC
    You are reading the full file in memory before processing it. Use while to loop instead of foreach.

    Also, you are using a hash of hashes to store the counters when a simple hash will do and reduce the memory consumption by an order of magnitude:

    @ARGV == 1 or die "Usage: ...";

    my $inFile = $ARGV[0];
    my %freq;

    open(FH, "<$inFile");
    while (<FH>) {
        chomp;
        if(/(\w+_meas_.*)/) {
            ##($layer, $enc) = split(' ', $1);
            $layer = $1;
            #print "$layer, $enc\n";
            $freq{$layer}++;
        }
    }

    foreach $key (keys %freq) {
        print "$freq{$key} $key\n";
    }
Re: Out of memory!!??
by derby (Abbot) on May 25, 2007 at 15:01 UTC

    Add more memory to the machine or use something like DB_File.
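    A minimal sketch of the DB_File approach (the database file name is illustrative): the counter hash is tied to an on-disk Berkeley DB, so the counts live on disk and memory stays flat no matter how many distinct keys there are.

    ```perl
    use strict;
    use warnings;
    use Fcntl;
    use DB_File;

    # Tie the counter hash to an on-disk Berkeley DB file (hypothetical name)
    my $dbfile = 'layer_counts.db';
    tie my %freq, 'DB_File', $dbfile, O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot tie $dbfile: $!";

    # Sample data standing in for the real input file
    my $data = "Length_meas_C1C2 0.22\nLength_meas_C1C2 0.18\n"
             . "Length_meas_C1C2 0.18\n";
    open my $fh, '<', \$data or die $!;
    while (my $line = <$fh>) {
        chomp $line;
        $freq{$1}++ if $line =~ /(\w+_meas_.*)/;
    }
    print "$freq{$_} $_\n" for keys %freq;
    untie %freq;
    ```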

    -derby
Re: Out of memory!!??
by jettero (Monsignor) on May 25, 2007 at 15:02 UTC

    You can combine Storable with DB_File to create a low-memory, disk-backed solution. You only need something like Storable because your hash is multi-level. If you can avoid that, then all you need is DB_File.
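    A minimal sketch of the Storable half of that idea (the hash and key names are illustrative): a DB_File-tied hash can only store flat strings, so the nested per-layer record is serialized before storing and deserialized on the way back.

    ```perl
    use strict;
    use warnings;
    use Storable qw(freeze thaw);

    # A nested record like the OP's $CALHASH{$layer} entry
    my %record = ( freq => 3 );

    # freeze() turns the structure into a flat byte string that a
    # DB_File-tied hash could store on disk; thaw() reverses it.
    my $frozen = freeze(\%record);    # e.g. $db{$layer} = $frozen;
    my $back   = thaw($frozen);
    print "freq = $back->{freq}\n";
    ```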

    There are packages that store deep structures automatically, but they don't come with perl, which is sometimes an issue on platforms where perl is installed under /opt/.

    Otherwise, have a gander at DBM::Deep.

    -Paul

Re: Out of memory!!??
by zentara (Cardinal) on May 25, 2007 at 15:07 UTC
    It seems to me your script should work, since you are reading it line-by-line. My guess is your Perl or OS doesn't have "large-file-support". See Large File Support

    Do a "perl -V" and see if you can find the phrase "USE_LARGE_FILES".


    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
      It seems to me your script should work, since you are reading it line-by-line. My guess is your Perl or OS doesn't have "large-file-support".

      But is it possible that no one thus far has noticed the

      foreach $line (<FH>) {

      line in the OP's code?!? Well, maybe the problem will still be there, but there's a reason we keep recommending against reading a whole file in list context like that. To the OP: just try using a while loop instead. Until you have a fully functional Perl 6 installation available, that is!

      Update: naikonta noticed.

Re: Out of memory!!??
by cengineer (Pilgrim) on May 25, 2007 at 16:23 UTC
Re: Out of memory!!??
by moritz (Cardinal) on May 27, 2007 at 08:15 UTC
    This regex: /(\w+_meas_.*)/ might not be the best choice if the file is not very uniform and there are long lines that match the pattern. If you know that anything after the 'meas_' is at most 100 chars, you can use /(\w+_meas_.{0,100})/ instead.

    If the file contains some very long matching lines, that keeps them from being stored in the hash in full.

    This is not your main problem, but might be a precaution anyway.
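    A quick self-contained illustration of the difference on a pathological line:

    ```perl
    use strict;
    use warnings;

    # A very long line: the greedy .* would capture all ~10,000 trailing chars
    my $line = 'Length_meas_C1C2 0.18 ' . ('x' x 10_000);

    my ($unbounded) = $line =~ /(\w+_meas_.*)/;
    my ($bounded)   = $line =~ /(\w+_meas_.{0,100})/;
    print length($unbounded), " vs ", length($bounded), "\n";
    ```

    The bounded capture stays small regardless of how long the line is.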

    As a side note, the if (exists ... code is superfluous: you can just as well increment $CALHASH{$layer}{'freq'} whether the entry exists or not.
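    For instance, Perl's autovivification creates the entry on the first increment:

    ```perl
    use strict;
    use warnings;

    my %CALHASH;
    # No exists() check needed: incrementing a missing entry
    # autovivifies it, starting from undef (treated as 0).
    $CALHASH{'Length_meas_C1C2'}{'freq'}++ for 1 .. 3;
    print "$CALHASH{'Length_meas_C1C2'}{'freq'}\n";   # prints 3
    ```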