Perl Out of memory error

vdel2393 has asked for the wisdom of the Perl Monks concerning the following question:

I inherited a Perl script that converts COBOL files to ASCII and finds the trailer record information. The script gives me "out of memory" errors for large files (~400 MB and greater).

perl and OS version:

Perl version: /usr/bin/perl -> /usr/opt/perl5/bin/perl5.8.8
Perl maxdata=0x80000000
OS: AIX 6100-09-04-1441
All user ulimits =unlimited

Here is the script:
#!/bin/perl
# File: get-count.pl
#
# Perl script to locate the trailer record of a Fiserv data file and return
# the following information
#
# Usage: 
#    perl get-count.pl INPUT_FILE
# 
# Number of records per the trailer
# Number of records based on the file size and record size
# Record size.

sub to_ascii
{
        my($s) = @_;

        $s =~ tr/\x40\x5a\x7f\x7b\x5b\x6c\x50\x7d\x4d\x5d\x5c\x4e\x6b\x60\x4b\x61\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\x7a\x5e\x4c\x7e\x6e\x6f\x7c\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xad\xe0\xbd\x5f\x6d\x79\x81\x82\x83\x84\x85\x86\x87\x88\x89\x91\x92\x93\x94\x95\x96\x97\x98\x99\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xc0\x6a\xd0\xa1/\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e/;
   return($s);
}

sub file_info
{
    my($file) = @_;

    open(FP2, "<$file") || die return(-1,-1, -1);
    binmode(FP2);

    my($file_size) = -s $file;
    my($buf_size) = 6000;
    if($buf_size > $file_size)
    {
        $buf_size = $file_size;
    }

    my($s) = '';

    seek(FP2, -$file_size, 2);
    read(FP2, $s, $file_size);
    my($s2) = &to_ascii($s);
    if($s2 =~ /999 (\d\d\d\d\d\d\d\d\d)( +$)/)
    {
        my($rec_size) = 4 + length($1) + length($2);
        my($rec_count) = $1 + 0;
        my($rec_count_calc) = ($file_size / ($rec_size)) - 1;
        return($rec_count, $rec_size, $rec_count_calc);
    }
    else
    {
        return(-1, -1, -1);
    }
    close(FP);
}

foreach $file (@ARGV)
{

    my($rec_count, $rec_size, $rec_count_calc) = &file_info($file);

    print "File: $file\nRecord Count: $rec_count\nComputed Record Count: $rec_count_calc\nRecord Size:$rec_size\n";
}

Here is more information about how perl was compiled:
perl -V
Summary of my perl5 (revision 5 version 8 subversion 8)

  Built under aix
  Compiled at May 19 2013 14:46:07
  @INC:
    /usr/opt/perl5/lib/5.8.8/aix-thread-multi
    /usr/opt/perl5/lib/5.8.8
    /usr/opt/perl5/lib/site_perl/5.8.8/aix-thread-multi
    /usr/opt/perl5/lib/site_perl/5.8.8
    /usr/opt/perl5/lib/site_perl

Replies are listed 'Best First'.
Re: Perl Out of memory error by BrowserUk (Patriarch) on Oct 05, 2015 at 19:08 UTC
Unless your machine has an unusually small amount of memory (<1GB), it is quite hard to see why you would be running out of memory on a 400MB file.

There are some anomalies in your code (you calculate $buf_size but never use it; you use & on sub calls, which you shouldn't), but nothing that stands out as being the cause of the problem.

You are passing a string containing the whole file into to_ascii() and then copying it; and then passing the modified copy back and then copying it again; which can be alleviated by using references as below. Try that and see how you get on:

#!/bin/perl
# File: get-count.pl
#
# Perl script to locate the trailer record of a Fiserv data file and return
# the following information
#
# Usage:
#    perl get-count.pl INPUT_FILE
#
# Number of records per the trailer
# Number of records based on the file size and record size
# Record size.

sub to_ascii
{
    my( $s ) = @_;

    $$s =~ tr[\x40\x5a\x7f\x7b\x5b\x6c\x50\x7d\x4d\x5d\x5c\x4e\x6b\x60\x4b\x61\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\x7a\x5e\x4c\x7e\x6e\x6f\x7c\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xad\xe0\xbd\x5f\x6d\x79\x81\x82\x83\x84\x85\x86\x87\x88\x89\x91\x92\x93\x94\x95\x96\x97\x98\x99\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xc0\x6a\xd0\xa1]
             [\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e];
    return;
}

sub file_info
{
    my( $file ) = @_;

    open( FP2, "<$file" ) || die return( -1, -1, -1 );
    binmode( FP2 );

    my( $file_size ) = -s $file;

    my($s) = '';

    seek( FP2, -$file_size, 2 );
    read( FP2, $s, $file_size );
    to_ascii( \$s );
    if( $s =~ /999 (\d\d\d\d\d\d\d\d\d)( +$)/ )
    {
        my( $rec_size ) = 4 + length( $1 ) + length( $2 );
        my( $rec_count ) = $1 + 0;
        my( $rec_count_calc ) = ( $file_size / ($rec_size) ) - 1;
        return( $rec_count, $rec_size, $rec_count_calc );
    }
    else
    {
        return(-1, -1, -1);
    }
    close(FP);
}

foreach $file ( @ARGV )
{
    my( $rec_count, $rec_size, $rec_count_calc ) = file_info( $file );

    print "File: $file\nRecord Count: $rec_count\nComputed Record Count: $rec_count_calc\nRecord Size:$rec_size\n";
}

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :) In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Perl Out of memory error by Anonymous Monk on Oct 05, 2015 at 21:37 UTC
I made the change and I am still getting the "out of memory" error. I also fixed the syntax errors.
Re^3: Perl Out of memory error by BrowserUk (Patriarch) on Oct 05, 2015 at 22:20 UTC
"I also fixed the syntax errors."

Hm. I guess you downloaded it incorrectly:

C:\test>perl -c 1143847.pl
1143847.pl syntax OK
Re: Perl Out of memory error by GotToBTru (Prior) on Oct 05, 2015 at 19:03 UTC
Not so much "COBOL" files as EBCDIC. It reads the entire file in at once, which is probably why you are running out of memory.

Dum Spiro Spero
Re^2: Perl Out of memory error by Anonymous Monk on Oct 05, 2015 at 21:51 UTC
Can you please show how to rewrite the script so it does not read the entire file into memory? I am very new to Perl and I am still learning the basics.
Re^3: Perl Out of memory error by GotToBTru (Prior) on Oct 06, 2015 at 12:48 UTC
For some good reading, see buffered read. A simple while loop will read in 1 line at a time, each of which can be converted to ASCII and then tested.

As BrowserUk has pointed out, it's very possible this is not the cause of your error. But, on the assumption that maybe it is, we'll go on. The following statement is the problem:

`read(FP2, $s, $file_size);`

There is logic in the script to compute $buf_size, but it is never used. Instead, the entire file is loaded into the variable all at once. Then you apply a regex against the entire file. This is very inefficient, even if it doesn't overwhelm memory.

Is the trailer record the last one in the file? Use File::ReadBackwards to get that trailer record. The rest is just math.
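To illustrate the buffered-read idea, here is a minimal sketch. It assumes the data has no line terminators (fixed-length records), so it reads fixed-size chunks rather than lines. The sub name `tail_by_chunks`, the 64 KB chunk size and the 6000-byte window are illustrative choices, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): scan the file in
# fixed-size chunks and return at most the last $keep bytes.
# Memory use is bounded by roughly $keep + $chunk_size, whatever the file size.
sub tail_by_chunks {
    my ($file, $chunk_size, $keep) = @_;

    open(my $fh, '<', $file) or return;
    binmode($fh);

    my $tail = '';
    my $chunk;
    # read() returns the number of bytes read, 0 at end-of-file and undef
    # on error; the loop stops in both of the latter cases.
    while (read($fh, $chunk, $chunk_size)) {
        $tail .= $chunk;
        # Drop everything except the most recent $keep bytes.
        $tail = substr($tail, -$keep) if length($tail) > $keep;
    }
    close($fh);

    return $tail;
}

foreach my $file (@ARGV) {
    my $tail = tail_by_chunks($file, 65536, 6000);
    next unless defined $tail;
    printf "%s: kept %d trailing bytes\n", $file, length($tail);
}
```

The returned buffer can then go through the existing conversion and regex. This loop still scans the whole file, so it only shows bounded-memory reading; seeking straight to the end avoids the scan entirely.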
Re: Perl Out of memory error by locked_user sundialsvc4 (Abbot) on Oct 05, 2015 at 19:58 UTC
Since there are only three references to `$buf_size`, and indeed the read() function does read `$file_size`, I wonder if this script might contain a basic logic error. If, as the regex (which appears to be the only use of the file content ...) implies, the purpose of the script is to locate a certain sentinel string that begins with `"999"`, it does not appear to me that it should be necessary to read the entire contents of the file in order to do that.

I’d suggest looking at one of those input files with a hex editor to see if you can puzzle out how the file is built. The EBCDIC code (see, e.g. http://www.simotime.com/asc2ebc1.htm) is based more-or-less on punched cards, and so the characters and digits are in four discontiguous groups: `$C1-C9, $D1-D9, $E1-E9, $F0-F9`, the last group being the digits 0-9. So, the “eyecatcher” you are looking for should be very obvious in hex.

My feeling is that the program “is wrong,” even if “it works” right now ... the giveaway being that it fails for very large files when, intuitively, there is no reason why it should. The entire business of seek()ing to a position near end-of-file, and then reading a chunk, simply makes no sense with `$file_size`, but it makes much more sense with `$buf_size`.

It is more-than-a-guess on my part that this was the designer’s intention ... especially if the COBOL records turn out to be (as I suspect they are ...) 6,000 bytes long, or some equal-sized division thereof. They wanted to read “the last records,” and knew that the files could be arbitrarily large.
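As a small illustration of how recognizable the digit group is, here is a sketch that looks for the trailer directly in the EBCDIC bytes. It assumes the buffer holds raw bytes from the end of the file and that the trailer has the layout the script's regex expects ("999 ", nine digits, then space padding to the end). The name `find_trailer_ebcdic` is hypothetical, and the pattern anchors with `\z` where the original uses `$`.

```perl
use strict;
use warnings;

# Hypothetical helper: find the trailer without converting the whole buffer.
# In EBCDIC, '9' is 0xF9, space is 0x40 and the digits 0-9 are 0xF0-0xF9.
sub find_trailer_ebcdic {
    my ($buf) = @_;

    if ($buf =~ /\xF9\xF9\xF9\x40([\xF0-\xF9]{9})(\x40+)\z/) {
        my ($digits, $pad) = ($1, $2);

        # Same arithmetic as the original script: "999 " + digits + padding.
        my $rec_size = 4 + length($digits) + length($pad);

        # Convert only the nine count digits: 0xF0-0xF9 become '0'-'9'.
        (my $ascii = $digits) =~ tr/\xF0-\xF9/0-9/;

        return ($ascii + 0, $rec_size);
    }
    return;    # no trailer found
}

# Made-up sample: 20 filler bytes, then a 20-byte trailer with count 123.
my $sample = ("\xC1" x 20)
           . "\xF9\xF9\xF9\x40"
           . "\xF0\xF0\xF0\xF0\xF0\xF0\xF1\xF2\xF3"
           . ("\x40" x 7);
my ($count, $size) = find_trailer_ebcdic($sample);
print "Record Count: $count\nRecord Size: $size\n" if defined $count;
```

Only the nine digits are translated here, so the cost does not grow with the size of the buffer.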
Re^2: Perl Out of memory error by Anonymous Monk on Oct 05, 2015 at 21:48 UTC
Is there a way to just read the last chunk of the file, since it is the trailer record information that I am seeking? I am new to Perl, so I have no idea how to proceed. I added "use strict;" and "use warnings;" to fix the syntax error in the script.
Re^3: Perl Out of memory error by Anonymous Monk on Oct 05, 2015 at 22:15 UTC
See seek, File::ReadBackwards, and perlintro.
Re^4: Perl Out of memory error by Anonymous Monk on Oct 05, 2015 at 22:16 UTC
Re^3: Perl Out of memory error by locked_user sundialsvc4 (Abbot) on Oct 06, 2015 at 02:01 UTC
Maybe you can ask one of your co-workers for assistance with this script? I say that because this script has probably been around for a while and maybe people don’t realize that it contains an error. (See below.)

It seems to me that the `seek()` and `read()` calls should both probably refer to `$buf_size` rather than `$file_size`. If you look at `perldoc seek`, you will see that the existing call to this function does position “relative to end-of-file.” (That’s what the `,2)` is for ...) Therefore, I think that the original intent was to slurp the last 6,000 bytes (or less, if the file was shorter), which would have been sufficient for this script’s purposes. What it is doing now is reading the entire file. And, I think, it was never intended to do that. (But the change, whenever it occurred, is now lost in the mists of time ...)

Since this is an existing script, I think it makes sense at this point to ask a co-worker, your boss, etc. to “hey, have a look at this.” The fix is easy. But the nature of this bug ... its presence here ... is “odd,” hence worthy of higher-up attention. The bigger-picture question before the house (but not necessarily for you) is: how and when did this script get to be this way?
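A minimal sketch of that change, assuming the trailer always fits in the last 6,000 bytes. The helper name `read_tail` is hypothetical; the lexical filehandle and three-argument `open` are a stylistic substitution for the script's bareword `FP2`.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last $want bytes of $file (or the whole
# file if it is shorter) without reading anything before them.
sub read_tail {
    my ($file, $want) = @_;

    open(my $fh, '<', $file) or return;
    binmode($fh);

    my $file_size = -s $fh;
    my $buf_size  = $want > $file_size ? $file_size : $want;

    # Whence 2 means "relative to end-of-file", so this positions the
    # handle $buf_size bytes before the end.
    seek($fh, -$buf_size, 2) or return;

    my $buf = '';
    my $got = read($fh, $buf, $buf_size);
    close($fh);

    # Treat a failed or short read as an error.
    return unless defined $got && $got == $buf_size;
    return $buf;
}

foreach my $file (@ARGV) {
    my $tail = read_tail($file, 6000);
    if (defined $tail) {
        printf "%s: read %d trailing bytes\n", $file, length($tail);
    }
    else {
        warn "Could not read the tail of $file\n";
    }
}
```

The existing to_ascii() conversion and trailer regex can run on the returned buffer unchanged. The computed record count should still use the full file size from `-s`, not the buffer length.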
Re^4: Perl Out of memory error by Anonymous Monk on Oct 06, 2015 at 17:56 UTC