Re: Perl Out of memory error
by BrowserUk (Patriarch) on Oct 05, 2015 at 19:08 UTC
|
Unless your machine has an unusually small amount of memory (<1GB) it is quite hard to see why you would be running out of memory on a 400MB file.
There are some anomalies in your code (you calculate $buf_size but never use it; you use & on sub calls, which you shouldn't), but nothing that stands out as the cause of the problem.
You are passing a string containing the whole file into to_ascii(), which copies it, and then passing the modified copy back and copying it again. Passing a reference, as below, avoids both copies. Try that and see how you get on:
#!/bin/perl
# File: get-count.pl
#
# Perl script to locate the trailer record of a Fiserv data file and return
# the following information
#
# Usage:
# perl get-count.pl INPUT_FILE
#
# Number of records per the trailer
# Number of records based on the file size and record size
# Record size.
sub to_ascii {
my( $s ) = @_;
$$s =~ tr[\x40\x5a\x7f\x7b\x5b\x6c\x50\x7d\x4d\x5d\x5c\x4e\x6b\x60\x4b\x61\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\x7a\x5e\x4c\x7e\x6e\x6f\x7c\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xad\xe0\xbd\x5f\x6d\x79\x81\x82\x83\x84\x85\x86\x87\x88\x89\x91\x92\x93\x94\x95\x96\x97\x98\x99\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xc0\x6a\xd0\xa1]
[\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e];
return;
}
sub file_info {
my( $file ) = @_;
open( FP2, '<', $file ) or return( -1, -1, -1 );
binmode( FP2 );
my( $file_size ) = -s $file;
my($s) = '';
seek( FP2, -$file_size, 2 );
read( FP2, $s, $file_size );
close( FP2 );
to_ascii( \$s );
if( $s =~ /999 (\d\d\d\d\d\d\d\d\d)( +$)/ ) {
my( $rec_size ) = 4 + length( $1 ) + length( $2 );
my( $rec_count ) = $1 + 0;
my( $rec_count_calc ) = ( $file_size / ($rec_size) ) - 1;
return( $rec_count, $rec_size, $rec_count_calc );
}
else {
return(-1, -1, -1);
}
}
foreach my $file ( @ARGV ) {
my( $rec_count, $rec_size, $rec_count_calc ) = file_info( $file );
print "File: $file\nRecord Count: $rec_count\nComputed Record Count: $rec_count_calc\nRecord Size:$rec_size\n";
}
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
In the absence of evidence, opinion is indistinguishable from prejudice.
|
|
I made the change and I am still getting the "out of memory" error. I also fixed the syntax errors.
|
|
C:\test>perl -c 1143847.pl
1143847.pl syntax OK
Re: Perl Out of memory error
by GotToBTru (Prior) on Oct 05, 2015 at 19:03 UTC
|
Not so much "COBOL" files as EBCDIC.
It reads the entire file in at once, which is probably why you are running out of memory.
|
|
Can you please show how to rewrite the script so it does not read the entire file into memory? I am very new to Perl and I am still learning the basics.
|
|
For some good reading, see buffered read. A simple while loop will read in 1 line at a time, each of which can be converted to ASCII and then tested.
As BrowserUk has pointed out, it's very possible this is not the cause of your error. But, on the assumption that maybe it is, we'll go on. The following statement is the problem:
read(FP2, $s, $file_size);
There is logic in the script to compute $buf_size but it is never used. Instead the entire file is loaded into the variable all at once. Then, you apply a regex against the entire file. This is very inefficient, even if it doesn't overwhelm memory. Is the trailer record the last one in the file? Use File::ReadBackwards to get that trailer record. The rest is just math.
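For the buffered-read idea, a minimal sketch follows. It assumes the file consists of fixed-length records with no line terminators and that the record length is already known; the sub name last_record and the record size of 100 are made up for illustration and do not come from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a file of fixed-length records one record at a
# time, so memory use stays at one record regardless of the file size.
# Returns the number of complete records and the last complete record.
sub last_record {
    my( $file, $rec_size ) = @_;
    die "Record size must be positive\n" if $rec_size < 1;
    open( my $fh, '<', $file ) or die "Cannot open $file: $!";
    binmode( $fh );
    my( $count, $last ) = ( 0, '' );
    while( 1 ) {
        my $rec = '';
        my $got = read( $fh, $rec, $rec_size );
        die "Cannot read $file: $!" unless defined $got;
        last if $got < $rec_size;    # end of file, or a short trailing fragment
        $count++;
        $last = $rec;
    }
    close( $fh );
    return( $count, $last );
}

for my $file ( @ARGV ) {
    my( $count, $last ) = last_record( $file, 100 );    # 100 is an assumed size
    print "File: $file\nRecords read: $count\n";
}
```

The last record returned is still EBCDIC, so it would need to go through to_ascii() before the trailer regex is applied.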
Re: Perl Out of memory error
by locked_user sundialsvc4 (Abbot) on Oct 05, 2015 at 19:58 UTC
|
Since there are only three references to $buf_size, and indeed the read() function does read $file_size, I wonder if this script might contain a basic logic error. If, as the regex (that appears to be the only use of the file content ...) implies, the purpose of the script is to locate a certain sentinel string that begins with "999", it does not appear to me that it should be necessary to read the entire contents of the file in order to do that.
I’d suggest looking at one of those input files with a hex editor to see if you can puzzle out how the file is built. The EBCDIC code (see, e.g. http://www.simotime.com/asc2ebc1.htm) is based more-or-less on punched cards, and so the characters and digits are in four discontiguous groups: $C1-$C9, $D1-$D9, $E2-$E9, $F0-$F9, the last group being the digits 0-9. So, the “eyecatcher” you are looking for should be very obvious in hex.
My feeling is that the program “is wrong,” even if “it works” right now ... the giveaway being that it fails for very large files when, intuitively, there is not much reason why it should.
The entire business of seek()ing to a position near to end-of-file, and then reading a chunk, simply makes no sense with $file_size, but it makes much more sense with $buf_size. It is more than a guess on my part that this was the designer’s intention ... especially if the COBOL records turn out to be (as I suspect they are ...) 6,000 bytes long, or some equal-sized division thereof. They wanted to read “the last records,” and knew that the files could be arbitrarily large.
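As a small illustration of what to look for, assuming the trailer begins with the characters "999 " as the regex in the script suggests: the digit 9 is $F9 and the space is $40 in EBCDIC. The helper name ebcdic_hex is made up for this sketch, and it only handles digits and spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: convert an ASCII string of digits and spaces to
# EBCDIC and return the resulting bytes as a hex string.
sub ebcdic_hex {
    my( $ascii ) = @_;
    ( my $ebcdic = $ascii ) =~ tr[ 0-9][\x40\xf0-\xf9];
    return join ' ', map { sprintf( '%02X', ord $_ ) } split //, $ebcdic;
}

print ebcdic_hex( '999 ' ), "\n";    # prints "F9 F9 F9 40"
```

So in a hex dump the start of the trailer should show up as the byte sequence F9 F9 F9 40, followed by nine more bytes in the $F0-$F9 range.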
|
|
Is there a way to just read the last chunk of the file, since it is the trailer record information that I am seeking? I am new to Perl, so I have no idea how to proceed. I added "use strict;" and "use warnings;" to fix the syntax error in the script.
|
|
Maybe you can ask one of your co-workers for assistance with this script? I say that, because this script probably has been around for a while and maybe people don’t realize that it contains an error. (See below.)
It seems to me that the seek() and read() calls should both probably refer to $buf_size rather than $file_size.
If you look at perldoc seek (click on the hyperlink ...), you will see that the existing call to this function does position “relative to end-of-file.” (That’s what the final argument, 2, is for ...) Therefore, I think that the original intent was to slurp the last 6,000 bytes (or less, if the file was shorter). Which would have been sufficient for this script’s purposes. What it is doing now is reading the entire file. And, I think, it was never intended to do that. (But the change, whenever it occurred, is now lost in the mists of time ...)
Since this is an existing script, I think it makes sense at this point to ask a co-worker, your boss, etc. to have a look at it. The fix is easy. But, the nature of this bug ... its presence here ... is “odd,” hence worthy of higher-up attentions. The bigger-picture question before the house (but not necessarily for you) is: how and when did this script get to be this way?
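A minimal sketch of that idea, assuming the start of the trailer record lies within the last 6,000 bytes of the file. The sub name read_tail is made up for illustration, and the cut-down tr/// translates only the EBCDIC space and digits, which is all the trailer pattern needs; the real script would call its own to_ascii() instead.

```perl
use strict;
use warnings;

# Hypothetical helper: return at most the last $buf_size bytes of $file.
sub read_tail {
    my( $file, $buf_size ) = @_;
    open( my $fh, '<', $file ) or die "Cannot open $file: $!";
    binmode( $fh );
    my $file_size = -s $fh;
    $buf_size = $file_size if $buf_size > $file_size;    # short file
    seek( $fh, -$buf_size, 2 ) or die "Cannot seek in $file: $!";    # 2 = from EOF
    my $buf = '';
    my $got = read( $fh, $buf, $buf_size );
    die "Cannot read $file: $!" unless defined $got;
    close( $fh );
    return $buf;
}

for my $file ( @ARGV ) {
    my $tail = read_tail( $file, 6000 );
    $tail =~ tr[\x40\xf0-\xf9][ 0-9];    # EBCDIC space and digits only
    if( $tail =~ /999 (\d{9})( +)$/ ) {
        my $rec_size = 4 + length( $1 ) + length( $2 );
        printf "File: %s\nRecord Count: %d\nRecord Size: %d\n", $file, $1, $rec_size;
    }
    else {
        print "File: $file\nNo trailer record found in the last 6000 bytes\n";
    }
}
```

Only the tail is ever held in memory, so the size of the input file no longer matters; the computed record count can still be derived from -s $file as before.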