in reply to Perl Out of memory error

Since there are only three references to $buf_size, and indeed the read() function does read $file_size, I wonder if this script might contain a basic logic error.   If, as the regex (that appears to be the only use of the file content ...) implies, the purpose of the script is to locate a certain sentinel string that begins with "999", it does not appear to me that it should be necessary to read the entire contents of the file in order to do that.

I’d suggest looking at one of those input files with a hex-editor to see if you can puzzle out how the file is built.   The EBCDIC code (see, e.g. http://www.simotime.com/asc2ebc1.htm) is based more-or-less on punched cards, and so the letters and digits are in four discontiguous groups: $C1-C9 (A-I), $D1-D9 (J-R), $E2-E9 (S-Z), and $F0-F9, the last group being the digits 0-9.   So, the “eyecatcher” you are looking for should be very obvious in hex.
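To make that concrete, here is a small sketch of what such an eyecatcher would look like as raw bytes. It assumes the sentinel is simply the three EBCDIC digits "999"; the sample record is made up for illustration and is not taken from the original script or data.

```perl
use strict;
use warnings;

# EBCDIC digits 0-9 are the bytes 0xF0-0xF9, so "999" is F9 F9 F9.
my $sentinel = "\xF9\xF9\xF9";

# Hypothetical sample data: EBCDIC "ABC" (C1 C2 C3) followed by "999".
my $record = "\xC1\xC2\xC3\xF9\xF9\xF9";

# Hex view of the record, roughly as a hex editor would show it.
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $record;
print "$hex\n";

# Byte offset of the sentinel within the record, or -1 if it is absent.
my $pos = index($record, $sentinel);
print "sentinel found at offset $pos\n";
```

If the real sentinel carries other bytes around the "999", the search string would need to change accordingly.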

My feeling is that the program “is wrong,” even if “it works” right now ... the giveaway being that it fails for very large files when, intuitively, there is not much reason why it should.

The entire business of seek()ing to a position near to end-of-file, and then reading a chunk, simply makes no sense with $file_size, but it makes much more sense with $buf_size.   It is more-than-a-guess on my part that this was the designer’s intention ... especially if the COBOL records turn out to be (as I suspect they are ...) 6,000 bytes long, or some equal-sized division thereof.   They wanted to read “the last records,” and knew that the files could be arbitrarily large.
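For illustration only, here is a minimal sketch of what “seek near end-of-file, then read one chunk” might look like. The sub name read_tail and its parameters are my own invention, not names from the original script, and I am assuming the file should be read as raw bytes.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Return the last $buf_size bytes of $path (or the whole file if it is shorter).
# read_tail is a hypothetical helper, not part of the original script.
sub read_tail {
    my ($path, $buf_size) = @_;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    # Never ask for more bytes than the file actually holds.
    my $file_size = -s $fh;
    my $want = $file_size < $buf_size ? $file_size : $buf_size;

    # A negative offset with SEEK_END (whence 2) positions relative to end-of-file.
    seek $fh, -$want, SEEK_END or die "Cannot seek in $path: $!";

    my $buf = '';
    my $got = read $fh, $buf, $want;
    die "Cannot read $path: $!" unless defined $got;

    close $fh;
    return $buf;
}
```

A call such as read_tail($file, 6000) would then hand back at most 6,000 bytes for the regex to search, no matter how large the file is. Whether 6,000 is the right chunk size depends on the actual record length, which I am only guessing at.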

Re^2: Perl Out of memory error
by Anonymous Monk on Oct 05, 2015 at 21:48 UTC
    Is there a way to just read the last chunk of the file, since it is the trailer record information that I am seeking? I am new to Perl, so I have no idea how to proceed. I added "use strict;" and "use warnings;" to fix the syntax error in the script.

      Maybe you can ask one of your co-workers for assistance with this script?   I say that, because this script probably has been around for a while and maybe people don’t realize that it contains an error.   (See below.)

      It seems to me that the seek() and read() calls should both probably refer to $buf_size rather than $file_size.

      If you look at perldoc seek (click on the hyperlink ...), you will see that the existing call to this function does position “relative to end-of-file.”   (That’s what the ,2) is for ...)   Therefore, I think that the original intent was to slurp the last 6,000 bytes (or less, if the file was shorter).   Which would have been sufficient for this script’s purposes.   What it is doing now is reading the entire file.   And, I think, it was never intended to do that.   (But the change, whenever it occurred, is now lost in the mists of time ...)

      Since this is an existing script, I think it makes sense at this point to ask a co-worker, your boss, etc. to “hey, have a look at this.”   The fix is easy.   But, the nature of this bug ... its presence here ... is “odd,” hence worthy of higher-up attentions.   The bigger-picture question before the house (but not necessarily for you) is:   how and when did this script get to be this way?

        Thank you for your help. I replaced the $buf_size with the $file_size and it started working.
        But still, there is a better way to do this without first reading the entire file. I can still see possible memory issues if the files are really big. If the trailer record is the last one in the file, I could use seek() and File::ReadBackwards to get that trailer record. But I do not know how to do that yet, so I’m still learning about that function.