Looping through a binary file

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm new to Perl and running into difficulty with a Perl routine I've written. Basically, I have a large EBCDIC file with no carriage returns. The routine works fine until I add my "while" statement, at which point I get an "out of memory" error. Help, please! Thanks in advance. Chris

#!/usr/bin/perl
use strict;
use warnings;

use lib '/home/q37j6m4/lib/';
# proceed as usual
use Convert::IBM390 qw(:all);
set_codepage('CP00037');

print "script to import pscmt.txt\n";

my $fileIN = "/sas/rbcpcins/ha/rawdata/pscmt.txt";
my $fileOUT = "/home/q37j6m4/test_out.txt";
my $fileLOG = "/home/q37j6m4/chris_log.txt";

my $recln =98; #length of file being imported

open(perlIN,"<",$fileIN) or die "Can't open input.txt: $!";
open(perlOUT, ">",$fileOUT) or die "Can't open output.txt: $!";
open(perlLOG, ">>",$fileLOG) or die "Can't open my.log: $!";


while(<perlIN>)
{

print $index,"\n";
# pre-define fields
my $cltno2   ="";
my $cmtseq   ="";
my $npseqn   ="";
my $comm     ="";
my $policy   ="";
my $pdcode   ="";
my $userid   ="";
my $reccdt   ="";
my $recctm   ="";
my $cmgrcd   ="";
my $actcod   ="";
my $upid     ="";

# read fields
read(perlIN,$cltno2,5);
read(perlIN,$cmtseq,5);
read(perlIN,$npseqn,3);
read(perlIN,$comm,45);
read(perlIN,$policy,8);
read(perlIN,$pdcode,3);
read(perlIN,$userid,10);
read(perlIN,$reccdt,5);
read(perlIN,$recctm,6);
read(perlIN,$cmgrcd,1);
read(perlIN,$actcod,3);
read(perlIN,$upid,4);

# convert fields
my $cltno2_c = unpackeb('p5',$cltno2);
my $cmtseq_c = unpackeb('p5',$cmtseq);
my $npseqn_c = unpackeb('p3',$npseqn);
my $comm_c = eb2asc($comm);
my $policy_c = eb2asc($policy);
my $pdcode_c = eb2asc($pdcode);
my $userid_c = eb2asc($userid);
my $reccdt_c = unpackeb('p5',$reccdt);
my $recctm_c = eb2asc($recctm);
my $cmgrcd_c = eb2asc($cmgrcd);
my $actcod_c = eb2asc($actcod);
my $upid_c = unpackeb('p4',$upid);

# write fields
print perlOUT sprintf ("%12.0f", $cltno2_c);
print perlOUT sprintf ("%12.0f", $cmtseq_c);
print perlOUT sprintf ("%12.0f", $npseqn_c);
print perlOUT $comm_c;
print perlOUT $policy_c;
print perlOUT $pdcode_c;
print perlOUT $userid_c;
print perlOUT sprintf ("%12.0f", $reccdt_c);
print perlOUT sprintf ("%12.0f", $recctm_c);
print perlOUT $cmgrcd_c;
print perlOUT $actcod_c;
print perlOUT sprintf ("%12.0f", $upid_c);
print perlOUT "\r\n";

} # end while loop

close perlIN;
close perlOUT;
close perlLOG;

Replies are listed 'Best First'.
Re: Looping through a binary file by runrig (Abbot) on Sep 06, 2013 at 17:04 UTC
Your 'while' reads in a line of the file, which, being a binary file, is not what you want to do. You can redefine what a 'line' is by setting `$/` (see perlvar). If you set it to a reference to an integer (e.g. `$/ = \42;`), it will read that many bytes. You can set it to the length of your fixed length record. But then you will have to adjust how you set your variables in the following lines (by using substr or unpack or Parse::FixedLength or something) to 'read' from `$_` instead of reading from the filehandle again.
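A minimal sketch of the `$/` approach described above, not a drop-in replacement for the original script. It assumes 98-byte records with the field widths taken from the original `read` calls, uses an in-memory filehandle with dummy data so that it runs without the real file, and leaves out the Convert::IBM390 conversion step entirely. The names `$template` and `@records` are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: fixed-length records of 98 bytes, as in the original post.
my $recln = 98;

# Field widths from the original read() calls, in order; they sum to 98.
# 'a' keeps the raw bytes untouched so they can be converted afterwards.
my $template = 'a5 a5 a3 a45 a8 a3 a10 a5 a6 a1 a3 a4';

# Stand-in data (two dummy records) instead of the real EBCDIC file.
my $data = ('A' x $recln) . ('B' x $recln);
open(my $in, '<', \$data) or die "Can't open in-memory data: $!";
binmode $in;

my @records;
{
    local $/ = \$recln;    # readline now returns up to $recln bytes per call
    while (my $rec = <$in>) {
        last if length($rec) < $recln;    # skip a short trailing fragment
        my @fields = unpack($template, $rec);
        push @records, \@fields;
    }
}
close $in;

printf "%d records, %d fields each\n", scalar(@records), scalar(@{ $records[0] });
```

Each element of `@fields` holds the raw bytes of one field, so the existing `unpackeb` and `eb2asc` calls could presumably be applied to those values instead of to variables filled by separate `read` calls.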
Re: Looping through a binary file by Laurent_R (Canon) on Sep 06, 2013 at 17:46 UTC
Or rather, use the read function, which reads the number of bytes you specify. For example: `open my $fh, "<", $infile or die "cannot open $infile: $!"; read $fh, $out, 100; # reads 100 bytes from the file` You can also use the seek function to move to a different position in the file.
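As a rough illustration of `read` and `seek` together, here is a sketch that reads one record per call and then jumps back to a given record. The 98-byte record length comes from the original post; the in-memory test data (three records plus ten stray bytes) and the variable names are assumptions for the example only.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

my $recln = 98;    # record length from the original post

# Stand-in data: three full records followed by 10 stray bytes.
my $data = ('A' x $recln) . ('B' x $recln) . ('C' x $recln) . ('z' x 10);
open(my $in, '<', \$data) or die "Can't open in-memory data: $!";
binmode $in;

my ($count, $leftover) = (0, 0);
while (1) {
    my $got = read($in, my $rec, $recln);    # returns the number of bytes read
    die "read failed: $!" unless defined $got;
    last if $got == 0;                       # clean end of file
    if ($got < $recln) {                     # incomplete trailing record
        $leftover = $got;
        last;
    }
    $count++;
}

# seek positions the handle on a record boundary: here, the second record.
seek($in, 1 * $recln, SEEK_SET) or die "seek failed: $!";
read($in, my $second, $recln);
close $in;

print "$count full records, $leftover leftover bytes\n";
```

Checking the return value of `read` is what distinguishes a clean end of file (0), an error (undef) and a truncated last record (fewer bytes than requested).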
Re: Looping through a binary file by marinersk (Priest) on Sep 06, 2013 at 23:06 UTC
Suspect your problem is that `while(<perlIN>)` is causing it to read the whole binary file, all at once, right there. Boom -- out of memory.

Instead, as noted by others, read only what you need -- one record at a time. A common approach is to use a flag variable to indicate when you reach end of file, and control the loop based on that. Two ways to do this:

1. Read one record at the top of the loop (use `read` and your `$recln` value), then use `substr` to extract the pieces; or
2. Don't read anything at the top of the loop, but instead read each variable as you currently do, and modify each read to check for end of file (sample below).

    my $TRUE = 1;
    my $FALSE = 0;
    [...]
    my $EofFlag = $FALSE;
    while (!$EofFlag) {
        [...]
        my $cmtseq_cnt = read(perlIN, $cmtseq, 5);
        if ($cmtseq_cnt < 5) {
            $EofFlag = $TRUE;  # Redundant, given what we do next, but here for example
            last;
        }
    }
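The per-field variant could be sketched as follows, with the field names and widths copied from the original post into a table so that every read gets the same end-of-file check. `read_record` and `@layout` are hypothetical names, the data is a dummy in-memory stand-in, and returning from the helper takes the place of the explicit flag variable.

```perl
use strict;
use warnings;

# Field names and widths copied from the original post's read() calls.
my @layout = (
    [ cltno2 => 5 ],  [ cmtseq => 5 ], [ npseqn => 3 ],  [ comm   => 45 ],
    [ policy => 8 ],  [ pdcode => 3 ], [ userid => 10 ], [ reccdt => 5 ],
    [ recctm => 6 ],  [ cmgrcd => 1 ], [ actcod => 3 ],  [ upid   => 4 ],
);

# Hypothetical helper: reads one record field by field.
# Returns a hash ref, or nothing once any field comes up short (end of file).
sub read_record {
    my ($fh) = @_;
    my %rec;
    for my $field (@layout) {
        my ($name, $width) = @$field;
        my $got = read($fh, my $buf, $width);
        die "read failed: $!" unless defined $got;
        return if $got < $width;
        $rec{$name} = $buf;
    }
    return \%rec;
}

# Stand-in data: two dummy 98-byte records.
my $data = ('a' x 98) . ('b' x 98);
open(my $in, '<', \$data) or die "Can't open in-memory data: $!";
binmode $in;

my @records;
while (my $rec = read_record($in)) {
    push @records, $rec;
}
close $in;

print scalar(@records), " records read\n";
```

Keeping the layout in one table means the widths are stated once, which makes it harder for the reads and the record length to drift apart.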
Re: Looping through a binary file by daxim (Curate) on Sep 06, 2013 at 17:07 UTC
There's a stray $index variable on line 25, but the program basically works for me. To reproduce your error, also provide the input data.
Re: Looping through a binary file by Marshall (Canon) on Sep 07, 2013 at 10:26 UTC
I don't work with EBCDIC now, but if you want to, I would suggest reading http://perldoc.perl.org/perlebcdic.html very closely.
Re: Looping through a binary file by aitap (Curate) on Sep 08, 2013 at 19:03 UTC
`while (<filehandle>)` is implicitly `while (defined ($_ = readline filehandle))` which reads a very long line. You probably wanted to check whether there is still data to read, which is done by the eof function. Solution: use something like this: `while (!eof(perlIN))`.
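A small sketch of the `eof`-controlled loop, assuming the 98-byte record length from the original post and using dummy in-memory data. It uses a lexical handle `$in` rather than the bareword `perlIN`, and reads one whole record per iteration where the original script would keep its per-field `read` calls.

```perl
use strict;
use warnings;

my $recln = 98;    # record length from the original post

# Stand-in data: two dummy records.
my $data = ('1' x $recln) . ('2' x $recln);
open(my $in, '<', \$data) or die "Can't open in-memory data: $!";
binmode $in;

my $n = 0;
while (!eof($in)) {    # loop only while there is still data to read
    my $got = read($in, my $rec, $recln);
    die "short or failed read" unless defined $got && $got == $recln;
    $n++;
}
close $in;

print "$n records processed\n";
```

The loop condition no longer consumes any input, so nothing is slurped into `$_` before the reads inside the loop body run.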