Category: Text Processing
Author/Contact Info: Curtis Autery
Description: This is a simple script to extract the EBCDIC text from an X9.37 formatted file. For those unfamiliar, this is a file format used in banking that carries scanned check images mixed with flat-file data describing the account numbers, dollar amounts, etc. The file format looks obscure, but it is straightforward: each record is a 4-byte record length field (which the script reads as a big-endian integer), followed by that many bytes of data. The first two bytes of the data are the record number. The data is all EBCDIC text, with one exception, the "52 Record": record 52 has 117 bytes of EBCDIC, and the remainder is binary TIFF data. The script has a flag, $tiff_flag, that determines whether the binary TIFF data is ignored (0, the default) or exported to files (any true value).
#!/usr/bin/perl -w
use strict;
use Encode;

my $tiff_flag = 0;
my $count = 0;

open(FILE,'<',$ARGV[0]) or die 'Error opening input file';
binmode(FILE) or die 'Error setting binary mode on input file';

while (read (FILE,$_,4)) {
  my $rec_len = unpack("N",$_);
  die "Bad record length: $rec_len" unless ($rec_len > 0);
  read (FILE,$_,$rec_len);
  if (substr($_,0,2) eq "\xF5\xF2") {
    if ($tiff_flag) {
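      $count++;  # number each image; otherwise every image would reuse one file name
      # NOTE: the source line is cut off after '.'; the 'tiff' extension is an assumption
      open (TIFF, '>', $ARGV[0] . '_img' . sprintf("%04d",$count) . '.tiff')
        or die "Can't create image file";
      binmode(TIFF) or die 'Error setting binary mode on image file';
      print TIFF substr($_,117);
      close TIFF;
    }
    # keep only the 117 bytes of EBCDIC text from the 52 record
    $_ = substr($_,0,117);
  }
  print decode ('cp1047', $_) . "\n";
}
close FILE;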
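
The record layout from the description can also be tried out on an in-memory sample, without a real X9.37 file. The following is only a minimal sketch under stated assumptions: the helper names split_x937_records and make_record are hypothetical (they are not part of the script above), the sample records are made up, and the image bytes are a 4-byte placeholder rather than a real TIFF.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: split an in-memory X9.37 byte string into records.
# Returns a list of hash refs with keys type, text and image.
sub split_x937_records {
    my ($bytes) = @_;
    my @records;
    my $pos = 0;
    while ($pos + 4 <= length($bytes)) {
        # 4-byte big-endian length prefix, as in the script's unpack("N")
        my $len = unpack('N', substr($bytes, $pos, 4));
        $pos += 4;
        die "Bad record length: $len"
            if $len < 1 || $pos + $len > length($bytes);
        my $body = substr($bytes, $pos, $len);
        $pos += $len;

        my $image = '';
        # EBCDIC "52" is 0xF5 0xF2: 117 bytes of text, the rest is image data
        if (substr($body, 0, 2) eq "\xF5\xF2" && length($body) > 117) {
            $image = substr($body, 117);
            $body  = substr($body, 0, 117);
        }
        my $text = decode('cp1047', $body);
        push @records,
            { type => substr($text, 0, 2), text => $text, image => $image };
    }
    return @records;
}

# Hypothetical helper: prefix a payload with its 4-byte length.
sub make_record {
    my ($payload) = @_;
    return pack('N', length($payload)) . $payload;
}

# Made-up sample: one plain record, then a type-52 record whose 117 bytes
# of text are followed by 4 placeholder image bytes.
my $plain  = encode('cp1047', '01HELLO');
my $header = encode('cp1047', '52' . ('X' x 115));
my $sample = make_record($plain)
           . make_record($header . "\x49\x49\x2A\x00");

for my $rec (split_x937_records($sample)) {
    printf "%s: text=%d bytes, image=%d bytes\n",
        $rec->{type}, length($rec->{text}), length($rec->{image});
}
```

Unlike the script, this sketch also rejects a length field that runs past the end of the data, which seemed a reasonable extra check for truncated input.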