Category: | Text Processing |
Author/Contact Info | Curtis Autery |
Description: | This is a simple script to extract the EBCDIC text from an X9.37 formatted file. For those unfamiliar, this is a file format used in banking that holds scanned check images mixed with flat-file data describing the account numbers, dollar amounts, etc. The file format is obscure, but straightforward. Each record starts with a 4-byte big-endian record length field, followed by that many bytes of EBCDIC text, with one exception: the "52 Record". The first two bytes of each record's data identify the record type. Record 52 has 117 bytes of EBCDIC, and the remainder is binary TIFF data. The script has a flag ($tiff_flag) that determines whether the binary TIFF data is ignored or exported to files. |
#!/usr/bin/perl -w
use strict;
use Encode;

my $tiff_flag = 0;    # set to 1 to export each record 52 image to its own file
my $count     = 0;    # number of images written so far

open(FILE, '<', $ARGV[0]) or die "Error opening input file: $!";
binmode(FILE) or die 'Error setting binary mode on input file';

while (read(FILE, $_, 4)) {
    # Each record starts with a 4-byte big-endian length field.
    my $rec_len = unpack("N", $_);
    die "Bad record length: $rec_len" unless ($rec_len > 0);
    my $got = read(FILE, $_, $rec_len);
    die "Short read: expected $rec_len bytes"
        unless defined $got && $got == $rec_len;

    # "\xF5\xF2" is "52" in EBCDIC: 117 bytes of text, then TIFF data.
    if (substr($_, 0, 2) eq "\xF5\xF2") {
        if ($tiff_flag) {
            $count++;
            open(TIFF, '>', $ARGV[0] . '_img' . sprintf("%04d", $count) . '.tiff')
                or die "Can't create image file: $!";
            binmode(TIFF) or die 'Error setting binary mode on image file';
            print TIFF substr($_, 117);
            close TIFF;
        }
        $_ = substr($_, 0, 117);
    }
    print decode('cp1047', $_) . "\n";
}
close FILE;
|
|
---|
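For anyone without a real X9.37 file at hand, the record framing described above can be tried on a synthetic in-memory buffer. The following is only a sketch: `split_records` and `frame` are hypothetical helper names, the sample payloads are made up, and the 117-byte text length for record 52 is taken from the description rather than from the X9.37 specification.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: split a buffer into records using the framing
# from the description (4-byte big-endian length, then that many bytes).
sub split_records {
    my ($buf) = @_;
    my @records;
    my $pos = 0;
    while ($pos < length($buf)) {
        die "Truncated length field at offset $pos\n"
            if length($buf) - $pos < 4;
        my $len = unpack('N', substr($buf, $pos, 4));
        $pos += 4;
        die "Bad record length $len at offset $pos\n"
            if $len == 0 or $len > length($buf) - $pos;
        my $data = substr($buf, $pos, $len);
        $pos += $len;

        # The first two EBCDIC digits give the record type ("\xF5\xF2" is "52").
        my $type = decode('cp1047', substr($data, 0, 2));
        my ($text, $image) = ($data, '');
        if ($type eq '52') {
            # Assumption from the post: 117 bytes of text, the rest is TIFF.
            $text  = substr($data, 0, 117);
            $image = substr($data, 117);
        }
        push @records,
            { type => $type, text => decode('cp1047', $text), image => $image };
    }
    return @records;
}

# Hypothetical helper: prefix a payload with its 4-byte length field.
sub frame {
    my ($payload) = @_;
    return pack('N', length($payload)) . $payload;
}

# Synthetic sample data, not a real X9.37 file.
my $header = encode('cp1047', '01' . 'SAMPLE HEADER');
my $rec52  = encode('cp1047', '52' . ('X' x 115)) . "II*\0fake-tiff";
my @recs   = split_records(frame($header) . frame($rec52));

printf "%s: %d text chars, %d image bytes\n",
    $_->{type}, length($_->{text}), length($_->{image}) for @recs;
```

Keeping the parsing in a sub that works on a string, rather than on a filehandle, makes it easy to check the length handling and the record 52 split without touching the filesystem.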
Replies are listed 'Best First'. | |
---|---|
Re: Check21/X9.37 text extractor
by jwkrahn (Monsignor) on Jul 31, 2009 at 23:28 UTC | |
by ambrus (Abbot) on Aug 01, 2009 at 12:07 UTC | |
by Anonymous Monk on Aug 02, 2009 at 09:17 UTC | |
Re: Check21/X9.37 text extractor
by Anonymous Monk on Nov 19, 2009 at 21:57 UTC | |
Re: Check21/X9.37 text extractor
by Anonymous Monk on Jun 16, 2015 at 20:08 UTC |