I've put the start of a module together. It's probably got some bugs and some functionality is missing, but I thought I would go ahead and post and see if anyone is interested in going any further. Yes, it still needs some refactoring for performance enhancement and clarity. Perhaps some of you Perl experts would volunteer some suggestions.
To use this module, you just need a COBOL copylib member in a disk file. If you have a data file associated, you can use that, too.
Functions:
Here are a couple of things you can use this for. Say you have a data file and a copylib and need to find some records in the data file. The example will find the records you want, and output those records to a CSV-formatted file, with a header record.
USE COBOLIO;
$cobempl = COBOLIO->new("copylibs/PEOPLE.txt", "PEOPLE_FILE", 1);
open(INPUTFILE, "EMPL.PEOPLE.LIST.DISK");
open(OUTPUTFILE, ">BadEmployees.txt");
printf OUTPUTFILE, $cobio->GetCSVHeader("PEOPLE_FILE");
while(<INPUTFILE>) {
$cobempl->ReadRecInto($_, "PEOPLE_FILE");
if($cobempl->GetVal("SICK_DAYS_USED") > 5) {
print "employee ".$cobempl->GetVal("EMPLOYEE_NUMBER")." has us
+ed up all sick leave\n";
printf OUTPUTFILE, $cobio->GetCSVRecord("PEOPLE_FILE");
}
}
..
..
The module will deal with signed data (but not packed decimal, yet), add decimals to numbers, etc. You can display the layout of the data with position numbers, and you can parse any data you read into CSV format to import to a database or spreadsheet.
- new This creates a new COBOL record format. You must supply:
copyLibFileName - file name of the copy lib member.
fileDescriptorName - You must supply a unique file descriptor name, even if the item is not used within a file descriptor. You can have several file descriptors for a COBOL copylib member, like you would in a COBOL program to do INPUT -> Process -> OUTPUT.
fileDescriptorFlag - This is a 1 if the item is used within a file descriptor, 0 if not. This is necessary so the module knows whether to use a "implicit" REDEFINES on multiple "01" record level names, like COBOL does.
first01RecordName - If the copylib file does not contain an "01" record level name on the first line, you can supply one to use here.
- PrintLayouts Prints out the names, levels, start positions, lengths, etc. of the data members to STDOUT. Requires the fileDescriptorName as a parameter.
- ReadRecInto Accepts a single line of data and parses it into the fileDescriptor record that you specify.
- GetVal Returns the value of a defined data member or record, numerically formatted (if numeric). Requires the name of the item, the fileDescriptorName, and optionally a record name as a qualifyer.
- SetVal Assigns a data value to a defined data member or record. Requires the name of the item, the new value, the fileDescriptorName, and optionally a record name as a qualifyer.
- GetCSVHeader Returns the data names of the data items in CSV format. Requires the fileDescriptorName, and optionally a record name to start with.
- GetCSVRecord Returns the current values of a data record in CSV format. Requires the fileDescriptorName, and optionally a record name to start with.
Any comments or suggestions appreciated.
The module: COBOLIO.pm
# COBOLIO.pm
#
# Copyright (c) 2002 Harry Holt <hholt@comcast.net>. All rights reser
+ved.
# This program is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#
# Reading and interpretation of COBOL copylibs into perl data structur
+es
#
# All variable names are the same as the cobol names, but with the '-'
# changed to a '_' to avoid operator/keyword issues.
#
# SIGNS: Signs may be added to a COBOL pic clause and require special
# processing (all OVER the place!!), so each name is given a SIGN a
+ttribute.
# The possible values are:
# R - No PIC clause for this var (a record-level variable)
# X - No SIGN specified, non-numeric
# 9 - No SIGN specified, but variable is numeric
# + - SIGN is specified.
# When the sign is specified, reading the data will require bit-shi
+fting.
# The LAST digit of the number is the sign and the last number. If
+ we
# were still working in EBCDIC, would could do a straight bit-shift
+ and
# get the sign and the number, but because of the ASCII translation
+, things
# don't work out like they should:
# A-I = + 1-9
# { = + 0
# J-R = - 1-9
# I'm not sure what to look for if the value was a 0 with a negativ
+e sign.
# It SHOULD be Hex D0, but since this has no representation in EBCD
+IC, I
# don't know how it's represented. +0 should be Hex C0, but it is
# represented as { anyway.
# To make things simple for the Perl programmer, we will handle all
+ the translations,
# and often add an extra byte to the numeric variable to allow for
+the "-" sign.
# Then we strip in off to update the variable value (THIS NEEDS REF
+ACTORING).
#
package COBOLIO;
#
#
#
require Exporter;
require DynaLoader;
#
$COBOLIOPackage = "COBOLIO";
#
@ISA = qw( Exporter DynaLoader );
#
#
#
sub new {
($class, $copyLibName, $fdRec, my $isFD, $rec01) = @_;
$recList = {};
$FD{$fdRec} = $recList;
CreateCobolRec( $copyLibName, $fdRec, $isFD, $rec01);
my $self = { $class, $FD };
bless $self;
return $self;
}
sub PrintLayouts {
($self, my $FDName) = @_;
# Show the layouts of the existing copylib members
for $filedesc (keys %FD) {
print "********************** $filedesc ********************
+**\n";
printf "%-32s %7s %7s %7s %6s %7s \n", "Name", "Level", "
+Start", "Len", "Sign", "Parts";
printf "%-32s %7s %7s %7s %6s %7s \n", "-" x 32, "-----",
+ "-----", "---", "----", "-----";
for $reclist (sort BySRT keys %{ $FD{$filedesc} }) {
if($reclist eq "VAL") {
print "VALUE: \n$FD{$filedesc}->{VAL}\n";
} else {
printf "%-32s %7s %7s %7s %6s %7s \n",
$reclist,
$FD{$filedesc}{$reclist}->{LEVEL},
$FD{$filedesc}{$reclist}->{STARTPOS},
$FD{$filedesc}{$reclist}->{LEN},
$FD{$filedesc}{$reclist}->{SIGN},
$FD{$filedesc}{$reclist}->{PARTS} ;
}
}
print "\n\n";
}
}
sub GetRec {
my $self = shift;
my $fdRec = shift;
$recList = $FD{$fdRec};
return $recList;
}
#sub DESTROY {
# my ($self) = shift;
# if(!undef($self->{cbNames})) {
# undef $self->{cbNames};
# }
# return 1;
#}
sub ReadRecInto {
# Start by setting up the record structure
($self, $inputLine, $fdRec) = @_;
$FD{$fdRec}->{VAL} = $inputLine;
return 1;
} #### End sub ReadRecInto
sub GetVal {
($self, my $dataItemName, my $fd, my $dataRecName) = @_;
# If the item has an "Occurs" clause, we may need a subscript
# of the value, so we need to parse the name to get the name and s
+ubscript
($dataItemName, $subScr, $post) = split /[(,)]/, $dataItemName;
if($subScr < 1) { $subScr = 1; }
# If the FD is not passed, we will look it up
if(defined($fd)) {
$fdRec = $fd;
} else {
$fdRec = FindFDForRecord($dataItemName);
}
if(length($dataRecName) < 1) {
$dataRecName = $FD{$fdRec}{$dataItemName}->{REC01};
}
# Get the part of the major record that contains the record being
+asked for.
if($FD{$fdRec}{$dataItemName}->{PARTS} > 0) {
$actualLen = $FD{$fdRec}{$dataItemName}->{LEN} / $FD{$fdRec}{$
+dataItemName}->{PARTS};
} else {
$actualLen = $FD{$fdRec}{$dataItemName}->{LEN}
}
$actualPos = $FD{$fdRec}{$dataItemName}->{STARTPOS} + ($actualLen
+* ($subScr - 1));
$retVal = substr( $FD{$fdRec}->{VAL}, $actualPos, $actualLen);
$retVal =~ s/\x00/0/g;
# Deal with an signed numeric data
if($FD{$fdRec}{$dataItemName}->{SIGN} eq "+") {
if(substr($retVal,$FD{$fdRec}{$dataItemName}->{LEN} - 1,1) eq
+"{") {
$retVal = substr($retVal,0,$FD{$fdRec}{$dataItemName}->{LE
+N} -1)."0";
} else {
if($retVal =~ m/[A-I]/) {
$retVal =~ tr/[ABCDEFGHI]/[123456789]/;
}
if($retVal =~ m/[J-R]/) {
$retVal =~ tr/[JKLMNOPQR]/[123456789]/;
$retVal = '-'.$retVal;
}
}
} # End of SIGN logic
# If decimals are specified, we will need to add a "." in the righ
+t spot
if($FD{$fdRec}{$dataItemName}->{DECIMALS} > 0) {
$retVal = substr($retVal,0,length($retVal) - $FD{$fdRec}{$data
+ItemName}->{DECIMALS}).
".".substr($retVal,length($retVal) - $FD{$fdRec}{$data
+ItemName}->{DECIMALS});
} # End of DECIMAL logic
return $retVal;
}
sub GetCSVRecord {
($self, my $fd, my $name01) = @_;
$fdRec = $fd;
$outRec = "";
$flNeedSep = 0;
$in01 = 0;
if(undef($name01)) { $in01 = 1; }
for $filedesc (keys %FD) {
if($filedesc eq $fdRec) {
for $reclist (sort BySRT keys %{ $FD{$filedesc} }) {
if($reclist eq $name01) { $in01 = 1; }
if($in01 == 1) {
if($flNeedSep == 1) {
$outRec .= ",";
$flNeedSep = 0;
}
if($FD{$filedesc}{$reclist}->{SIGN} ne "R") {
$outRec .= "\"".GetVal("", $reclist)."\"";
$flNeedSep = 1;
}
}
}
}
}
return $outRec;
}
sub GetCSVHeader {
($self, my $fd, my $name01) = @_;
$fdRec = $fd;
$outRec = "";
$flNeedSep = 0;
$in01 = 0;
if(undef($name01)) { $in01 = 1; }
for $filedesc (keys %FD) {
if($filedesc eq $fdRec) {
for $reclist (sort BySRT keys %{ $FD{$filedesc} }) {
if($reclist eq $name01) { $in01 = 1; }
if($in01 == 1) {
if($flNeedSep == 1) {
$outRec .= ", ";
$flNeedSep = 0;
}
if($FD{$filedesc}{$reclist}->{SIGN} ne "R") {
$outRec .= $reclist;
$flNeedSep = 1;
}
}
}
}
}
return $outRec;
}
sub SetVal {
($self, my $dataItemName, my $newValue, my $fd, my $dataRecName
+) = @_;
if(defined($fd)) {
$fdRec = $fd;
} else {
$fdRec = FindFDForRecord($dataItemName);
}
if(length($dataRecName) < 1) {
$dataRecName = $FD{$fdRec}{$dataItemName}->{REC01};
}
if((substr($newValue,0,5)) eq "SPACE") {
$newValue = " " x $FD{$fdRec}{$dataItemName}->{LEN};
}
# Fix any numerics. Allow an extra space in case of a sign
if($FD{$fdRec}{$dataItemName}->{SIGN} =~ m/\+|9/) {
$dataLen = $FD{$fdRec}{$dataItemName}->{LEN} + 1;
$strMask = 'sprintf("%0'.$dataLen.'d", $newValue);';
$newValue = eval($strMask);
}
# We need to deal with Signed numerics somehow. This logic seems
+easily breakable, though
if($FD{$fdRec}{$dataItemName}->{SIGN} eq "+") {
if(substr($newValue,0,1) eq "+") { $newValue = substr($newValu
+e,1); }
if(substr($newValue,0,1) eq "-") {
for($newValue) {
s/0\z/\{/g;
s/1\z/J/g;
s/2\z/K/g;
s/3\z/L/g;
s/4\z/M/g;
s/5\z/N/g;
s/6\z/O/g;
s/7\z/P/g;
s/8\z/Q/g;
s/9\z/R/g;
}
} else {
for($newValue) {
s/0\z/\{/g;
s/1\z/A/g;
s/2\z/B/g;
s/3\z/C/g;
s/4\z/D/g;
s/5\z/E/g;
s/6\z/F/g;
s/7\z/G/g;
s/8\z/H/g;
s/9\z/I/g;
}
}
} # Finished dealing with signed numerics #####################
+##############
# Get rid of extra place for sign
if(substr($newValue,0,1) eq "-") { $newValue = substr($newValue,1)
+; }
$newValue =~ s/'.'//g;
$packTempl = 'A'.$FD{$fdRec}{$dataItemName}->{LEN};
$newValue = pack($packTempl,$newValue);
# In case the VAL is not large enough for the data item, add enoug
+h spaces
if(length($FD{$fdRec}->{VAL}) < $FD{$fdRec}{$dataItemName}->{START
+POS}) {
$FD{$fdRec}->{VAL} .= " " x $FD{$fdRec}{$dataItemName}->{START
+POS};
}
# Set the actual value within the larger record
$FD{$fdRec}->{VAL} =
substr($FD{$fdRec}->{VAL},0,$FD{$fdRec}{$dataItemName}->{START
+POS}).
$newValue.
substr($FD{$fdRec}->{VAL},$FD{$fdRec}{$dataItemName}->{STARTPO
+S} +
$FD{$fdRec}{$dataItemName}->{LEN});
return 1;
}
sub CreateCobolRec {
(my $copyLibName, my $fdRec, my $isFD, my $rec01) = @_;
##################################################################
+###############
# Here we create the "Data Division" interpretations for the
# perl variables. Each name found in the COBOL code becomes a
# data element name with the following attributes:
#
# LEVEL - The COBOL "record level", 01, 03, 05, 88, etc.
# STARTPOS - The starting position of the data element within t
+he overall record
# LEN - The character length of the data element
# SIGN - Whether a +/- sign is used for a numeric value
# REC01 - The top-level (01) data element name that this elemen
+t is in.
#
# The File Descriptor (FD) of the COBOL source also contains a VAL
+ attribute,
# which holds the value of a single record when it is read in. To
+
# find or set the value of any other record or data element, the
# appropriate section of the complete record is used.
#
# To accomodate different styles of using COBOL "copy" statements,
+ the initial
# 01 level record can either be included in the copy member file,
+or it can
# be supplied on the command line.
# NOTE: COBOL "01" record-level intries subordinate to an "FD" cla
+use are
# implicit "REDEFINES".
##################################################################
+################
### Initializations
my $periodAt;
my $nextPos = 0;
my $currLevelLen = 0;
my $currentLevel = 01;
my $fillerCount = 0;
my $recStarted = 0;
my $charCount = 0;
my $currLevel = 01;
my $cnt = 0;
my $name;
my $picChars = "";
my $rec01Name;
my $recName;
my $level;
my @recNames;
my @recLevel;
my @recLen;
my @cplLines = ();
my @cpl = ();
my @vals = ();
my $initVal = "";
my $cplLine = "";
my $rl;
my $occurring = 0;
my @occurringLevel;
if(defined($rec01)) {
$rec01Name = $rec01;
push @cplLines, " 01 ".$rec01Name.'.';
}
#### Read the COPYLIB file
die "I can't open the file $copyLibName because $!" unless
open(COPYLIB, $copyLibName);
while(<COPYLIB>) {
push @cplLines, $_;
}
close(COPYLIB);
#### Concatenate all COPYLIB lines into 1 line-per-sentence
for($cnt = 0; $cnt <= $#cplLines; $cnt++) {
while(substr($cplLines[$cnt],6,1) eq "*") { $cnt++; } # throw
+out comments
last if($cnt > $#cplLines); # exit for
+if end of file
$cplLine = substr($cplLines[$cnt],6,66); # get rid
+of line #'s & comments
while($cplLine !~ /\./) { # concatenation b
+ased on the period
if($cplLine =~ "\"") {
$cplLines[$cnt + 1] =~ s/\"//; # remove redunda
+nt quotes on next line
}
if(substr($cplLines[$cnt + 1],6,1) ne "*") {
$cplLine .= substr($cplLines[$cnt + 1],11,61);
}
$cnt += 1;
}
if($cplLine =~ m/\./g) { # chop line after the peri
+od
$periodAt = pos($cplLine);
$cplLine = substr($cplLine,0,$periodAt);
}
push @cpl, $cplLine;
}
##################################################################
+########
# Next starts the loop to interpret the COBOL data members and cre
+ate the
# hash of hashes to store the attributes of the data items.
#
##################################################################
+########
$cplSub = 0;
for (@cpl) {
@stuff = ();
s/-/_/g; # Change all "-" (dashes) to "_" (underscores)
s/\.//g; # Eliminate all periods
if($_ =~ "PIC") { # Get the position, length of defined dat
+a member
@stuff = split;
$level = $stuff[0];
$name = $stuff[1];
if($name eq "FILLER") { # Make "FILLER" fields have un
+ique names
$name = "FILLER".$fillerCount;
$fillerCount++;
}
if($stuff[2] =~ "OCCURS") { # Check for "OCCURS" claus
+e
$parts = $stuff[3]; # PARTS = 1 unless the
+"OCCURS"
$picClause = $stuff[6]; # clause defines multip
+les.
} else {
$parts = 1;
$picClause = $stuff[3];
}
if($stuff[2] =~ "REDEFINES") {
$nextPos = $recList->{$stuff[3]}->{STARTPOS};
$picClause = $stuff[5];
}
if(substr($picClause,0,1) eq "S") { # Look for a signe
+d numeric
$sign = "+";
} else {
if(substr($picClause,0,1) eq "9") {
$sign = "9";
} else {
$sign = "X";
}
}
###########
# Next, the PICTURE clause is parsed to determine the data
+ type
# and the size of the field.
($picChars, $charCount, $other) = split /[(,)]/ ,$picClaus
+e;
if($charCount == "") {
$picChars =~ s/V//g;
$picChars =~ s/S//g;
$charCount = length($picChars);
}
$other =~ s/V//g;
$charCount = $charCount + length($other);
$charCount = $charCount * $parts;
$decimals = length($other);
$recList->{ $name } = {
LEVEL => $level,
STARTPOS => $nextPos,
LEN => $charCount,
SIGN => $sign,
REC01 => $rec01Name,
PARTS => $parts,
DECIMALS => $decimals,
SRT => $cplSub,
};
if($stuff[4] =~ "VALUE") {
@vals = split/VALUE/;
$initVal = $vals[1];
if($initVal =~ "SPACE") {
$initVal = " " x $charCount;
}
if($initVal =~ "ZERO") {
$initVal = "0" x $charCount;
}
$initVal =~ s/\s+//x;
$initVal =~ s/\"//g;
SetVal("", $name, $initVal, $fdRec);
}
$nextPos += $charCount;
while($currLevel > $level) {
$recName = pop @recNames;
# $recList->{$recName}->{LEN} += (pop @recLen) * (pop @
+recParts);
$currLevel = pop @recLevel;
pop @recLen;
pop @recParts;
}
# for($rl=0;$rl <= $#recNames;$rl++) {
# if($recLevel[$rl] < $level) { $recLen[$rl] += ($charC
+ount * $recParts[$rl -1]); }
# }
} else { # Deal with record-level data variables
@stuff = split;
$level = $stuff[0];
if($level == 01) {
$rec01Name = $stuff[1];
if($isFD == 1) { $nextPos = 0; }
}
while($currLevel > $level) {
$recName = pop @recNames;
$recList->{$recName}->{LEN} += pop @recLen;
$currLevel = pop @recLevel;
pop @recParts;
}
if($stuff[2] =~ "REDEFINES") {
$nextPos = $recList->{$stuff[3]}->{STARTPOS};
}
if($stuff[2] =~ "OCCURS") {
$parts = $stuff[3];
} else {
$parts = 1;
}
$recList->{$stuff[1]} = {
LEVEL => $level,
STARTPOS => $nextPos,
LEN => 0,
SIGN => "R",
REC01 => $rec01Name,
PARTS => $parts,
SRT => $cplSub,
};
push @recNames, $stuff[1];
push @recLevel, $level;
push @recLen, 0;
push @recParts, $parts;
}
$currLevel = $level;
NEXTREC:
$cplSub++;
#print "FOR $name:\t";
#print "level = $recList->{$name}->{LEVEL}\t";
#print "startpos = $recList->{$name}->{STARTPOS}\t";
#print "fdRec = $fdRec\n";
}
while($recName = pop @recNames) { # Get lengths for remaining le
+vels.
# $recList->{$recName}->{LEN} += pop @recLen;
pop @recLevel;
pop @recParts;
}
@rns = ();
for $recName (sort BySRT keys %$recList) {
if($recList->{$recName}->{SIGN} eq "R") {
push @rns, $recName;
}
}
for $recName (@rns) {
$filePos = $recList->{$recName}->{STARTPOS};
$rLevel = $recList->{$recName}->{LEVEL};
for $dataName (sort BySRT keys %$recList) {
if($recList->{$dataName}->{SRT} > $recList->{$recName}->{S
+RT}) {
if($rLevel < $recList->{$dataName}->{LEVEL}) {
if($recList->{$dataName}->{SIGN} ne "R") {
if($recList->{$dataName}->{STARTPOS} >= $fileP
+os) {
$recList->{$recName}->{LEN} += $recList->{
+$dataName}->{LEN};
$filePos = $recList->{$dataName}->{STARTPO
+S} + $recList->{$dataName}->{LEN};
}
}
} else { # Exit loop if new record at the same lev
+el
last;
}
}
}
}
} # sub CreateCobolRec()
sub ByLevel {
$FD{$fdRec}{$a}->{LEVEL} <=> $FD{$fdRec}{$b}->{LEVEL};
}
sub BySRT {
$FD{$fdRec}{$a}->{SRT} <=> $FD{$fdRec}{$b}->{SRT};
}
sub ByPosition {
$FD{$fdRec}{$a}->{STARTPOS} <=> $FD{$fdRec}{$b}->{STARTPOS}
||
$FD{$fdRec}{$a}->{LEVEL} <=> $FD{$fdRec}{$b}->{LEVEL};
}
sub FindFDForRecord {
my $recordName = shift;
my $returnValue = "";
my $foundCount = 0;
my $dataName;
for $filedesc (keys %FD) {
for $reclist (keys %{ $FD{$filedesc} }) {
if($reclist eq $recordName) {
$returnValue = $filedesc;
$foundCount += 1;
## print "Found $recordName in $filedesc\n";
## return $returnValue;
}
}
}
if($foundCount > 1) {
die "Ambiguous record name specified $recordName\n";
}
return $returnValue;
}
1;
| [reply] [d/l] [select] |