Okay...
I have a script that parses single-line files we call DPFs (Data Parsing Failures). It is trying to work out why each file failed. Each file contains a series of records of different lengths. Each record starts with a three-character identifier. The next four characters give the length of the record.
Some of these DPFs occur because a record gets truncated, or goes beyond its length. This means the next record identifier is not where it should be. I am currently hunting for this by reading the next three characters, checking to see if it is a valid ID, and backing up two spaces (seek $dpf, -2, 1;) if it is not. I continue this until I find the next record.
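To make the current hunting approach concrete, here is a minimal sketch of it. It assumes the valid IDs are exactly the six that appear in the regex further down, uses an in-memory filehandle with made-up data, and the helper name hunt_next_id is hypothetical. It is not the real script, just the read-three-then-back-up-two loop in isolation.

```perl
use strict;
use warnings;

# Assumed set of valid IDs, taken from the alternation used later in this post.
my %valid = map { $_ => 1 } qw(MEH MED MMD MMS CR1 FR1);

# Slide forward one character at a time until three characters form a valid ID.
# Returns the offset of that ID, or -1 if none is found before EOF.
sub hunt_next_id {
    my ($fh) = @_;
    while (1) {
        my $start = tell $fh;
        my $got   = read $fh, my $id, 3;
        return -1 unless defined $got && $got == 3;
        if ($valid{$id}) {
            seek $fh, $start, 0;    # leave the file pointer on the ID itself
            return $start;
        }
        seek $fh, -2, 1;            # back up two characters: net advance of one
    }
}

# Made-up sample data: junk followed by one record.
my $data = "xxJUNKyyMED0011abcd";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $offset = hunt_next_id($fh);
print "next ID at offset $offset\n";
```

This does one read and one seek per character skipped, which is why a regex search looks attractive.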
I want to use m// to find the next record. I originally didn't use m//g because I misread the docs on pos(). I thought it returned the last match, as though it found every occurrence and returned the last one. Merlyn set me straight: it returns the position of the match from the last (most recent) m//g.
Well, I start using m//g and now pos() keeps giving me the first match in the file. I don't follow why it doesn't start looking from where the file pointer currently is (from the last read $dpf), but I figure I can get around this. I keep doing m//g in a do {} until pos() >= $lastknownpos. And then try to match one more time.
This of course doesn't work. I tried to use m/\G/g, but I just don't understand how to use it.
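For reference, here is a small sketch of how m//g and pos() behave on a plain string. It assumes the file contents have been read into a scalar (the name $buf and the sample data are made up), since m//g and pos() operate on strings, not on the file pointer of a handle. In scalar context each m//g resumes from pos($buf), pos($buf) can be assigned to choose the starting offset, and a leading \G would additionally force the match to begin exactly at that offset.

```perl
use strict;
use warnings;

# Made-up buffer standing in for the contents of a DPF.
my $buf = "MEH0010abcMED0011abcdxxCR10009ab";

# Pretend the bad record started at offset 10; resume just past its ID.
my $bad_start = 10;
pos($buf) = $bad_start + 3;

my @found;
# In scalar context each m//g continues from pos($buf).
while ($buf =~ /(MEH|MED|MMD|MMS|CR1|FR1)/g) {
    # $-[0] is where the match starts; pos($buf) is where it ends.
    push @found, sprintf "%s at %d, pos %d", $1, $-[0], pos($buf);
}
print "$_\n" for @found;
```

Note that pos() with no argument refers to $_, so the target string has to be named explicitly. The alternation is also wrapped in parentheses here, because in \G.*MEH|MED|... the \G.* prefix belongs only to the first alternative.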
Here is what I have... (more or less, the script is very long)
while ($currpos < $filelen){
    my ($readLen) = read $dpf, my ($recordID), 3;
    if (isValidID($recordID)) {
        #check the record length
        #if it is good, then parse the record for errors there
        #if the record is the wrong length, then skip parsing
        #when we read the next 3 chars we will probably not read a valid ID
        #and should begin hunting
    }
    elsif ($readLen == 3) {
        do {
            $dpf =~ m/\G.*MEH|MED|MMD|MMS|CR1|FR1/ig;
        } until (pos() >= $currpos);
        #this should put me at the record ID of the last
        #record I tried to parse
        #I will imagine there is a better way, particularly if
        #I can start from where the file pointer is and not from the beginning of the file
        if ($dpf =~ m/\G.*MEH|MED|MMD|MMS|CR1|FR1/ig) { #matching one more time will (hopefully) match the next record ID
            seek $dpf, pos(), 0;
        } #if it doesn't match, then this truncated record is the last record in the file (typical)
        else {
            seek $dpf, 0, 2;
        }
    }
    else { #if I didn't read three characters, then I hit EOF, seek there so $currpos will be updated and we will fall out of the while{}
        seek $dpf, 0, 2;
    }
    $currpos = tell $dpf;
} #end while
Of note:
$dpf is a filehandle
open (my $dpf=\*FH, "file.dpf");
I am sure I can put MEH|MED|MMD|MMS|CR1|FR1 in a scalar so I don't have to copy it so often. That would also make future versions easy to update when I add new record formats. But one thing at a time. (I would start playing with m//o if I put MEH|MED|MMD|MMS|CR1|FR1 in a scalar, and I'm liable to break something.)
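One possible way to keep the ID list in a single place is to build a compiled pattern with qr//, roughly as sketched below. The variable names and the sample string are made up. Because qr// compiles the pattern once when it is built, m//o should not be needed with this approach.

```perl
use strict;
use warnings;

# Single place to add new record formats.
my @record_ids  = qw(MEH MED MMD MMS CR1 FR1);
my $alternation = join '|', map { quotemeta } @record_ids;

# Compiled once; the (?:...) group keeps the alternation self-contained
# when the pattern is embedded in a larger regex.
my $id_re = qr/(?:$alternation)/i;

my $sample = "xxmed0011abcd";    # made-up data
my $hit    = $sample =~ /($id_re)/ ? $1 : undef;
print defined $hit ? "found $hit\n" : "no ID\n";
```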
So, my problem is... after I find the record ID of a bad record (too long or too short), how do I easily find the next record ID (if one exists)?
-Travis