in reply to Re: Parsing and \G and /g and Stupid User Tricks
in thread Parsing and \G and /g and Stupid User Tricks

Ideally, the length field tells the parsing program how long the record is. The file is usually broken into its separate records and put into MQ/Series to be sent to the database.

There are a few things that will cause the file to be rejected. One is this: the parser reads a record (or what it thinks is a record) and then tries to read the next three characters, expecting the next record ID. If they are not there, the file is put aside for us to fix.
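To make that check concrete, here is a rough sketch of a length-driven walk over the data. It is not the real downstream parser. The 3-character ID and 4-digit length widths are assumptions taken from the example file, and `walk_records` is a hypothetical helper name.

```perl
use strict;
use warnings;

my $RID = 'MEH|MED|MMD|MMS|CR1|FR1';

# Hypothetical helper: walk $data using the length field only.
# Returns (arrayref of [id, body] pairs, error string or undef).
sub walk_records {
    my ($data) = @_;
    my ( $pos, @records ) = (0);
    while ( $pos < length $data ) {
        # Assumed layout: 3-character ID followed by a 4-digit length.
        my $head = substr $data, $pos, 7;
        my ( $id, $len ) = $head =~ /\A($RID)(\d{4})\z/
            or return ( \@records, "no record ID at offset $pos" );
        return ( \@records, "record at offset $pos is truncated" )
            if $pos + 7 + $len > length $data;
        push @records, [ $id, substr( $data, $pos + 7, $len ) ];
        $pos += 7 + $len;    # the next ID is expected right here
    }
    return ( \@records, undef );
}

# Example record string with the line wrap removed.
my ( $good, $err ) = walk_records(
    'MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789MED0018MOREDATAAGAIN44568'
);
print scalar(@$good), " records, error: ", ( defined $err ? $err : 'none' ), "\n";
```

If a length field is off by even one, the walk lands in the middle of the data and reports the offset where it expected an ID, which is the situation that gets the file put aside.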

Now... given what you are saying... will this work like I expect it to?

$RID = "MEH|MED|MMD|MMS|CR1|FR1";
while (<$dpf>) {
    while (m/($RID)(....)(.*)($RID)/g) {
        $RecordID   = $1;
        $RecordLen  = $2;
        $RecordData = $3;
        if (length($RecordData) != $RecordLen) {
            # ERROR: Record is wrong length
        }
    }
}

What I am expecting is: it will step through the file (an example is given below) over and over again, pulling each record into $RecordData. Do I need to use \G to get it to continue where it left off? Do I need to do anything special for the last record in the file? Why did you use [^$TLI]+?

-Travis
PS: Example file:
MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789MED0018MOREDATAAGAIN44568


v0.2: changed while($dpf) to while(<$dpf>). Which brings us back to reading the file into $_. Thanks, Tye.

RE: Length Field
by tye (Sage) on Aug 08, 2000 at 20:49 UTC

    while($dpf) doesn't really do anything. Perhaps you meant while(<$dpf>), but that will only work if you have newlines at appropriate places in the file (which doesn't sound like it is the case) or if you have set $/ to your record separator (but you don't have a record separator, do you?).
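A small sketch of what that means in practice, assuming the data has no newlines at all. The in-memory filehandle and the variable names `$raw`, `$iterations` and `$contents` are stand-ins for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Stand-in for the real file: an in-memory filehandle over the sample data.
my $raw = 'MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789MED0018MOREDATAAGAIN44568';

# With the default $/ ("\n") and no newlines in the data, the loop body
# runs exactly once and $line holds the entire file.
open my $dpf, '<', \$raw or die "open failed: $!";
my $iterations = 0;
while ( my $line = <$dpf> ) {
    $iterations++;
}
close $dpf;

# Making that explicit: undefine $/ locally and read everything in one go.
open $dpf, '<', \$raw or die "open failed: $!";
my $contents = do { local $/; <$dpf> };
close $dpf;

print "loop iterations: $iterations\n";
print "slurped length:  ", length($contents), "\n";
```

Either way the whole file ends up in one string, so the inner regex loop is what has to do the record splitting.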

            - tye (but my friends call me "Tye")
RE: Length Field
by jlistf (Monk) on Aug 08, 2000 at 20:54 UTC
    that last set of parentheses, ($RID), should be a lookahead assertion, (?=$RID). if you don't do this, the match consumes that ID, so the next match starts after it and skips that record. you don't need the \G unless there are pieces in the data that aren't going to match (it seems like everything in the data is a valid piece of data).
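a minimal sketch of the lookahead version, run against the example string with the line wrap removed. the \d{4} length field, the lazy .*? and the |\z alternative (so the last record can match at end of input) are my assumptions, not something stated in the question.

```perl
use strict;
use warnings;

my $RID  = 'MEH|MED|MMD|MMS|CR1|FR1';
my $data = 'MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789MED0018MOREDATAAGAIN44568';

my @records;
# (?=...) peeks at the next record ID without consuming it, so the next
# iteration of the /g loop starts exactly on that ID. \z covers the last
# record, which has no ID after it.
while ( $data =~ /($RID)(\d{4})(.*?)(?=$RID|\z)/gs ) {
    my ( $id, $len, $body ) = ( $1, $2, $3 );
    push @records, {
        id   => $id,
        len  => $len + 0,
        body => $body,
        ok   => ( length($body) == $len ? 1 : 0 ),
    };
}

printf "%s declared=%d actual=%d %s\n",
    $_->{id}, $_->{len}, length( $_->{body} ), ( $_->{ok} ? 'ok' : 'WRONG LENGTH' )
    for @records;
```

one caveat: if a record's data happens to contain something that looks like a record ID, the lazy match stops there and the record is reported as the wrong length.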

    perhaps you want to use split instead of a regex match? if every part of your data is getting tested, you probably don't need, or want, a matching loop. split the data on your $RID codes, and test the rest of the data. or since you're reading from a file... read till you hit a $RID code. test what you've read. continue till EOF.
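a rough sketch of the split idea, assuming the whole file is already in a string and the length field is 4 digits. putting capturing parentheses in the split pattern makes split return the IDs as well as the text between them.

```perl
use strict;
use warnings;

my $RID  = 'MEH|MED|MMD|MMS|CR1|FR1';
my $data = 'MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789MED0018MOREDATAAGAIN44568';

# The capture group keeps the separators, so @parts alternates:
# (leading text, id, rest, id, rest, ...).
my @parts = split /($RID)/, $data;
shift @parts;    # whatever came before the first ID (empty for this data)

my @records;
while ( my ( $id, $rest ) = splice @parts, 0, 2 ) {
    $rest = '' unless defined $rest;    # an ID with nothing after it
    my ( $len, $body ) = $rest =~ /\A(\d{4})(.*)\z/s;
    my $ok = ( defined $len && length($body) == $len ) ? 1 : 0;
    push @records, { id => $id, rest => $rest, ok => $ok };
}

print "$_->{id}: ", ( $_->{ok} ? 'ok' : 'bad length' ), "\n" for @records;
```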
    s///
    by THuG (Beadle) on Aug 08, 2000 at 21:04 UTC
      Yeah,
      If I'm having to read in the entire file anyway, then I might as well break it into separate lines. I imagine s/($RID)/\n$1/g would do the trick (is that valid?). Then I don't have to hunt for the next record if the current one is FUBAR.
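A quick sketch of that substitution on the example string (line wrap removed), assuming the whole file is in `$data`. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $RID  = 'MEH|MED|MMD|MMS|CR1|FR1';
my $data = 'MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789MED0018MOREDATAAGAIN44568';

# Copy the data, then put a newline in front of every record ID.
( my $broken = $data ) =~ s/($RID)/\n$1/g;

# The first ID sits at offset 0, so the copy now starts with a stray newline.
$broken =~ s/\A\n//;

my @lines = split /\n/, $broken;
print "$_\n" for @lines;
```

Each element of `@lines` is then one candidate record that can be checked on its own, though any ID-like text inside a record's data would also get a newline in front of it.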

      -Travis
        you can do this without reading in the entire file all at once. start at a $RID code, read until you hit the next $RID code. what you just read will be one full record from the file. test it, etc. then continue. the only problem will be figuring out how to stop reading once you hit a rid code. you could do some combination of seek and read to read in some data, find a $RID code and seek backwards through the file to the beginning of the code. something like:
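        $currpos = 0;
        # read's 4th argument is an offset into $input, not into the file, so seek first
        while ( seek( $dpf, $currpos, 0 ) && read( $dpf, $input, 80 ) ) {    # get up to 80 characters from $currpos
            # grab a code and its data; the lookahead leaves pos at the start of the next id
            last unless $input =~ m/($RID)(.*?)(?=$RID)/gs;    # no following id in the buffer (e.g. the last record)
            # test $1 and $2 for errors
            $currpos += pos( $input );    # set position in file to beginning of next id
        }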
        i think that'll do it... i might be missing something though.

        jeff