in reply to Parsing and \G and /g and Stupid User Tricks

> Some of these DPFs occur because a record gets truncated, or goes beyond its length.

So, basically the length field is worthless, but always there? Why not something like this then:

    my $TLI = "MEH|MED|MMD|MMS|CR1|FR1";
    while (m/($TLI)(....)([^$TLI]+)/g) {
        my $error  = $1;
        my $length = $2;
        my $text   = $3;
        ## Do what you need with them here...
    }
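A quick check of what the `[^$TLI]+` part of that snippet actually matches. Inside `[...]`, the interpolated string is treated as a set of single characters, not as a list of three-letter IDs, so `[^$TLI]` means "any single character other than M, E, H, D, S, C, R, F, 1 or |". The two-record sample below is in the format discussed in this thread; the variable names are only for illustration.

```perl
use strict;
use warnings;

my $TLI  = "MEH|MED|MMD|MMS|CR1|FR1";
my $data = "MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789";

# [^$TLI] becomes [^MEH|MED|MMD|MMS|CR1|FR1]: a class of single characters,
# so the data capture stops at the first C, D, E, F, H, M, R, S, 1 or |.
my @texts;
while ($data =~ m/($TLI)(....)([^$TLI]+)/g) {
    push @texts, $3;
}
print "@texts\n";    # prints "BUN BUN"
```

Both data captures stop at the C in "BUNCH", so the captured text cannot be compared meaningfully with the length field.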

Parsing and \G and /g and Stupid User Tricks: Length Field: post v0.2
by THuG (Beadle) on Aug 08, 2000 at 20:40 UTC
    Ideally, the length field tells the parsing program how long the record is. The file is usually broken into its separate records and put into MQ/Series to be sent to the database.

    There are a few things that will cause the file to be rejected. One is that the parsing program reads a record (or what it thinks is a record) and then tries to read the next three characters, expecting the next record ID. If they are not there, the file is put aside for us to fix.
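The rejection rule described above can be sketched in Perl. This is a guess at the logic, not the real loader: it assumes a 3-character record ID, a 4-digit length field that counts only the data characters (which matches the example file in this thread), and the six IDs used in the snippets here. `check_records` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical re-creation of the loader's check (assumed record layout):
# 3-char record ID, 4-digit length, then exactly that many data characters.
my %is_id = map { $_ => 1 } qw(MEH MED MMD MMS CR1 FR1);

sub check_records {
    my ($fh) = @_;
    my @records;
    my ($id, $len, $text);
    my $got = read($fh, $id, 3);
    while ($got) {
        # After each record the next three characters must be an ID (or EOF).
        return (\@records, "expected record ID, got '$id'")
            unless $got == 3 && $is_id{$id};
        read($fh, $len, 4) == 4 && $len =~ /\A\d{4}\z/
            or return (\@records, "bad length field after $id");
        read($fh, $text, $len) == $len
            or return (\@records, "$id record truncated");
        push @records, [$id, $text];
        $got = read($fh, $id, 3);
    }
    return (\@records, undef);
}

# Usage: an in-memory filehandle stands in for the real file.
my $sample = "MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789"
           . "MED0018MOREDATAAGAIN44568";
open my $dpf, '<', \$sample or die "Cannot open: $!";
my ($records, $error) = check_records($dpf);
printf "%s: %d data chars\n", $_->[0], length $_->[1] for @$records;
print "REJECT: $error\n" if defined $error;
```

If a length field is one too large (say MEH0017 on the first record), the program swallows the M of the next ID, reads "ED0" where it expects an ID, and rejects the file at that point.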

    Now... given what you are saying... will this work like I expect it to?

        my $RID = "MEH|MED|MMD|MMS|CR1|FR1";
        while (<$dpf>) {
            while (m/($RID)(....)(.*)($RID)/g) {
                my $RecordID   = $1;
                my $RecordLen  = $2;
                my $RecordData = $3;
                if (length($RecordData) != $RecordLen) {
                    # ERROR: Record is wrong length
                }
            }
        }

    What I expect is that it will step through the file (an example is given below) one record at a time, pulling each record into $RecordData. Do I need to use \G to continue where it left off? Do I need to do anything special for the last record in the file? Why did you use [^$TLI]+?

    -Travis
    PS: Example file:
    MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789MED0018MOREDATAAGAIN44568


    v0.2: changed while($dpf) to while(<$dpf>), which brings us back to reading the file into $_. Thanks, Tye.
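Running the inner regex of the v0.2 loop over the example line shows why it will not step through record by record: `(.*)` is greedy, so it backtracks only as far as the last ID in the string, and the trailing `($RID)` then consumes that ID. This sketch treats the example as a single string with no trailing newline; the variable names are illustrative.

```perl
use strict;
use warnings;

my $RID  = "MEH|MED|MMD|MMS|CR1|FR1";
my $line = "MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789"
         . "MED0018MOREDATAAGAIN44568";

my @matches;
while ($line =~ m/($RID)(....)(.*)($RID)/g) {
    push @matches, [$1, $2, $3, $4];
}

# Greedy (.*) backtracks only as far as the LAST ID in the string, and the
# final ($RID) consumes it, so the loop matches once and then stops.
printf "%d match(es); data captured: %d chars; length field says %d\n",
    scalar @matches, length $matches[0][2], $matches[0][1];
```

This prints "1 match(es); data captured: 42 chars; length field says 16": the first capture swallows the whole second record, and because the third record's ID was consumed, that record is never examined.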

      while($dpf) doesn't really do anything. Perhaps you meant while(<$dpf>), but that will only work if you have newlines at appropriate places in the file (which doesn't sound like it is the case) or if you have set $/ to your record separator (but you don't have a record separator, do you?).

              - tye (but my friends call me "Tye")
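On the $/ point: because the file has no record separator, one option is slurp mode. With $/ undefined, a single <$dpf> returns the rest of the file as one string. In this sketch an in-memory filehandle stands in for the real file and its contents are the example line from this thread; with a real file you would open it by name instead.

```perl
use strict;
use warnings;

# Stand-in for the real file: open() on a scalar reference gives an
# in-memory filehandle that can be read like a file.
my $contents = "MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789"
             . "MED0018MOREDATAAGAIN44568";
open my $dpf, '<', \$contents or die "Cannot open: $!";

# With $/ undefined ("slurp mode"), one <$dpf> returns the whole file.
# local restores the default "\n" separator when the do block ends.
my $data = do { local $/; <$dpf> };
close $dpf;

print length($data), " characters read\n";   # 74 for this sample
```

Using `local` inside the `do` block keeps the change to $/ from leaking into other reads elsewhere in the program.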
      that last set of parentheses, ($RID), should be an assertion: (?=$RID). if you don't do this, you will read in that ID and skip it the next time you read something. you don't need the \G unless there are pieces in the data that aren't going to match (it seems like everything in the data is a valid piece of data).
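A sketch of the loop with that assertion in place. It makes two changes beyond the suggestion above, both assumptions of this sketch: the data capture is non-greedy, `(.*?)`, because a greedy `(.*)` would still run up to the last ID in the string even with the lookahead; and `\z` is added to the lookahead so the final record, which has no ID after it, still matches. It also assumes a four-digit length field and a string with no trailing newline (chomp a line read with <$dpf> first). An ID string that happens to occur inside the data would still end a record early.

```perl
use strict;
use warnings;

my $RID  = "MEH|MED|MMD|MMS|CR1|FR1";
my $data = "MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789"
         . "MED0018MOREDATAAGAIN44568";

my @report;
# (?=...) checks for the next ID without consuming it, so the next /g
# iteration starts right on it; \z lets the last record match too.
while ($data =~ m/($RID)(\d{4})(.*?)(?=$RID|\z)/g) {
    my ($id, $len, $text) = ($1, $2, $3);
    my $status = length($text) == $len ? "ok" : "WRONG LENGTH";
    push @report, "$id $len $text $status";
}
print "$_\n" for @report;
```

On the example line this reports all three records, each with status "ok".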

      perhaps you want to use split instead of a regex? if every part of your data is getting tested, you probably don't need, or want, a regular expression. split the data on your $RID codes, and test the rest of the data. or, since you're reading from a file, read until you hit a $RID code, test what you've read, and continue until EOF.
      s///
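A sketch of the split approach. Putting the ID pattern in a capturing group makes split return each ID as a field of its own, followed by that record's length field and data. Because the string starts with an ID, split also returns an empty first field, which is dropped here. The four-digit length field is an assumption taken from the example file; the names are illustrative.

```perl
use strict;
use warnings;

my $RID  = "MEH|MED|MMD|MMS|CR1|FR1";
my $data = "MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789"
         . "MED0018MOREDATAAGAIN44568";

# Capturing group => split keeps the separators (the IDs) in the output.
my @fields = split /($RID)/, $data;
shift @fields if @fields && $fields[0] eq '';   # empty field before the first ID

my @report;
for (my $i = 0; $i + 1 < @fields; $i += 2) {
    my ($id, $body) = @fields[$i, $i + 1];
    if ($body !~ /\A(\d{4})(.*)\z/s) {
        push @report, "$id bad length field";
        next;
    }
    my ($len, $text) = ($1, $2);
    push @report, length($text) == $len ? "$id ok" : "$id WRONG LENGTH";
}
print "@report\n";
```

For the example line this prints "MEH ok MED ok MED ok". Note that split still takes a regular expression, so an ID string occurring inside the data would split the record there as well.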
      by THuG (Beadle) on Aug 08, 2000 at 21:04 UTC
        Yeah, if I'm having to read in the entire file anyway, then I might as well break it into separate lines. I imagine s/($RID)/\n$1/g would do the trick (is that valid?). Then I don't have to hunt for the next record if the current one is FUBAR.

        -Travis
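As far as Perl syntax goes, that substitution is valid: ($RID) interpolates to a grouped alternation and $1 puts each ID back after the inserted newline. A sketch using the example data; it assumes a four-digit length field, and the names are illustrative. Two side effects to watch for: the result starts with a newline, so the first field after splitting is empty, and an ID string occurring inside the data would also get a newline in front of it.

```perl
use strict;
use warnings;

my $RID  = "MEH|MED|MMD|MMS|CR1|FR1";
my $data = "MEH0016BUNCHODATA123456MED0019BUNCHMOREDATA456789"
         . "MED0018MOREDATAAGAIN44568";

# Copy the data, then put a newline in front of every record ID.
(my $lines = $data) =~ s/($RID)/\n$1/g;

# The string now starts with "\n", so split yields an empty first field;
# grep { length } drops it.
my @records = grep { length } split /\n/, $lines;

for my $rec (@records) {
    my ($id, $len, $text) = $rec =~ /\A($RID)(\d{4})(.*)\z/
        or do { print "unparseable: $rec\n"; next };
    print "$id: ", (length($text) == $len ? "ok" : "WRONG LENGTH"), "\n";
}
```

Because each record is now on its own line, a bad record can be reported and skipped without having to search for where the next one starts.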