PerlMonks  

How to capture data length from the record ?

by bh_perl (Monk)
on Feb 13, 2009 at 09:05 UTC

bh_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi..

Could somebody help me read data based on a record-length field? The record length is stored in the first 4 bytes of each record. For example:

00160AA09026166636773961000000003278321091000000002985633130106001052912574680529125801910000012003300000150000181000009900400000000000000000000012000000001101600160AA0902616659377397100000000327816300100000000296378437010600105291252180052912581781000001030330000600000600100019800040000000000000000000001200000000110160014 5AA090201666337739809000000007278110810000000026764
Based on the example, I have to read the first 4 bytes, "0160", which give the length of record 1. After reading record 1, I have to read the next 4 bytes to get the length of the second record, and so on until the data is finished.

Also, my data actually has a trailer, and the trailer length is fixed at 40 bytes. Do you have any ideas how to skip those 40 bytes at the end of the records?

Re: How to capture data length from the record ?
by gone2015 (Deacon) on Feb 13, 2009 at 10:29 UTC

    I suggest making friends with pack and unpack -- there's the tutorial.

    A smattering of things which may not be obvious:

    • the pack/unpack TEMPLATE string is evaluated at run-time, so it can be dynamic -- you can, for example, interpolate repeat counts into one, or build a string depending on what you are packing/unpacking.

    • C is the way to handle bytes as individual integers (0..255) -- but watch out if you have 'utf8' ("wide character") strings to unpack.

    • a is the way to handle sections of arbitrary byte values.

    • in unpack you can use x to skip past bytes, and you can use @ to step to particular positions in the input.

    • the construct C/a*, and all its cousins, is very useful where you have data prefixed by its length in bytes.

    • you can bracket stuff in pack/unpack templates, apply repeat counts to bracketed sections, and nest bracketed sections. So '(C/(NnnN2)(a4)2)*' unpacks as many items as it can ('(.....)*'), where each item comprises a byte count followed by that many NnnN2 elements, followed by 2 elements of 4 bytes each.
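To make a few of these points concrete, here is a minimal sketch on a made-up ASCII string (not the poster's data). It assumes that each 4-digit length counts only the bytes that follow it, because that is what the `/` construct expects. `A4/a` is the ASCII-digit cousin of `C/a*`: the 4 characters are read as a number and used as the count for the following `a`. If the real length field includes its own 4 digits, this template does not fit directly, and an explicit loop is the better tool.

```perl
use strict;
use warnings;

# Made-up sample: each payload is preceded by a 4-digit ASCII length that
# (by assumption) counts only the payload bytes, not the 4 digits themselves.
my $stream = '0003abc0002de0005fghij';

# 'A4/a' reads 4 chars as a number, then that many chars as one item;
# the length itself is not returned. '(...)*' repeats until input runs out.
my @payloads = unpack '(A4/a)*', $stream;
print "$_\n" for @payloads;                 # abc, de, fghij (one per line)

# 'x' skips bytes; '@' jumps to an absolute position in the input.
my ($skipped) = unpack 'x4 a3', $stream;    # 'abc'
my ($at)      = unpack '@7 A4', $stream;    # '0002'

# Templates are plain strings, so counts can be interpolated at run time.
my $len = 2;
my ($dyn) = unpack "x11 a$len", $stream;    # 'de'
```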

      Hi..

      This is my program, but it does not go on to process the next record; it captures only the first record. There might be a mistake in my looping. Could somebody help me?

      open (DATA, "$inputdir/$inputfile");
      while (<DATA>) {
          my ($filehdr, $data) = m|(.{40})(.*)|;
          ($len = substr($data, $offset, 4)) =~ s/^0//g;

          foreach $val ($data) {
              $cdata = substr($val, $offset, $len);
              print "$cdata\n";
          }
      }
      close(DATA);

        Well... I can offer a few observations on this code:

         1: open (DATA, "$inputdir/$inputfile");
         2: while (<DATA>) {
         3:     my ($filehdr, $data) = m|(.{40})(.*)|;
         4:     ($len = substr($data, $offset, 4)) =~ s/^0//g;
         5:
         6:     foreach $val ($data) {
         7:         $cdata = substr($val, $offset, $len);
         8:         print "$cdata\n";
         9:     }
        10: }
        11: close(DATA);
        • first: no use strict or use warnings... so the fact that you never give $offset a value is not pointed out to you.

        • two of your variables are declared "my", but not the others... which isn't encouraging, either.

        • line 1: the DATA file handle has a particular use. There is no need to use it for your input file, and doing so makes things less clear than they might be.

        • also line 1: adding at least an or die "$!" after the open is recommended -- there is no point in proceeding if the file open fails, and if it does, it will be helpful to know why it fails.

        • line 2: while (<DATA>) -- what do you expect this to do? I expect it to read the file line by line, where $/ specifies the current line ending. I see nothing in your description of the input file that suggests the "records" are separated by "\n". Indeed, the description suggests that it is a continuous stream of hex characters ([0-9A-F]), in which case <DATA> will read the entire file in one gulp (assuming $/ still has its default value). So why bother with the while?

        • line 3: this sets $filehdr to be the first 40 bytes of the current input "line", and $data to be the rest. I'd be tempted to do this with substr, being more direct and probably faster. But what you have will work.

        • line 4: the $offset value is never set, and will be treated as zero. I assume that it is supposed to be the offset within $data of the next "record". If so, then the fact that it's not set to anything will be a big part of why your code only does something with the first "record".

        • also line 4: it appears that the length is a decimal value. Since the rest appears to be hex, that bothers me...

        • line 6: what do you expect foreach $val ($data) to do ? foreach works its way through a LIST -- the list here is exactly one element long... so not much looping involved.

        • line 7: again uses $offset which has no value.

        But do not despair... the solution is close. Consider: what $offset is doing; how you should update it after each "record"; and then how to recast your loop to work your way along the $data string. You could think about an initial value for $offset, which could eliminate the need to split the input into $filehdr and $data.

        As it happens, I would still use unpack for this, but substr will also get the job done.
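Following those hints, a minimal sketch of the substr approach might look like the code below. It is not the poster's code. `slurp_file`, `split_records`, `HEADER_LEN` and `TRAILER_LEN` are hypothetical names. The sketch assumes the 4-digit length counts the length digits themselves, which is how the original `substr($val, $offset, $len)` call uses it. `$offset` starts at the header size and is advanced by each record's length. The loop stops where the 40-byte trailer begins.

```perl
use strict;
use warnings;

# Hypothetical layout constants; adjust to the real file format.
use constant HEADER_LEN  => 0;     # leading bytes to skip (the original code split off 40)
use constant TRAILER_LEN => 40;    # fixed trailer size from the question

# Read a whole file at once: lexical handle, checked open, and $/ set to
# undef (slurp mode) so nothing is split on line endings.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    local $/;
    return scalar <$fh>;
}

# Walk along $buf with an explicit $offset. Each record is assumed to start
# with a 4-digit ASCII length that includes those 4 digits.
sub split_records {
    my ($buf) = @_;
    my @records;
    my $offset = HEADER_LEN;                    # start of the next record
    my $end    = length($buf) - TRAILER_LEN;    # start of the trailer
    while ($offset < $end) {
        my $len = substr($buf, $offset, 4);
        die "bad length '$len' at offset $offset\n"
            unless $len =~ /^\d{4}\z/ && $len >= 4 && $offset + $len <= $end;
        push @records, substr($buf, $offset, $len);
        $offset += $len;                        # move on to the next record
    }
    return @records;
}

my $data = '0006AB0008CDEF' . ('T' x TRAILER_LEN);
print "$_\n" for split_records($data);          # 0006AB, 0008CDEF
```

For a real file, pass the result of `slurp_file($path)` to `split_records`. The sanity check on each length makes a misplaced offset fail loudly instead of silently printing garbage.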

Re: How to capture data length from the record ?
by Anonymous Monk on Feb 13, 2009 at 09:34 UTC
      Hi..

      Thank you very much for your reply. But this data is different: the record length is variable, and the length of each record is given by its first 4 bytes. That's why I have to read the first 4 bytes of every record to know its length. The next record continues right after that.

      Right now, I have no idea how to read the first 4 bytes of every record to get the record length.

      This is an ASCII file format.

      Thank you,

Re: How to capture data length from the record ?
by dHarry (Abbot) on Feb 13, 2009 at 09:38 UTC

    Open the file for reading and use read FILEHANDLE,SCALAR,LENGTH,OFFSET. Read the first four bytes to determine the length of the record, then read the record. Are the four bytes part of the record? You can use the read function to skip bytes as well, i.e. the trailing 40 bytes. How do you know it's the trailer and not another record? And last but not least: what have you tried so far? Could you show some code?
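The sketch below shows the read-based approach. It makes two assumptions: the total size is known up front (for a real file, `-s $fh`), and the 4-digit length includes the prefix. It also shows the 4-argument form `read FILEHANDLE,SCALAR,LENGTH,OFFSET`, which reads the body into `$rec` at offset 4, right after the length digits. `read_records` and `TRAILER_LEN` are hypothetical names, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

use constant TRAILER_LEN => 40;    # fixed trailer size from the question

# Read records from $fh, given the total size in bytes. Assumes each record
# starts with a 4-digit ASCII length that counts those 4 digits too.
sub read_records {
    my ($fh, $size) = @_;
    my @records;
    my $pos = 0;
    while ($size - $pos > TRAILER_LEN) {
        my $got = read($fh, my $rec, 4);              # the length prefix
        die "short read at byte $pos\n" unless defined $got && $got == 4;
        die "bad length '$rec' at byte $pos\n"
            unless $rec =~ /^\d{4}\z/ && $rec >= 4;
        my $body_len = $rec - 4;
        # 4-argument read: place the body at offset 4 of $rec, after the
        # length digits, so $rec ends up holding the whole record.
        $got = read($fh, $rec, $body_len, 4);
        die "short record at byte $pos\n" unless defined $got && $got == $body_len;
        push @records, $rec;
        $pos += length $rec;
    }
    # The remaining TRAILER_LEN bytes are the trailer; we simply stop here.
    return @records;
}

my $data = '0006AB0008CDEF' . ('T' x TRAILER_LEN);
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
print "$_\n" for read_records($fh, length $data);   # 0006AB, 0008CDEF
close $fh;
```

For a real file, something like `open my $fh, '<:raw', $path or die "Cannot open $path: $!";` followed by `read_records($fh, -s $fh)` should work (`$path` is a placeholder).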

      Where can I find an example of read FILEHANDLE,SCALAR,LENGTH,OFFSET?
Re: How to capture data length from the record ?
by targetsmart (Curate) on Feb 13, 2009 at 09:38 UTC
    Based on my understanding of your post, you already have the algorithm in hand.
    Does the trailer also have a header? If so, it is very easy: it fits into your existing algorithm.
    If it doesn't have a header, don't worry: after reading all the records based on their lengths (the first 4 bytes at the start of every record), only the trailer will remain, and you can take it off easily.
    If you are worried that while reading the records you might mistakenly read the trailer, then you do have a problem.
    For that, maintain a counter holding the size of the entire set of records (including the trailer), and after reading each record subtract its length (+4) from the counter. When only 40 remain on the counter, skip the remaining bytes.
    You didn't tell us where you are reading these records from: a file, socket, pipe, etc.?
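The counter idea can be sketched as follows. Here the 4-digit length is assumed NOT to include the prefix itself, which is why 4 is added each time, as described above. `records_by_counter` and `TRAILER_LEN` are hypothetical names. The final check makes "exactly 40 bytes left" an explicit test, so a miscounted record is reported instead of being mistaken for the trailer.

```perl
use strict;
use warnings;

use constant TRAILER_LEN => 40;

# Counter approach: start with the total size, subtract (length + 4) after
# each record, and stop once only the trailer is left. The 4-digit length
# is assumed to count only the data after the prefix.
sub records_by_counter {
    my ($buf) = @_;
    my $remaining = length $buf;
    my $pos = 0;
    my @records;
    while ($remaining > TRAILER_LEN) {
        my $len = substr($buf, $pos, 4);
        die "bad length '$len' at $pos\n" unless $len =~ /^\d{4}\z/;
        push @records, substr($buf, $pos + 4, $len);
        $pos       += $len + 4;
        $remaining -= $len + 4;
    }
    die "expected a ", TRAILER_LEN, "-byte trailer, found $remaining bytes\n"
        unless $remaining == TRAILER_LEN;
    return @records;
}

my $data = '0002AB0004CDEF' . ('T' x TRAILER_LEN);
print "$_\n" for records_by_counter($data);   # AB, CDEF
```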

    Vivek
    -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.

Node Type: perlquestion [id://743562]
Approved by GrandFather