How to capture data length from the record ?

bh_perl has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How to capture data length from the record ? by gone2015 (Deacon) on Feb 13, 2009 at 10:29 UTC
I suggest making friends with pack and unpack -- there's the tutorial. A smattering of things which may not be obvious:

- the pack/unpack TEMPLATE string is evaluated at run-time, so it can be dynamic -- you can, for example, interpolate repeat counts into one, or build a string depending on what you are packing/unpacking.
- `C` is the way to handle bytes as individual integers (0..255) -- but watch out if you have 'utf8' ("wide character") strings to unpack.
- `a` is the way to handle sections of arbitrary byte values.
- in `unpack` you can use `x` to skip past bytes, and you can use `@` to step to particular positions in the input.
- the construct `C/a`, and all its cousins, is very useful where you have data prefixed by its length in bytes.
- you can bracket stuff in pack/unpack templates, and apply repeat counts to bracketed sections (and nest bracketed sections). So: `'(C/(NnnN2)(a4)2)*'` unpacks as many items as it can (`'(.....)*'`), where each item comprises: a one-byte count followed by that many `NnnN2` groups, followed by 2 elements each of 4 bytes.
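To make a few of those points concrete, here is a minimal sketch. The sample strings are made up for illustration and have nothing to do with the original poster's file; it only shows `C`, `x`, `@`, `C/a` inside a repeated group, and a repeat count interpolated into the template.

```perl
use strict;
use warnings;

# 'C' yields each byte as an integer in 0..255.
my @bytes = unpack('C*', "\x01\xff");            # (1, 255)

# 'x' skips bytes; '@' jumps to an absolute position in the input.
my ($after_skip) = unpack('x2 a3', 'ZZabc');     # 'abc'
my ($at_pos)     = unpack('@4 a2', 'abcdef');    # 'ef'

# Three length-prefixed strings: one count byte, then that many data bytes.
my $packed = "\x03abc\x05hello\x02xy";

# '(C/a)*' repeats "count byte + data" until the input runs out.
my @items = unpack('(C/a)*', $packed);

# The template is an ordinary string, so a repeat count can be interpolated.
my $n = 4;
my ($head, $rest) = unpack("a$n a*", 'HEADpayload');

print join(',', @items), "\n";
print "$head | $rest\n";
```

Note that `C/a` returns only the string; the count byte is consumed by the `/` and does not appear in the result list.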
Re^2: How to capture data length from the record ? by bh_perl (Monk) on Feb 16, 2009 at 03:17 UTC
hi.. This is my program, but it does not go on to process the next record; it captures the first record only. There might be a mistake in my loop. Could somebody help me? `open (DATA, "$inputdir/$inputfile"); while (<DATA>) { my ($filehdr, $data) = m|(.{40})(.*)|; ($len = substr($data, $offset, 4)) =~ s/^0//g; foreach $val ($data) { $cdata = substr($val, $offset, $len); print "$cdata\n"; } } close(DATA);`
Re^3: How to capture data length from the record ? by gone2015 (Deacon) on Feb 16, 2009 at 11:19 UTC
Well... I can offer a few observations on this code: `1: open (DATA, "$inputdir/$inputfile"); 2: while (<DATA>) { 3: my ($filehdr, $data) = m|(.{40})(.*)|; 4: ($len = substr($data, $offset, 4)) =~ s/^0//g; 5: 6: foreach $val ($data) { 7: $cdata = substr($val, $offset, $len); 8: print "$cdata\n"; 9: } 10: } 11: close(DATA);`

- first: no `use strict` or `use warnings`... so the fact that you never give `$offset` a value is not pointed out to you.
- two of your variables are declared "`my`", but not the others... which isn't encouraging, either.
- line 1: the `DATA` file handle has a particular use. There is no need to use it for your input file, and doing so makes things less clear than they might be.
- also line 1: adding at least an `or die "$!"` after the `open` is recommended -- there is no point in proceeding if the file open fails, and if it does, it will be helpful to know why it fails.
- line 2: `while (<DATA>)` -- what do you expect this to do ? What I expect it to do is to read the file, line by line -- where `$/` specifies the current line-ending. I see nothing in your description of the input file that suggests that the "records" are separated by `"\n"`. Indeed the description suggests that it is a continuous stream of hex characters (`[0-9A-F]`), in which case `<DATA>` will read the entire file in one gulp... (assuming `$/` has not been set to anything). In which case, why bother with the `while` ?
- line 3: this sets `$filehdr` to be the first 40 bytes of the current input "line", and `$data` to be the rest. I'd be tempted to do this with `substr`, being more direct and probably faster. But what you have will work.
- line 4: the `$offset` value is never set, and will be treated as zero. I assume that it is supposed to be the offset within `$data` of the next "record". If so, then the fact that it's not set to anything will be a big part of why your code only does something with the first "record".
- also line 4: it appears that the length is a decimal value. Since the rest appears to be hex, that bothers me...
- line 6: what do you expect `foreach $val ($data)` to do ? `foreach` works its way through a `LIST` -- the list here is exactly one element long... so not much looping involved.
- line 7: again uses `$offset` which has no value.

But do not despair... the solution is close. Consider:

- what `$offset` is doing;
- how you should update it after each "record";
- and then how to recast your loop to work your way along the `$data` string.

You could think about an initial value for `$offset`, which could eliminate the need to split the input into `$filehdr` and `$data`. As it happens, I would still use `unpack` for this, but `substr` will also get the job done.
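For what it's worth, here is one way the recast loop might look. It is a sketch under assumptions the thread does not confirm: a 40-character file header, a 4-character zero-padded decimal length field, a length that counts only the data following the field, and no trailer. The sample string and the variable names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical layout: 40-char header, then records of the form
# LLLL + data, where LLLL is the decimal length of the data alone.
my $input = ('H' x 40) . '0005HELLO' . '0003abc' . '0010ABCDEFGHIJ';

my $len_width = 4;
my $offset    = 40;    # start just past the file header
my @records;

while ($offset + $len_width <= length $input) {
    my $len = substr($input, $offset, $len_width);
    die "bad length field '$len' at offset $offset\n" unless $len =~ /\A\d+\z/;
    $len += 0;                      # numify, which drops leading zeros
    $offset += $len_width;          # step over the length field
    die "record at offset $offset runs past the end of the data\n"
        if $offset + $len > length $input;
    push @records, substr($input, $offset, $len);
    $offset += $len;                # step over the data to the next record
}

# The same walk as a single unpack: skip 40 bytes, then repeat
# "4-char length + that many bytes" until the input is used up.
my @via_unpack = unpack('x40 (A4/a)*', $input);

print "$_\n" for @records;
```

Starting `$offset` at 40 is what removes the need to split the input into a header and a data part first. If the length field turns out to include its own 4 characters, the two `$offset` updates need adjusting accordingly.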
Re^4: How to capture data length from the record ? by bh_perl (Monk) on Jun 24, 2010 at 02:45 UTC
Re: How to capture data length from the record ? by Anonymous Monk on Feb 13, 2009 at 09:34 UTC
See your previous questions: many of the answers there apply, though you may have to change 280 to 4.
Re^2: How to capture data length from the record ? by bh_perl (Monk) on Feb 15, 2009 at 00:47 UTC
hi.. Thank you very much for your reply. But this data is different, because the length of each data record is variable, and it is given in the first 4 bytes of the record. That is why I have to read the first 4 bytes of every record to know that record's length. The next record follows immediately after. Right now, I have no idea how to read the first 4 bytes of every record to get the record length. This is an ASCII file format. Thank you,
Re: How to capture data length from the record ? by dHarry (Abbot) on Feb 13, 2009 at 09:38 UTC
Open the file for reading and use read FILEHANDLE,SCALAR,LENGTH,OFFSET. Read the first four bytes to determine the length of the record, then read the record. Are the four bytes part of the record? You can use the read function to skip bytes as well, i.e. the trailing 40 bytes. How do you know it's the trailer and not another record? And last but not least: what have you tried so far? Could you show some code?
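A small sketch of what that could look like, with the data held in an in-memory filehandle so it runs on its own. It assumes the four bytes are a zero-padded decimal length that does not count itself, and it ignores the header and trailer questions entirely; the sample data is invented.

```perl
use strict;
use warnings;

# Made-up sample: two records, each a 4-char decimal length plus data.
my $sample = '0005HELLO' . '0003abc';
open(my $fh, '<', \$sample) or die "Cannot open in-memory data: $!";

my @records;
my $buf;
while (1) {
    # Without OFFSET, read() replaces the contents of $buf.
    my $got = read($fh, $buf, 4);
    die "read failed: $!\n" unless defined $got;
    last if $got == 0;                       # clean end of file
    die "truncated length field\n" if $got < 4;

    my $len = $buf + 0;                      # "0005" becomes 5

    # With OFFSET 4, the data is placed after the length field
    # already sitting in $buf, so $buf ends up as length + data.
    $got = read($fh, $buf, $len, 4);
    die "truncated record\n" unless defined $got && $got == $len;

    push @records, substr($buf, 4);          # the data part only
}
close($fh);

print "$_\n" for @records;
```

The OFFSET argument is optional; it only matters when you want to append to, or write into the middle of, a buffer that already holds something.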
Re^2: How to capture data length from the record ? by bh_perl (Monk) on Feb 15, 2009 at 02:41 UTC
Where can I find an example of read FILEHANDLE,SCALAR,LENGTH,OFFSET?
Re: How to capture data length from the record ? by targetsmart (Curate) on Feb 13, 2009 at 09:38 UTC
Based on my understanding of your post, you already have the algorithm in hand. Does the trailer also have a header? If so, it is very easy: it fits into your existing algorithm. If it doesn't have a header, don't worry: after reading all the records based on their lengths (the first 4 bytes at the start of every record), only the trailer remains, and you can take it off easily. If you are worried that, while reading the records, you might mistakenly read the trailer as a record, then you do have a problem. For that, maintain a counter holding the size of the entire set of records (including the trailer), and after reading every record subtract that record's length (+4) from the counter. When only 40 remain on the counter, skip the remaining bytes. You did not tell us where you are reading these records from: a file, a socket, a pipe, etc.?

Vivek -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.
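The counter idea can be sketched roughly as follows. The 40-character header, the 40-character trailer, the decimal length field and the sample data are all assumptions made for illustration; with a real file the total size might come from `-s` on the path instead of `length`.

```perl
use strict;
use warnings;

# Hypothetical sample: 40-char header, two records, 40-char trailer.
my $sample = ('H' x 40) . '0005HELLO' . '0003abc' . ('T' x 40);
open(my $fh, '<', \$sample) or die "Cannot open in-memory data: $!";

my $trailer_len = 40;
read($fh, my $header, 40) == 40 or die "short file header\n";

# Bytes left after the header: all the records plus the trailer.
my $remaining = length($sample) - 40;

my @records;
while ($remaining > $trailer_len) {
    read($fh, my $len, 4) == 4       or die "short length field\n";
    read($fh, my $rec, $len) == $len or die "short record\n";
    push @records, $rec;
    $remaining -= 4 + $len;          # record length plus its 4-byte prefix
}

# Whatever is left on the counter is the trailer.
read($fh, my $trailer, $trailer_len) == $trailer_len or die "short trailer\n";
close($fh);

print scalar(@records), " records; trailer is ", length($trailer), " bytes\n";
```

This only works when the total size is known up front, which is true for a plain file but generally not for a socket or a pipe.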

