Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I read the man pages for pack and unpack but must admit that I am a bit lost. Here is what I have:
    $data = '00010003206162';
    $kategory = hex(substr($data,0,4));
    $length = hex(substr($data,4,4));
    $contents = pack('H'.$length*2, substr($data, 8, length($data)));
So the structure is:
- 4 characters with the kategory (hex number, here: 1),
- 4 characters with the length of the contents (hex number, here: 3),
- the contents as hex data (here: '206162' = ' ab')
- more data in the same or similar format

Now this code has to be really fast and I am sure it can be optimized. Since pack/unpack are very fast I hope that it is possible to do it along these lines (not working):

my ($kat, $contents) = pack("h4 h4/H*", $data);
It should somehow work with the help of the '/' I found in the pack manpage, but I must admit that I didn't understand it and have no idea how the given examples work.

As an additional complication, sometimes the size isn't given directly but has to be obtained by subtracting the stored value from 65536, e.g. a value of 'fff5' means 65536 - 65525 = 11 as the size. But even if this esoteric counting cannot be done directly, it would help a lot if I could improve at least the simpler case above.

Any idea how this is best done?

Thanks
Michael

Replies are listed 'Best First'.
Re: pack with count in the data
by ikegami (Patriarch) on Sep 29, 2008 at 13:27 UTC
    It's really weird that you are using pack to work on hex digits instead of packed data.
        my $data = pack( 'H*', '00010003206162' );
        my ( $kategory, $contents ) = unpack( 'n n/A', $data );
        printf( "%04X\n", $kategory );
        printf( "%s\n", $contents );
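When several records follow each other in the same buffer, the 'n n/A' idea can probably be extended with a template group. The following is a minimal sketch, assuming Perl 5.8 or later (which supports parenthesized groups in templates) and assuming no length is stored in the negated form; `unpack_records` is a hypothetical helper name, not something from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a flat list (kategory, contents, kategory, ...).
sub unpack_records {
    my ($data) = @_;

    # (n n/a)* repeats "16-bit kategory, 16-bit length, that many bytes"
    # until the input is used up. 'a' returns the bytes unchanged, whereas
    # 'A' would also strip trailing spaces and NULs.
    return unpack( '(n n/a)*', $data );
}

# Two records back to back, first converted from hex digits to raw bytes.
my @fields = unpack_records( pack( 'H*', '00010003206162' . '0002000441424344' ) );

for ( my $i = 0 ; $i < @fields ; $i += 2 ) {
    printf( "%04X, '%s'\n", $fields[$i], $fields[ $i + 1 ] );
}
```

The group only helps while every length field holds the true length; the template has no way to express a length that must first be subtracted from 65536.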
Re: pack with count in the data
by Anonymous Monk on Sep 29, 2008 at 13:13 UTC
Re: pack with count in the data
by AnomalousMonk (Archbishop) on Sep 29, 2008 at 18:26 UTC
    If not for the requirement that the length is sometimes (the OP implies randomly) given as the negative of the true length, this would be simple and probably fairly quick. As ikegami wrote, the trick is to first convert the hex input string to raw binary.

    This code works and will probably be somewhat faster than the code in the OP, but will certainly be slower than ikegami's suggestion, if that could be used.

    C:\@Work\Perl\monks\714341>perl -wMstrict -le
    "print qq(\noutput:);
     for (@ARGV) {
       my $data = pack( 'H*', $_);
       my $offset = 0;
       while ($offset < length $data) {
         my ($kategory, $len) = unpack(qq{x$offset n n}, $data);
         $len = 65536 - $len if $len > 32767;
         my $contents = unpack(qq(x$offset x4 A$len), $data);
         $offset += 4 + $len;
         printf(qq(%04X, `%s' \n), $kategory, $contents);
         }
       print '------';
       }
    "
    0001fffd206162
    0002000441424344
    0001fffd20616200020004414243440005fffc454443

    output:
    0001, ` ab'
    ------
    0002, `ABCD'
    ------
    0001, ` ab'
    0002, `ABCD'
    0005, `EDC'
    ------
    You might try "\@$offset" in place of the "x$offset"; it might give you a slight speed improvement, but I doubt it. (The backslash escape before the '@' is needed to prevent interpolation as an array in the double-quoted string of the unpack template.)
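The same loop can be written as a reusable subroutine with the '@' (absolute position) specifier. This is only a sketch under the same assumption as above, namely that any stored length greater than 32767 really means 65536 minus the length; `parse_records` is a hypothetical name and no speed claim is made for it.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a packed buffer record by record and
# return a list of [kategory, contents] pairs.
sub parse_records {
    my ($data) = @_;
    my @records;
    my $offset = 0;

    # Each record needs at least a 4-byte header (kategory + length).
    while ( $offset + 4 <= length $data ) {

        # '@' jumps to an absolute byte offset; then read two
        # 16-bit big-endian values.
        my ( $kategory, $len ) = unpack( "\@$offset n n", $data );

        # Assumption: values above 32767 are stored as 65536 - length.
        $len = 65536 - $len if $len > 32767;

        # Scalar-context unpack returns the single extracted string.
        my $start    = $offset + 4;
        my $contents = unpack( "\@$start a$len", $data );

        push @records, [ $kategory, $contents ];
        $offset = $start + $len;
    }
    return @records;
}

for my $rec ( parse_records( pack( 'H*', '0001fffd2061620002000441424344' ) ) ) {
    printf( "%04X, '%s'\n", @$rec );
}
```

Unlike the one-liner, this uses 'a' rather than 'A', so trailing spaces or NULs in the contents are kept.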
Re: pack with count in the data
by Anonymous Monk on Sep 30, 2008 at 05:33 UTC
    Many thanks for all the answers; this is exactly what I was looking for! But please be patient with me, I would like to understand what makes it work. These are the main points I don't fully get:

    - How does the 'n n/A' know how many characters to take for the kategory and the counter? It would be easier to understand if it was something like 'n4 n4/A'.
    - In the manpage I found that n is in "big-endian order". Do I have to change the code to use 'v' on a Sparc (Sun) machine? Or is all that matters the data and not the machine it is running on?
    - I am not sure if I understand the line 'unpack(qq(x$offset x4 A$len), $data);'. Is it correct that the two 'x' parameters are just there to consume some characters and only 'A$len' is really used?

    Many, many thanks!
    -Michael
      Sorry to be so long responding to your question; I haven't checked this thread for a while. I hope you will eventually see this.

      [H]ow does the 'n n/A' know how many characters to take for the kategory and the counter? It would be easier to understand if it was something like 'n4 n4/A'.
      The number of bytes (not characters: the number of bytes in a character varies with the character set, e.g. ASCII or Unicode, and perhaps with the locale) is implicit in the template specifier: for 'n' it is two bytes, or, per the documentation, '_exactly_ 16 bits' in '"network" (big-endian) order'. So 'n4' means "four unsigned shorts, each of which is exactly 16 bits, with the bytes of each in big-endian order". What I infer from the last example in your question is that you want something like a string, for which I would tend to use 'a', 'b', or 'B', although to control the endianness of the byte sequence within each packed or unpacked string, you might have to use reverse before packing or after unpacking.
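To make the fixed width concrete, here is a small sketch that measures what 'n' and 'n4' produce; `width_of` is a hypothetical helper introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: number of bytes a template produces for the given values.
sub width_of {
    my ( $template, @values ) = @_;
    return length pack( $template, @values );
}

# 'n' is always one 16-bit value, so a repeat count multiplies by two bytes.
printf( "n:  %d bytes\n", width_of( 'n',  1 ) );             # 2 bytes
printf( "n4: %d bytes\n", width_of( 'n4', 1, 2, 3, 4 ) );    # 8 bytes

# Unpacking with 'n4' yields four separate numbers, not one 4-character field.
my @values = unpack( 'n4', pack( 'n4', 1, 2, 3, 4 ) );
print "@values\n";                                           # 1 2 3 4
```

So the count after 'n' is a number of 16-bit values, not a number of hex digits or characters.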
      In the manpage I found that n is in "big-endian order". Do I have to change the code to use 'v' on a Sparc (Sun) machine? Or is all that matters the data and not the machine it is running on?
      All that matters is the endianness of the data you are processing. The template specifiers 'n' and friends give you an 'absolute' endianness, independent of any platform, unlike 'i' and its ilk.
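A short sketch of that point: the same two bytes read with 'n' and with 'v' give fixed, platform-independent results. The names `read_be16` and `read_le16` are hypothetical wrappers used only here.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two fixed-endianness 16-bit specifiers.
sub read_be16 { return unpack( 'n', $_[0] ) }    # big-endian ("network") order
sub read_le16 { return unpack( 'v', $_[0] ) }    # little-endian ("VAX") order

my $bytes = "\x01\x02";

printf( "n: 0x%04X\n", read_be16($bytes) );      # 0x0102 on every platform
printf( "v: 0x%04X\n", read_le16($bytes) );      # 0x0201 on every platform
```

Only native specifiers such as 'S' or 'i' follow the byte order of the machine the script runs on.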
      I am not sure if I understand the line 'unpack(qq(x$offset x4 A$len), $data);'. Is it correct that the two 'x' parameters are just there to consume some characters and only 'A$len' is really used?
      The 'A$len' specifier is the only one that extracts anything, but of course the two 'x' specifiers are necessary to do the extraction from the right place!
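As a minimal sketch of this, using the sample record from the original question, the 'x4' below skips the four header bytes and returns nothing itself; `contents_after_header` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: skip the 4-byte header, then extract $len bytes.
sub contents_after_header {
    my ( $data, $len ) = @_;

    # 'x4' only advances the position; 'A$len' is the one specifier
    # that contributes a value to the returned list.
    return unpack( "x4 A$len", $data );
}

my $data = pack( 'H*', '00010003206162' );
my @got  = contents_after_header( $data, 3 );

printf( "%d value: '%s'\n", scalar @got, $got[0] );    # 1 value: ' ab'
```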