Wiggins has asked for the wisdom of the Perl Monks concerning the following question:

I humbly submit this question: pack and unpack hold great confusion for me, especially when mixed with file I/O and implicit conversions.

Given, I have 'slurped' this (hex dumped) data into a scalar:

00000000 d0 cf 11 e0 a1 b1 1a e1 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 3e 00 03 00 fe ff 09 00 |........>.......|
00000020 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................|
00000030 38 00 00 00 00 00 00 00 00 10 00 00 3a 00 00 00 |8...........:...|
00000040 01 00 00 00 fe ff ff ff 00 00 00 00 37 00 00 00 |............7...|
00000050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

I wish to repeatedly extract 4 bytes, and interpret them as a signed (little-endian) integer; stopping when I find a value of -1.

I have attempted using a 'substr' in a loop indexed +4 for extraction, then packing the result as an 'i'. But I don't seem to find an operation that results in 3759263696 for the first 4 bytes.

--UPDATE I got some results; not what I had expected, but not necessarily wrong, with:

@r = unpack "i30" , $doc;
print join (",",@r);
----
-535703600,-518344287,0,0,0,0,196670,655358,6,0,0,1,56,0,4096,58,1,-2,0,55,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1

Upon reflection, this is probably correct, and just what I needed.
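
For reference, the 'substr' loop described above might look like the following. This is only a minimal sketch: $doc here is a shortened stand-in built from words that appear in the dump (not the real file), @values is a made-up name, and the "l<" template assumes Perl 5.10 or later. The 'i' template uses the machine's native int size and byte order, which is why "l<" is used to pin both down.

```perl
use 5.010;          # the '<' modifier needs Perl 5.10 or later
use strict;
use warnings;

# Assumption: $doc stands in for the slurped file. It is a shortened
# sample made of 4-byte words taken from the dump, with a -1 marker.
my $doc = pack 'H*', join '', qw(d0cf11e0 a1b11ae1 3e000300 feffffff ffffffff 37000000);

my @values;
for (my $pos = 0; $pos + 4 <= length($doc); $pos += 4) {
    my $chunk = substr($doc, $pos, 4);
    my $val   = unpack('l<', $chunk);   # signed 32-bit, little-endian
    last if $val == -1;                 # the stop marker
    push @values, $val;
}

say join(',', @values);                 # -535703600,-518344287,196670,-2
say unpack('V', substr($doc, 0, 4));    # 3759263696 (same bytes read as unsigned)
```

The last line shows where 3759263696 comes from: it is the same four bytes read as an unsigned value.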

Replies are listed 'Best First'.
Re: To 'pack' or 'unpack' that is the question
by almut (Canon) on Nov 11, 2008 at 15:32 UTC

    Maybe your problem is just that a 32-bit signed integer cannot hold the value 3759263696 (too large).  In other words, if you use a signed int as in

    my $hex = "d0cf11e0a1b11ae1000000000000000001000000feffffff0000000037000000ffffffff";
    my $bin = pack "H*", $hex;
    for my $val (unpack "l<*", $bin ) {
        # last if $val == -1;
        print "val=$val\n";
    }

    you'd get

    val=-535703600
    val=-518344287
    val=0
    val=0
    val=1
    val=-2
    val=0
    val=55
    val=-1

    while, if you use an unsigned int (i.e. "L<*" in unpack), you'd get

    val=3759263696
    val=3776623009
    val=0
    val=0
    val=1
    val=4294967294
    val=0
    val=55
    val=4294967295

    but then you can no longer test for -1 as the stop condition...

    Update: added "<" modifier (Perl 5.10) to the unpack templates (=force little endian), so the sample code would also run correctly on big-endian machines (note that for "L<" you could also say "V", but there's no respective equivalent of (signed) "l<")
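
On the lost stop condition: a -1 stored as a 32-bit two's-complement word is the bit pattern ff ff ff ff, so with an unsigned template the test can be made against 0xFFFFFFFF instead. A minimal sketch, reusing the sample bytes from the reply above and using "V" so that it should also run on Perls older than 5.10 (the @seen array is only there for illustration):

```perl
use strict;
use warnings;

# Same sample bytes as in the reply above
my $bin = pack 'H*',
    'd0cf11e0a1b11ae1000000000000000001000000feffffff0000000037000000ffffffff';

my @seen;
for my $val (unpack 'V*', $bin) {
    # -1 as a 32-bit two's-complement word reads back as 0xFFFFFFFF
    last if $val == 0xFFFF_FFFF;
    push @seen, $val;
}

print "val=$_\n" for @seen;   # eight lines, ending with val=55
```

The loop stops at the last word, so the values printed are the unsigned ones listed above minus the final 4294967295.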

Re: To 'pack' or 'unpack' that is the question
by gone2015 (Deacon) on Nov 11, 2008 at 16:25 UTC

    You say 'little-endian' 32 bit values... so unpack('V30', $doc) is a good place to start, because that's 'little-endian' -- unpack('i30', ...) is a signed 'int', whatever size and byte order that might be.

    As noted elsewhere, to get 3759263696 for the first 4 bytes (d0 cf 11 e0) implies these are unsigned 32 bit values. The 'V' pack/unpack form works unsigned. So there's a fit there.

    Of course for -1 we now read 0xFFFF_FFFF.

    If you really want signed integers, then you can convert all unpacked values >= 0x8000_0000. (This much is obvious.) With Perl v5.10 you can use 'l<' to pack/unpack a 32-bit signed little-endian value.

    Update: you can also use 'V!' to pack/unpack a 32-bit signed little-endian value (also Perl v5.10).

    (I don't know of a current machine that doesn't use 2's complement, so we need not worry about the form of -ve numbers !)
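
The three routes mentioned in this reply can be compared side by side. This is a rough sketch, assuming Perl 5.10 or later for the 'l<' and 'V!' templates and a two's-complement layout; the sample bytes and variable names are made up for illustration.

```perl
use 5.010;
use strict;
use warnings;

# Four little-endian words: 0xe011cfd0, 0xfffffffe, 0x37, 0xffffffff
my $bin = pack 'H*', join '', qw(d0cf11e0 feffffff 37000000 ffffffff);

# Route 1 (works before 5.10): read unsigned with 'V', then fold
# every value >= 0x8000_0000 down by 2**32.
my @manual = map { $_ >= 0x8000_0000 ? $_ - 2**32 : $_ } unpack 'V*', $bin;

# Route 2: signed 32-bit with an explicit little-endian modifier
my @l_lt = unpack 'l<*', $bin;

# Route 3: 'V' with the '!' modifier, which treats the value as signed
my @v_bang = unpack 'V!*', $bin;

say "manual: @manual";   # manual: -535703600 -2 55 -1
say "l<:     @l_lt";
say "V!:     @v_bang";
```

All three lists should come out identical, so the choice mostly depends on which Perl versions need to be supported.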

Re: To 'pack' or 'unpack' that is the question
by Anonymous Monk on Nov 11, 2008 at 15:30 UTC
    perldoc -f pack
    h  A hex string (low nybble first).
    H  A hex string (high nybble first).
    ...

    * The "h" and "H" fields pack a string that many nybbles (4-bit groups, representable as hexadecimal digits, 0-9a-f) long.

    Each byte of the input field of pack() generates 4 bits of the result. For non-alphabetical bytes the result is based on the 4 least-significant bits of the input byte, i.e., on "ord($byte)%16". In particular, bytes "0" and "1" generate nybbles 0 and 1, as do bytes "\0" and "\1". For bytes "a".."f" and "A".."F" the result is compatible with the usual hexadecimal digits, so that "a" and "A" both generate the nybble "0xa==10". The result for bytes "g".."z" and "G".."Z" is not well-defined.

    Starting from the beginning of the input string of pack(), each pair of bytes is converted to 1 byte of output. With format "h" the first byte of the pair determines the least-significant nybble of the output byte, and with format "H" it determines the most-significant nybble.

    If the length of the input string is not even, it behaves as if padded by a null byte at the end. Similarly, during unpack()ing the "extra" nybbles are ignored.

    If the input string of pack() is longer than needed, extra bytes are ignored. A "*" for the repeat count of pack() means to use all the bytes of the input field. On unpack()ing the bits are converted to a string of hexadecimal digits.

    in your code
    my @hex = $hexed =~ /([0-9a-fA-F]+)/g;
    for (@hex){
        ... pack 'H*', $_;
    }
    a demonstration
    C:\>echo abcd1234 >test
    C:\>more test
    abcd1234
    C:\>od -tx1 test
    0000000 61 62 63 64 31 32 33 34 20 0d 0a
    0000013
    C:\>hexdump test
    00000000: 61 62 63 64 31 32 33 34 - 20 0D 0A |abcd1234 |
    0000000b;
    C:\>perl -e"print pack q,H*,, $_ for @ARGV" 61 62 63 64 31 32 33 34
    abcd1234
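
If the starting point is the hex dump text rather than the file itself, the same pack 'H*' idea can be applied line by line. The following is only a sketch: dump_to_bytes is a hypothetical helper, and it assumes each line has the shape "offset, hex bytes, optional |ASCII| column" as in the dumps shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turn hex dump text back into raw bytes.
sub dump_to_bytes {
    my ($text) = @_;
    my $bytes = '';
    for my $line (split /\n/, $text) {
        $line =~ s/\|.*\z//;                   # drop the ASCII column
        $line =~ s/^\s*[0-9a-fA-F]+:?\s+//;    # drop the leading offset
        # keep only two-digit hex tokens and pack them high nybble first
        $bytes .= pack 'H*', join '', $line =~ /\b([0-9a-fA-F]{2})\b/g;
    }
    return $bytes;
}

# Two lines taken from the dump in the question
my $text = <<'END';
00000000 d0 cf 11 e0 a1 b1 1a e1 00 00 00 00 00 00 00 00 |................|
00000040 01 00 00 00 fe ff ff ff 00 00 00 00 37 00 00 00 |............7...|
END

my $bin = dump_to_bytes($text);
printf "%d bytes, first word %u\n", length($bin), unpack('V', $bin);
# prints: 32 bytes, first word 3759263696
```

Stripping the offset and the ASCII column first matters, because both can contain characters that look like hex digits.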