http://qs1969.pair.com?node_id=383881


in reply to Re: pack/unpack 6 bit fields.
in thread pack/unpack 6 bit fields.

To me, a "bit string" means a string composed of the characters "0" and "1", encoding one bit of value per byte (character) of the string. And the author of the pack documentation agrees with me.

Of course, any contiguous data structure could be considered to be "a string of bits" so I much prefer to call such things "base-2" ("representations" or "strings" depending on context), which also conveniently tells us that we have most-significant bit first.

But some people conceive of "bit strings" that aren't base-2 representations, including (out of necessity) the author(s) of pack (though they should have used "b" for "base-2" and "B" for "reversed base-2" instead of the opposite!).

Also note that pack and unpack always start at the first byte (or sometimes word, etc) of the string and work toward the end, so "B32" is great for dealing with big-endian (most-significant byte first) d-words (4-byte unsigned integer values) while "b32" and little-endian multi-byte values usually require awkward insertions of reverse here or there (or both, if only to increase the fun).
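A minimal sketch of that asymmetry, assuming an arbitrary 32-bit value (0x12345678) and variable names of my own choosing, neither of which comes from the thread. It shows "B32" reading a big-endian d-word directly, while a little-endian d-word needs a reverse on either the bytes or the resulting characters:

```perl
use strict;
use warnings;

my $n     = 0x12345678;              # arbitrary example value (assumption)
my $base2 = sprintf "%032b", $n;     # reference base-2 string

# Big-endian d-word ("N"): "B32" yields the base-2 string directly.
my $from_be = unpack "B32", pack("N", $n);

# Little-endian d-word ("V"): reverse the bytes, then use "B32" ...
my $from_le_bytes = unpack "B32", scalar reverse(pack("V", $n));

# ... or use "b32" and reverse the resulting characters instead.
my $from_le_bits = scalar reverse(unpack("b32", pack("V", $n)));

print "$_\n" for $base2, $from_be, $from_le_bytes, $from_le_bits;
```

All four printed lines should be the same 32-character string.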

"from a 12-byte string" (yes, I added that important hyphen and I'll do it again, just you watch!) tells me we aren't starting with a "bit string" but it doesn't tell me whether we are to treat those 12 bytes as most- or least-significant byte first.

In one respect, we shouldn't have to worry about bit order, unless someone designed a protocol so broken as to pack the more-significant bits of the input (6-bit) fields into the less-significant bits of the bytes in the packed string[1] -- in which case they need to be fired and/or publicly chastised.

It also doesn't tell me if we want the 6-bit fields returned with the most- or least-significant field first. A more subtle concern is whether the first 6-bit value is packed into the low 6 bits or the high 6 bits of the first byte.

Alternatively, you can combine these byte-order, field-order, and high-/low-first questions and rephrase them as a single question about bit order, presuming that 1) the first 6-bit field gets encoded into the first byte of the packed string and 2) the same bit order is used for the 6-bit values and for the 8-bit bytes (else more firings/chastising).

So the two sane possibilities (illustrated as a choice between bit order) are:

    bytes:  765432 10 7654 3210 76 543210
    fields: 543210 54 3210 5432 10 543210

or

    bytes:  012345 67 0123 4567 01 234567
    fields: 012345 01 2345 0123 45 012345

Which also hints at how to use un/pack to get either translation (and Errto did a fine job demonstrating one of them).
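As a rough illustration of the two layouts, here is a sketch that splits the first three bytes of the example string ("twe", my choice for brevity) into four 6-character strings, once with "B*" (descending bit order, i.e. base-2) and once with "b*" (ascending bit order). The variable names are hypothetical:

```perl
use strict;
use warnings;

my $packed = "twe";   # first 3 bytes of the example string (assumption)

# Descending bit order within each byte (base-2), then cut into 6s.
my @desc = unpack "a6" x 4, unpack("B*", $packed);

# Ascending bit order within each byte, then cut into 6s.
my @asc  = unpack "a6" x 4, unpack("b*", $packed);

print "descending: @desc\n";   # 011101 000111 011101 100101
print "ascending:  @asc\n";    # 001011 101110 111010 100110
```

Each element is still a string of "0" and "1" characters at this point; turning it into a number is a separate step.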

Another question is whether each 6-bit value should be packed into a one-byte string or be a numeric value. My first guess would be numeric values (Errto guessed packed one-byte strings, perhaps correctly).

Even if we restrict ourselves to one interpretation of the question, there are quite a few ways to go about the task and it is a bit hard to pick between the Ways To Do It.

But here's one way:

    my @fields=
        unpack "C*",        # 16 6-bit numeric values
        pack "B6"x16,       # string of 16 bytes, each holding a 6-bit value
        unpack "a6"x16,     # 16 6-character base-2 strings
        unpack "B*",        # 96-character base-2 string
        "twelve bytes";     # 12-byte packed string

    my $string=
        pack "B*",          # 12-byte packed string
        pack "a6"x16,       # 96-character base-2 string
        unpack "B6"x16,     # 16 6-character base-2 strings
        pack "C*",          # string of 16 bytes, each holding a 6-bit value
        @fields;            # 16 6-bit numeric values

And you can replace "B*" and "B6" with "b*" and (or!) "b6" to get some less-sane translations.

- tye        

[1] Note that I call our 12-byte string a "packed string". Some would call this a "binary string", but "binary" is so horribly ambiguous when dealing with pack that I just avoid using it at all. "Packed" means that the string can contain arbitrary byte values, that it isn't necessarily just "text" a.k.a. "just printable characters".

Re^3: pack/unpack 6-bit fields. (precision)
by BrowserUk (Patriarch) on Aug 18, 2004 at 15:20 UTC

    Neat solution...but when using 'B', it produces numbers greater than 6-bits can hold:

    P:\test>perl
    print join'|',
        unpack "C*",        # 16 6-bit numeric values
        pack "B6"x16,       # string of 16 bytes, each holding a 6-bit value
        unpack "a6"x16,     # 16 6-character base-2 strings
        unpack "B*",        # 96-character base-2 string
        "twelve bytes";     # 12-byte packed string
    ^Z
    116|28|116|148|108|28|100|148|32|24|36|228|116|24|84|204

    Switching to 'b' fixes that:

    P:\test>perl
    print join'|',
        unpack "C*",        # 16 6-bit numeric values
        pack "b6"x16,       # string of 16 bytes, each holding a 6-bit value
        unpack "a6"x16,     # 16 6-character base-2 strings
        unpack "b*",        # 96-character base-2 string
        "twelve bytes";     # 12-byte packed string
    ^Z
    52|29|23|25|44|25|23|25|32|8|22|30|52|21|54|28

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

      Thanks. I keep forgetting how broken un/pack are in this respect. Having "B6" load the high six bits (of the low byte of the integer) is just nonsensical.
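A small sketch of the behavior being complained about, using a single 6-character string for the value 1 (the variable names are mine, for illustration only). pack "B6" fills a byte from its high bit downward, so the six bits land in bit positions 7..2; pack "b6" fills from bit 0 upward:

```perl
use strict;
use warnings;

my $bits = "000001";    # base-2 representation of the value 1

# "B6": characters fill bits 7..2, so the value comes out shifted left by 2.
my $high = unpack "C", pack("B6", $bits);

# "b6": characters fill bits 0..5, so feed it the reversed (ascending) string.
my $low = unpack "C", pack("b6", scalar reverse($bits));

printf "B6 => %d, b6 on reversed string => %d\n", $high, $low;
# prints "B6 => 4, b6 on reversed string => 1"
```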

      So for the most sane configuration (bits split based on base-2 representation not some "ascending bit order" representation) you have to work harder:

      my @fields=
          unpack "C*",        # 16 6-bit numeric values
          pack "b6"x16,       # string of 16 bytes, each holding a 6-bit value
          map ''.reverse,     # 16 6-character ascending-bit strings
          unpack "a6"x16,     # 16 6-character base-2 strings
          unpack "B*",        # 96-character base-2 string
          "twelve bytes";     # 12-byte packed string
      print "(@fields)\n";

      my $string=
          pack "B*",          # 12-byte packed string
          pack "a6"x16,       # 96-character base-2 string
          map ''.reverse,     # 16 6-character base-2 strings
          unpack "b6"x16,     # 16 6-character ascending-bit strings
          pack "C*",          # string of 16 bytes, each holding a 6-bit value
          @fields;            # 16 6-bit numeric values
      print "($string)\n";

      outputs:

      (29 7 29 37 27 7 25 37 8 6 9 57 29 6 21 51)
      (twelve bytes)

      - tye
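For comparison, one more of the many Ways To Do It: a sketch that is not from the thread and that swaps the inner pack/unpack round trip for oct("0b...") and sprintf "%06b", assuming the same base-2 (most-significant bit first) interpretation and the same "twelve bytes" input:

```perl
use strict;
use warnings;

my $packed = "twelve bytes";    # 12-byte packed string from the thread

# Cut the 96-character base-2 string into 6-character chunks and
# convert each chunk to a number with oct("0b...").
my @fields = map { oct("0b$_") } unpack "a6" x 16, unpack("B*", $packed);

# Reverse direction: format each value as 6 base-2 digits, join, repack.
my $string = pack "B*", join("", map(sprintf("%06b", $_), @fields));

print "(@fields)\n";    # (29 7 29 37 27 7 25 37 8 6 9 57 29 6 21 51)
print "($string)\n";    # (twelve bytes)
```

This sidesteps the question of where "B6" or "b6" places its bits within a byte, since the numeric conversion never goes through a partial-byte pack.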