theantler has asked for the wisdom of the Perl Monks concerning the following question:

Hello gurus. I think Perl's pack/unpack can be a bit hard to really understand, even after reading the tutorial that I found here. For example, why is the 8-bit char not just called ASCII, at least in pack/unpack, since that is what the data type indicates, is it not?
$pack = pack('c4', 65, 66, 67, 68);
print "$pack\n";
@h = unpack('c4', $pack);
print "@h\n";
We pack from 8-bit numbers to an ASCII character representation, and unpack from ASCII characters back to 8-bit numbers, so why is it called char and not ASCII? Can someone see where my confusion lies, and what I can do to understand it better? Thanks, TA

Re: Datatype and pack/unpack
by Corion (Patriarch) on Mar 19, 2010 at 14:43 UTC

    ASCII is a 7-bit character set (values 0 to 127). If you want to unpack an 8-bit element (values 0 to 255), calling it "ascii" would be wrong.
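    To see why an 8-bit element can't simply be called ASCII, here is a minimal sketch that unpacks a string's bytes with `C*` and checks whether every value fits in ASCII's 7-bit range. The helper name `is_ascii` is hypothetical, made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every byte of $bytes is in 0..127 (7-bit ASCII).
sub is_ascii {
    my ($bytes) = @_;
    # 'C*' unpacks every byte as an unsigned 8-bit value (0..255).
    my $high_bytes = grep { $_ > 127 } unpack('C*', $bytes);
    return $high_bytes == 0;
}

my $plain = pack('C4', 65, 66, 67, 68);   # the bytes of "ABCD"
my $high  = pack('C2', 65, 233);          # 233 needs the eighth bit

print is_ascii($plain) ? "ASCII\n" : "not ASCII\n";   # ASCII
print is_ascii($high)  ? "ASCII\n" : "not ASCII\n";   # not ASCII
```

    Any byte at or above 128 is a perfectly valid 8-bit element, but it has no ASCII meaning at all.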

Re: Datatype and pack/unpack
by roboticus (Chancellor) on Mar 19, 2010 at 14:56 UTC

    theantler:

    It's not called ASCII because it's not an ASCII character. There's some history involved which makes the data type name a bit misleading. It's much like weight and mass in physics: a pound of mass weighs a pound (of force) only because we're on the surface of the earth. On the moon, the same mass weighs only about a sixth as much.

    In the early days of computing, we would store a character in a byte. But even then, a character wasn't necessarily ASCII. If you wrote 0x40 to most terminals, you'd get an "@" on the screen, because most terminals were ASCII. But if you wrote it to an EBCDIC terminal, you'd see a space (" ") instead.
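    The same byte can be decoded both ways with the core `Encode` module. This is a sketch that assumes your Perl's `Encode` includes its EBCDIC tables (`Encode::EBCDIC`, which provides IBM code page 37 as `cp37`); code page 37 stands in here for "an EBCDIC terminal".

```perl
use strict;
use warnings;
use Encode qw(decode);   # assumes Encode's EBCDIC tables (cp37) are installed

my $byte = "\x40";   # one byte with value 0x40 (64)

# Interpret the byte as ASCII: 0x40 is the "@" sign.
my $as_ascii  = decode('ascii', $byte);

# Interpret the same byte as EBCDIC (IBM code page 37): 0x40 is a space.
my $as_ebcdic = decode('cp37', $byte);

printf "ASCII:  '%s'\n", $as_ascii;    # ASCII:  '@'
printf "EBCDIC: '%s'\n", $as_ebcdic;   # EBCDIC: ' '
```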

    Think of it as an eight-bit signed integer rather than a graphic symbol.
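    Treating `c` as a signed 8-bit integer means values round-trip as plain numbers, including negative ones that have no character meaning at all. A minimal sketch:

```perl
use strict;
use warnings;

# 'c' packs signed 8-bit integers, range -128..127.
my $packed = pack('c3', -1, 0, 100);

# Three values in, three bytes out.
print length($packed), "\n";                   # 3

# Unpacking with 'c' gives the original numbers back.
my @numbers = unpack('c3', $packed);
print "@numbers\n";                            # -1 0 100

# The raw byte for -1 is 0xFF (two's complement), i.e. 255 unsigned.
printf "0x%02X\n", unpack('C', $packed);       # 0xFF
```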

    ...roboticus

      Thank you, everyone, for all your responses. OK, so "char" is just an eight-bit integer and nothing dictates what this range of numbers represents. Then why does my computer produce ABCD from the example code?  $pack = pack('c4', 65, 66, 67, 68); When I pack 65 using a char type, I get A. WHY WHY WHY? It is just an eight-bit int, so why does it HAVE to end up as an A?
        As was mentioned above, the value 65 maps to the character "A" only because you're using a computer that's configured for a character encoding in which that is true.

        If your machine were using EBCDIC, the value 65 would map to a non-breaking space.
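        Here is the question's own `pack` call, with its four bytes decoded under two different encodings. This sketch assumes `Encode` provides IBM EBCDIC code page 37 (`cp37`); other EBCDIC code pages may map some of these bytes to different characters.

```perl
use strict;
use warnings;
use Encode qw(decode);   # assumes Encode's EBCDIC tables (cp37) are installed

# The four bytes from the original question: 65, 66, 67, 68.
my $pack = pack('c4', 65, 66, 67, 68);

# Decoded as ASCII, the bytes read "ABCD".
my $ascii = decode('ascii', $pack);

# Decoded as EBCDIC (IBM code page 37), byte 65 (0x41) is a non-breaking space.
my $ebcdic = decode('cp37', $pack);

print "$ascii\n";                     # ABCD
printf "U+%04X\n", ord($ebcdic);      # U+00A0
```

        The bytes are identical in both cases; only the interpretation changes.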

        The mapping of 65 to "A" is not much of a surprise, because many modern character encodings are built as supersets of 7-bit ASCII. The "Latin" code pages (such as ISO-8859-1) have this property, as does UTF-8, and all of these are in common use.
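        Because ISO-8859-1 (Latin-1) and UTF-8 are both supersets of ASCII, pure ASCII text encodes to exactly the same bytes in all three; they only diverge above 127. A minimal sketch with the core `Encode` module, using "é" (U+00E9) as the non-ASCII sample:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "ABCD";   # pure ASCII text

# All three encodings produce the identical bytes 65, 66, 67, 68.
my %bytes_for;
for my $enc ('ascii', 'iso-8859-1', 'UTF-8') {
    $bytes_for{$enc} = join ' ', unpack('C*', encode($enc, $text));
    print "$enc: $bytes_for{$enc}\n";
}

# A non-ASCII character (e with acute accent, U+00E9) is where they differ.
my $e_acute = "\x{E9}";
my $latin1  = join ' ', unpack('C*', encode('iso-8859-1', $e_acute));
my $utf8    = join ' ', unpack('C*', encode('UTF-8', $e_acute));

print "iso-8859-1: $latin1\n";   # 233
print "UTF-8: $utf8\n";          # 195 169
```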

Re: Datatype and pack/unpack
by ikegami (Patriarch) on Mar 19, 2010 at 15:07 UTC

    "char" in the pack documentation refers to the char type of the C language. It may or may not be an ASCII character; the value may or may not be any kind of character at all. As far as pack/unpack is concerned, it's just an arbitrary 8-bit value. The value could represent the number of employees in my company.

    Note that unpack('c', "\x80") and unpack('C', "\x80") (for example) are legit, but neither produces a value that could be an ASCII character (they return -128 and 128 respectively). ASCII only has characters associated with the numbers 0 to 127.
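    The `\x80` case can be checked directly; a minimal sketch:

```perl
use strict;
use warnings;

my $byte = "\x80";   # a single byte with value 128

my $signed   = unpack('c', $byte);   # 'c': signed 8-bit
my $unsigned = unpack('C', $byte);   # 'C': unsigned 8-bit

print "$signed $unsigned\n";         # -128 128

# Neither value lies in ASCII's 0..127 range.
for my $v ($signed, $unsigned) {
    print "$v is ", ($v >= 0 && $v <= 127 ? '' : 'not '), "an ASCII code\n";
}
```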

    I'm curious as to why you would call them ASCII when your example never deals with the values as characters.

Re: Datatype and pack/unpack
by eff_i_g (Curate) on Mar 19, 2010 at 17:28 UTC