in reply to problems packing 8 bit data

I am verifying it via:

perl -e 'print pack("C*", 0x7e..0x81);' | od -tx1

And it's not a separator thing, because with the extra character (0x7e) added you don't get the 0xc2 between every byte, only preceding the bytes 0x80 and above.

And yes, I'm currently using perl 5.8.0 on RH 9.0

Re: Re: problems packing 8 bit data
by BrowserUk (Patriarch) on Aug 15, 2003 at 18:06 UTC

    I'm willing to bet that the utf-ifying occurs when you print it, not when you pack it. Try assigning the result to a variable and then printing its length.
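
    A minimal sketch of that check, assuming the same pack call as in the question (the variable names are only illustrative). If the length comes out as 4, the packed string holds four characters and any extra 0xc2 bytes are being added later, on output.

```perl
use strict;
use warnings;

# Pack four byte values: two below 0x80 and two at or above it.
my $packed = pack("C*", 0x7e .. 0x81);

# length() counts characters in the string, independent of any output layer.
my $len = length($packed);
print "length: $len\n";

# Show each character's ordinal to confirm that no 0xc2 is in the string itself.
my $hex = join(" ", map { sprintf("%02x", ord) } split(//, $packed));
print "chars:  $hex\n";
```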

    Why print would utf-ify the output will come down to which PerlIO 'layers' (e.g. :raw, :utf8 etc.) are in use on STDOUT, which I know nothing about beyond that they exist. Take a look at perliol.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

      After reviewing your response, and the testing I've been doing in the meantime, I have come to the conclusion that I agree with you.

      The string is correct; it's the output interpretation that needs to be changed. Calling binmode on STDOUT does alter the resulting output.
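
      One way to see the effect in isolation is to print the same string through two different layers into in-memory files and compare the bytes that arrive. This is only a sketch: bytes_written is a hypothetical helper, and the :utf8 layer is applied explicitly here rather than relying on whatever STDOUT happens to have by default.

```perl
use strict;
use warnings;

my $packed = pack("C*", 0x7e .. 0x81);

# Hypothetical helper: print $data through the given layer into an
# in-memory file and return the bytes that were written, as hex.
sub bytes_written {
    my ($layer, $data) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "open failed: $!";
    print {$fh} $data;
    close($fh) or die "close failed: $!";
    return join(" ", map { sprintf("%02x", $_) } unpack("C*", $buffer));
}

my $raw  = bytes_written(":raw",  $packed);   # bytes pass through unchanged
my $utf8 = bytes_written(":utf8", $packed);   # 0x80 and 0x81 gain a 0xc2 prefix

print "raw:  $raw\n";
print "utf8: $utf8\n";
```

      On a real STDOUT, the corresponding step is to call binmode(STDOUT) before printing.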

      Now if someone could just explain what translation is really going on in this particular bit pattern example,
      i.e. why a 0x80 is actually represented as 0xc2 0x80.

        As John M. Dlugosz says above, it is the UTF-8 encoding of the (ISO-8859-1) string.

        ...because your copy of perl has been built with PerlIO, and STDOUT has the IO layer :utf8 enabled (by default, I assume, as you don't know about it).

        This means that characters above the 7-bit ASCII limit are converted to their UTF-8 (multi-byte) representation on output; each character from 0x80 to 0xff becomes two bytes. Using binmode on STDOUT removes this layer, so the conversion is not done.
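
        The arithmetic behind 0x80 becoming 0xc2 0x80 can be sketched by hand. UTF-8 stores code points 0x80 through 0x7ff in two bytes of the form 110xxxxx 10xxxxxx: the top five bits go into the lead byte and the low six bits into the continuation byte. The helper name utf8_two_byte is hypothetical and exists only for this illustration; real code would let the :utf8 layer or the Encode module do this.

```perl
use strict;
use warnings;

# Hypothetical helper: UTF-8 encode one code point in the two-byte
# range (0x80..0x7ff) by hand and return the bytes as hex.
sub utf8_two_byte {
    my ($cp) = @_;
    die "code point outside the two-byte range" if $cp < 0x80 || $cp > 0x7ff;
    my $lead  = 0xc0 | ($cp >> 6);      # 110xxxxx: top five bits
    my $trail = 0x80 | ($cp & 0x3f);    # 10xxxxxx: low six bits
    return sprintf("%02x %02x", $lead, $trail);
}

my $enc_80 = utf8_two_byte(0x80);   # c2 80
my $enc_81 = utf8_two_byte(0x81);   # c2 81
my $enc_ff = utf8_two_byte(0xff);   # c3 bf

print "0x80 -> $enc_80\n";
print "0x81 -> $enc_81\n";
print "0xff -> $enc_ff\n";
```

        For 0x80 the top bits are 00010, so the lead byte is 0xc0 | 0x02 = 0xc2, and the low six bits are all zero, so the continuation byte is 0x80.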

        Looking at perldelta, I cannot find anything to indicate that STDOUT would be opened with :utf8 by default, but if you aren't doing anything to enable this yourself, I assume it must be?

        See the section of perldelta headed "PerlIO is now the default" and follow the various links near the bottom of that section for more information.

