in reply to Re^53: Interleaving bytes in a string quickly
in thread Interleaving bytes in a string quickly

No. He didn't.

Re^55: Interleaving bytes in a string quickly
by ikegami (Patriarch) on Mar 03, 2010 at 18:01 UTC

    "80 81" and "80 81 00" use the single-byte format. "C2 80 C2 81" and "C2 80 C2 81 00" use the multi-byte format.

    He even said himself that those aren't all in the same format: "You still haven't demonstrated how you can get those four to exist with only two string types".

    All four are the same byte string. Calling any of them a character string is weird. Calling all of them character strings makes no sense.
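
For anyone following along, here is a minimal sketch of how the two storage formats mentioned above can be inspected from Perl. The helper name internal_hex is made up for this illustration, and the sketch assumes the bytes:: functions (bytes::length, bytes::substr, bytes::ord) are an acceptable way to peek at the internal buffer. The trailing 00 that C sees is not visible from Perl, so it does not appear here.

```perl
use strict;
use warnings;
use bytes ();    # makes bytes::length etc. callable without enabling byte semantics

# Hypothetical helper: hex dump of the bytes in a scalar's internal buffer.
sub internal_hex {
    my ($s) = @_;
    return join ' ',
        map { sprintf '%02X', bytes::ord(bytes::substr($s, $_, 1)) }
        0 .. bytes::length($s) - 1;
}

my $single = "\x80\x81";    # stored one byte per element
my $multi  = $single;
utf8::upgrade($multi);      # same two elements, now stored as UTF-8

print internal_hex($single), "\n";    # 80 81
print internal_hex($multi),  "\n";    # C2 80 C2 81
```

Both scalars were built from the same two-element string; only the buffer layout differs.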

      That's not a demonstration.

        I know. I gave trivial instructions on how to generate them elsewhere. Why do you bring it up?

      So, what you are really saying here, finally, is that there are not 4 possibilities, but only two. Because the terminating null bytes are unavoidable and not part of the Perl data.

      And of those two possibilities:

      One starts life in Perl as a string containing just two values, and ends up in C as a char array containing just two values.

      And those values are single bytes in both cases. That is, a byte string in Perl and a char array in C.

      The other starts life in Perl as a string containing two values, but ends up in C as a char array containing four values.

      This is because the values in the Perl string are multi-byte characters, two bytes each in this case. That is, a character string in Perl, and a char array in C.

      I'd call that vindication of Buk's position. And all it took was 57 levels of exchanges for you to get around to admitting it.

        This is because the values in the Perl string are multi-byte characters.

        No, the values in the string are bytes 0x80 and 0x81 for all the cases.

        $_ = 12;                    # Value is 12, the number of months in a year.
        print "$_\n";               # Value is still 12, the number of months in a year,
                                    # just stored differently.

        $_ = "\x80\x81";            # Value is bytes 80 81
        utf8::upgrade($_);          # Value is still bytes 80 81,
                                    # just stored differently.
        print(length($_), "\n");    # 2
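
The same point can be checked a little more thoroughly. This is only a sketch, with variable names chosen for illustration: it compares an upgraded copy against the original at the Perl level, then downgrades the copy again to show the round trip.

```perl
use strict;
use warnings;

my $orig = "\x80\x81";    # two elements: 0x80 and 0x81
my $copy = $orig;
utf8::upgrade($copy);     # change the storage format only

my $flag_after_upgrade = utf8::is_utf8($copy) ? 1 : 0;
my $same_value         = ($orig eq $copy)     ? 1 : 0;
my @elements           = map { ord } split //, $copy;

printf "lengths: %d %d\n", length($orig), length($copy);    # lengths: 2 2
print  "equal: $same_value\n";                               # equal: 1
printf "elements: %s\n",
    join ' ', map { sprintf '%02X', $_ } @elements;          # elements: 80 81

utf8::downgrade($copy);   # back to one byte per element
my $flag_after_downgrade = utf8::is_utf8($copy) ? 1 : 0;
print "flag: $flag_after_upgrade then $flag_after_downgrade\n";    # flag: 1 then 0
```

At no point do length, eq, or ord report anything other than the two values 0x80 and 0x81.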

        And all it took was 57 levels of exchanges for you to get around to admitting it.

        Admit what? That some strings of bytes can be stored using multiple bytes? That was my point from the very beginning.