in reply to Re^54: Interleaving bytes in a string quickly
in thread Interleaving bytes in a string quickly

"80 81" and "80 81 00" use the single-byte format. "C2 80 C2 81" and "C2 80 C2 81 00" use the multi-byte format.

He even said himself that those aren't all the same format: "You still haven't demonstrated how you can get those four to exist with only two string types".

All four are the same byte string. Calling any of them a character string is weird. Calling all of them character strings makes no sense.
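
For anyone who wants to look at the two storage formats from Perl itself, here is a minimal sketch. It assumes the core bytes module (bytes::length and bytes::substr) is an acceptable way to peek at the internal buffer, and internal_hex is a hypothetical helper name of my own, not something from this thread. The trailing NUL that Perl keeps after the buffer is not counted by either length, so only the forms without the 00 are visible this way.

```perl
use strict;
use warnings;
use bytes ();    # load the bytes:: functions without enabling byte semantics

# Hypothetical helper: hex dump of a string's internal buffer.
sub internal_hex {
    my ($s) = @_;
    return join ' ',
        map { sprintf '%02X', ord(bytes::substr($s, $_, 1)) }
        0 .. bytes::length($s) - 1;
}

my $down = "\x80\x81";    # stored in the single-byte format
my $up   = $down;
utf8::upgrade($up);       # same string, now stored in the multi-byte format

print internal_hex($down), "\n";    # 80 81
print internal_hex($up),   "\n";    # C2 80 C2 81
print "UTF8 flag: ", (utf8::is_utf8($down) ? 1 : 0), " ",
                     (utf8::is_utf8($up)   ? 1 : 0), "\n";
```

Both variables hold a string of length 2; only bytes::length and the flag differ.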

Re^56: Interleaving bytes in a string quickly
by Anonymous Monk on Mar 03, 2010 at 20:57 UTC

    That's not a demonstration.

      I know. I gave trivial instructions on how to generate them elsewhere. Why do you bring it up?

        If it is so trivial, why not demonstrate that too?

Re^56: Interleaving bytes in a string quickly
by Anonymous Monk on Mar 04, 2010 at 04:54 UTC

    So, what you are really saying here, finally, is that there are not 4 possibilities, but only two. Because the terminating null bytes are unavoidable and not part of the Perl data.

    And of those two possibilities:

    One, starts life in perl as a string containing just two values, and ends up in C as a char array containing just two values.

    And those values are single bytes in both cases. I.e., a byte string in Perl and a char array in C.

    The other starts life in perl as a string containing two values, but ends up in C as a char array containing four values.

    This is because the values in the Perl string are multi-byte characters. Two bytes each in this case. I.e., a character string in Perl, and a char array in C.

    I'd call that vindication of Buk's position. And all it took was 57 levels of exchanges for you to get around to admitting it.

      This is because the values in the Perl string are multi-byte characters.

      No, the values in the string are bytes 0x80 and 0x81 for all the cases.

      $_ = 12;                  # Value is 12, the number of months in a year.
      print "$_\n";             # Value is still 12, the number of months in a year,
                                # just stored differently.

      $_ = "\x80\x81";          # Value is bytes 80 81
      utf8::upgrade($_);        # Value is still bytes 80 81,
                                # just stored differently.
      print(length($_), "\n");  # 2
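
      A minimal sketch extending the snippet above, for anyone who wants to check the claim rather than take it on faith: the upgraded copy still compares equal, has the same length and the same per-element values, and utf8::downgrade puts the storage back. The variable names are mine, chosen only for illustration.

      ```perl
      use strict;
      use warnings;

      my $orig = "\x80\x81";
      my $copy = $orig;
      utf8::upgrade($copy);                  # changes the storage only

      my @orig_vals = map { ord($_) } split //, $orig;
      my @copy_vals = map { ord($_) } split //, $copy;

      print "equal\n"       if $orig eq $copy;
      print "same length\n" if length($orig) == length($copy);
      print "same values\n" if "@orig_vals" eq "@copy_vals";
      print "values: @copy_vals\n";          # values: 128 129

      my $ok = utf8::downgrade($copy);       # true when every value fits in one byte
      print "downgraded\n"  if $ok && !utf8::is_utf8($copy);
      ```

      Nothing here looks at the internal buffer; it only uses the ordinary string operators, which is the level at which the two copies are indistinguishable.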

      And all it took was 57 levels of exchanges for you to get around to admitting it.

      Admit what? That some strings of bytes can be stored using multiple bytes? That was my point from the very beginning.

        You: No, the values in the string are bytes 0x80 and 0x81 for all the cases.

        Me: One, starts life in perl as a string containing just two values,

        Me: The other starts life in perl as a string containing two values,

        What are you disagreeing with? We both said the same thing.

        And you see no contradiction in terms in "bytes can be stored using multiple bytes"?

        Values can be stored using multiple bytes. Characters can be stored using multiple bytes.

        But a value is not a byte, if it is two bytes long.

        So get in line with the terminology that the rest of the world is using. Multi-byte values in strings can be characters, codepoints or glyphs. Otherwise you risk being perceived as talking gibberish. (Oh! Where did I read that recently?)