in reply to Re^56: Interleaving bytes in a string quickly
in thread Interleaving bytes in a string quickly

This is because the values in the Perl string are multi-byte characters.

No, the values in the string are bytes 0x80 and 0x81 for all the cases.

$_ = 12;                   # Value is 12, the number of months in a year.
print "$_\n";              # Value is still 12, the number of months in a year,
                           # just stored differently.

$_ = "\x80\x81";           # Value is bytes 80 81
utf8::upgrade($_);         # Value is still bytes 80 81,
                           # just stored differently.
print(length($_), "\n");   # 2
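To make the "same values, different storage" point concrete, here is a minimal sketch. The helper name `describe` is made up for illustration; `utf8::upgrade`, `utf8::is_utf8` and the `bytes` pragma are standard Perl. The `bytes` pragma is used only to peek at the internal buffer, which is not something normal code should do.

```perl
use strict;
use warnings;

# Hypothetical helper: report what Perl code sees versus how the string is stored.
sub describe {
    my ($s) = @_;
    my $stored = do { use bytes; length($s) };    # bytes in the internal buffer
    return {
        length  => length($s),                    # number of values in the string
        values  => join(' ', map { sprintf('%02X', ord($_)) } split //, $s),
        is_utf8 => utf8::is_utf8($s) ? 1 : 0,     # internal storage flag
        stored  => $stored,
    };
}

my $s = "\x80\x81";
my $before = describe($s);
utf8::upgrade($s);                                # changes the storage only
my $after = describe($s);

for my $d ($before, $after) {
    printf "length=%d values=%s is_utf8=%d stored=%d\n",
        @{$d}{qw(length values is_utf8 stored)};
}
# prints:
# length=2 values=80 81 is_utf8=0 stored=2
# length=2 values=80 81 is_utf8=1 stored=4
```

The length and the values are identical before and after the upgrade; only the flag and the size of the internal buffer differ.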

And all it took was 57 levels of exchanges for you to get around to admitting it.

Admit what? That some strings of bytes can be stored using multiple bytes? That was my point from the very beginning.

Re^58: Interleaving bytes in a string quickly
by Anonymous Monk on Mar 04, 2010 at 07:00 UTC

    You: No, the values in the string are bytes 0x80 and 0x81 for all the cases.

    Me: One, starts life in perl as a string containing just two values,

    Me: The other starts life in perl as a string containing two values,

    What are you disagreeing with? We both said the same thing.

Re^58: Interleaving bytes in a string quickly
by Anonymous Monk on Mar 04, 2010 at 07:49 UTC

    And you see no contradiction in terms in: bytes can be stored using multiple bytes?

    Values can be stored using multiple bytes. Characters can be stored using multiple bytes.

    But a value is not a byte, if it is two bytes long.

    So get in line with the terminology that the rest of the world is using. Multi-byte values in strings can be characters, codepoints or glyphs. Otherwise you risk being perceived as talking gibberish. (Oh! Where did I read that recently?)

      A byte is an integer with range 0..255

        No. Neither Perl nor C uses the term "byte" to refer to a class of (unsigned) integers. A byte is the minimal addressable unit of memory. For the vast majority of modern, non-specialist CPUs, that means 8 bits. And an 8-bit byte can variously contain unsigned integers 0 to 255, signed integers -128 to 127 (in two's complement), or 8 discrete boolean values.
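As a rough illustration of one 8-bit byte being read three different ways, here is a small sketch using Perl's built-in `unpack`. The helper name `views_of_byte` is made up for this example, and the signed result assumes the usual two's-complement representation.

```perl
use strict;
use warnings;

# Hypothetical helper: interpret a single 8-bit byte three different ways.
sub views_of_byte {
    my ($byte) = @_;                        # a one-byte string
    return {
        unsigned => unpack('C', $byte),     # unsigned 8-bit integer
        signed   => unpack('c', $byte),     # signed 8-bit integer
        bits     => unpack('B8', $byte),    # eight individual bits, high bit first
    };
}

my $v = views_of_byte("\x80");
printf "unsigned=%d signed=%d bits=%s\n", @{$v}{qw(unsigned signed bits)};
# prints: unsigned=128 signed=-128 bits=10000000
```

The bit pattern in memory is the same in all three cases; only the interpretation applied to it changes.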

        Wikipedia says: The byte ... is an ordered collection of bits, in which each bit denotes the binary value of 1 or 0. The size of a byte is typically hardware dependent, but the modern de facto standard is 8 bits, as this is a convenient power of 2.

        perlunicode says: Beginning with version 5.6, Perl uses logically-wide characters to represent strings internally. In future, Perl-level operations will be expected to work with characters rather than bytes. ... Under character semantics, many operations that formerly operated on bytes now operate on characters.

        It makes a whole lot more sense than "a byte can be stored in 1 byte or it can be stored in multiple bytes. But when it is stored in multiple bytes it is still just 1 byte". Of course, you'll probably start talking about that pretentious term 'octet' coined to be unambiguous, but unless you're using a 12-bit or 40-bit DSP, there is no ambiguity. And if you are, the term octet is meaningless.
