in reply to Re^28: Interleaving bytes in a string quickly
in thread Interleaving bytes in a string quickly

and SvPVX can return the encoded version.

Gaaaaaaaaah! No it can't. It just returns a pointer to some memory. It places no interpretation upon what it is that memory. And neither does my code.
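To make the disagreement concrete, here is a minimal sketch of what a bare pointer to the string buffer would see. It assumes nothing beyond core Perl; `raw_buffer` and `hex_bytes` are hypothetical helper names, not anything from the code under discussion. The same string value is held as two different byte sequences depending on the UTF8 flag.

```perl
use strict;
use warnings;

# Hypothetical helper: the bytes Perl holds internally for a string.
# For an upgraded (UTF8=1) string that is the utf8 encoding of its elements;
# for a downgraded (UTF8=0) string it is the elements themselves.
sub raw_buffer {
    my ($s) = @_;                            # works on a copy
    utf8::encode($s) if utf8::is_utf8($s);
    return $s;
}

# Hypothetical helper: space-separated hex dump of a byte string.
sub hex_bytes {
    return join ' ', map { sprintf '%02x', ord } split //, $_[0];
}

my $downgraded = "\xE9\xFF";                 # UTF8=0, two bytes
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);                    # UTF8=1, same two elements

print "equal as strings:  ", ($downgraded eq $upgraded ? "yes" : "no"), "\n";
print "downgraded buffer: ", hex_bytes(raw_buffer($downgraded)), "\n";
print "upgraded buffer:   ", hex_bytes(raw_buffer($upgraded)), "\n";
```

The two scalars compare equal, yet the buffers differ (`e9 ff` against `c3 a9 c3 bf`), so code that reads the buffer directly has to know which of the two formats it was handed.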

I don't know why you keep bringing up unicode.

Look again. Prior to the quoted post, I mentioned it once, and you mentioned it once.

That aside, isn't utf-8 a "form of unicode."?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"I'd rather go naked than blow up my ass"

Re^30: Interleaving bytes in a string quickly
by ikegami (Patriarch) on Mar 01, 2010 at 17:31 UTC

    It places no interpretation upon what it is that memory.

    I know! You've said this a dozen times already. And what format is the pointed-to memory in? You only guaranteed the format in a node 20 deep or so.

    Look again.

    Ah yes, you only said "codepoint", not "unicode". That usually means "unicode codepoints", but you didn't imply any character semantics.

    That aside, isn't utf-8 a "form of unicode."?

    Unicode is a character set. You're clearly not dealing with characters.

    UTF-8 is a storage format. Typically, it's used to encode unicode characters, but Perl uses it internally to encode 32-bit or 64-bit integers (depending on your build). Those integers may be codepoints, but that applies to UTF8=0 strings too.
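A small sketch of that last point, assuming only core Perl (the variable names are illustrative). The elements of a string are integers in either storage format, `ord` returns the same value whichever format is in use, and values above the Unicode maximum of 0x10FFFF are stored and read back like any other.

```perl
use strict;
use warnings;
no warnings 'utf8';    # the values above 0x10FFFF are deliberate

# The same integer, 0xE9, held in each storage format.
my $narrow = chr(0xE9);                      # UTF8=0: one byte per element
my $wide   = chr(0xE9);
utf8::upgrade($wide);                        # UTF8=1: variable-length encoding

printf "narrow: flag=%d ord=0x%X\n", (utf8::is_utf8($narrow) ? 1 : 0), ord($narrow);
printf "wide:   flag=%d ord=0x%X\n", (utf8::is_utf8($wide)   ? 1 : 0), ord($wide);

# Integers at and beyond the Unicode maximum round-trip through a string.
my @values = (0x10FFFF, 0x110000, 0x7FFFFFFF);
my $s      = join '', map { chr($_) } @values;
my @back   = map { ord(substr($s, $_, 1)) } 0 .. length($s) - 1;

printf "read back: %s\n", join ' ', map { sprintf '0x%X', $_ } @back;
```

Nothing here depends on the values being characters; the string is simply a sequence of integers, and the flag only records how they are laid out in memory.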

      You only guaranteed the format in a node 20 deep

      No. I guaranteed that a) in the title of the thread; b) when I wrote the code.

      UTF-8 is a storage format. Typically, it's used to encode unicode characters, but Perl uses it internally to encode 32-bit integers (or 64-bit on a 64-bit build, I think).

      Please demonstrate. Cos if that is true, it is something that has completely eluded me.



        First, I misspoke a bit. Perl uses utf8 internally, a Perl-specific derivative of UTF-8. UTF-8 can only encode values up to 0x10FFFF and is really meant for unicode characters, while utf8 can encode any UV.

        use Devel::Peek qw( Dump );

        my $array = '';
        for my $bit (0..63) {
            $array .= chr( 1 << $bit );
        }
        Dump($array);

        SV = PV(0x511ae0) at 0x5118b0
          REFCNT = 1
          FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
          PV = 0x531200 "\1\2\4\10\20 @\302\200\304\200\310\200\320\200\340\240\200\341\200\200\342\200\200\344\200\200\350\200\200\360\220\200\200\360\240\200\200\361\200\200\200\362\200\200\200\364\200\200\200\370\210\200\200\200\370\220\200\200\200\370\240\200\200\200\371\200\200\200\200\372\200\200\200\200\374\204\200\200\200\200\374\210\200\200\200\200\374\220\200\200\200\200\374\240\200\200\200\200\375\200\200\200\200\200\376\202\200\200\200\200\200\376\204\200\200\200\200\200\376\210\200\200\200\200\200\376\220\200\200\200\200\200\376\240\200\200\200\200\200\377\200\200\200\200\200\201\200\200\200\200\200\200\377\200\200\200\200\200\202\200\200\200\200\200\200\377\200\200\200\200\200\204\200\200\200\200\200\200\377\200\200\200\200\200\210\200\200\200\200\200\200\377\200\200\200\200\200\220\200\200\200\200\200\200\377\200\200\200\200\200\240\200\200\200\200\200\200\377\200\200\200\200\201\200\200\200\200\200\200\200\377\200\200\200\200\202\200\200\200\200\200\200\200\377\200\200\200\200\204\200\200\200\200\200\200\200\377\200\200\200\200\210\200\200\200\200\200\200\200\377\200\200\200\200\220\200\200\200\200\200\200\200\377\200\200\200\200\240\200\200\200\200\200\200\200\377\200\200\200\201\200\200\200\200\200\200\200\200\377\200\200\200\202\200\200\200\200\200\200\200\200\377\200\200\200\204\200\200\200\200\200\200\200\200\377\200\200\200\210\200\200\200\200\200\200\200\200\377\200\200\200\220\200\200\200\200\200\200\200\200\377\200\200\200\240\200\200\200\200\200\200\200\200\377\200\200\201\200\200\200\200\200\200\200\200\200\377\200\200\202\200\200\200\200\200\200\200\200\200\377\200\200\204\200\200\200\200\200\200\200\200\200\377\200\200\210\200\200\200\200\200\200\200\200\200\377\200\200\220\200\200\200\200\200\200\200\200\200\377\200\200\240\200\200\200\200\200\200\200\200\200\377\200\201\200\200\200\200\200\200\200\200\200\200\377\200\202\200\200\200\200\200\200\200\200\200\200\377\200\204\200\200\200\200\200\200\200\200\200\200\377\200\210\200\200\200\200\200\200\200\200\200\200"\0 [UTF8 "\x{1}\x{2}\x{4}\x{8}\x{10} @\x{80}\x{100}\x{200}\x{400}\x{800}\x{1000}\x{2000}\x{4000}\x{8000}\x{10000}\x{20000}\x{40000}\x{80000}\x{100000}\x{200000}\x{400000}\x{800000}\x{1000000}\x{2000000}\x{4000000}\x{8000000}\x{10000000}\x{20000000}\x{40000000}\x{80000000}\x{100000000}\x{200000000}\x{400000000}\x{800000000}\x{1000000000}\x{2000000000}\x{4000000000}\x{8000000000}\x{10000000000}\x{20000000000}\x{40000000000}\x{80000000000}\x{100000000000}\x{200000000000}\x{400000000000}\x{800000000000}\x{1000000000000}\x{2000000000000}..."]
          CUR = 504
          LEN = 512
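The byte counts in that dump can be cross-checked from plain Perl. This is a sketch rather than part of the original demonstration: `encoded_length` is a hypothetical helper, the loop assumes a 64-bit perl, and it is limited to bits 0..35 as a conservative choice so that it stays within values any such build should accept.

```perl
use strict;
use warnings;
no warnings qw(utf8 portable);   # values above 0x10FFFF are deliberate

# Hypothetical helper: how many bytes Perl's internal encoding
# needs to hold a single value.
sub encoded_length {
    my ($value) = @_;
    my $s = chr($value);
    utf8::encode($s);            # replace the element with its encoded bytes
    return length $s;
}

my $total = 0;
for my $bit (0 .. 35) {
    my $len = encoded_length(1 << $bit);
    $total += $len;
    printf "1 << %2d takes %d byte(s)\n", $bit, $len;
}
print "bytes for bits 0..35: $total\n";
```

The dump shows 1 byte per value up to bit 6, then groups of 2, 3, 4, 5, 6 and 7 bytes, which adds up to 140 bytes for bits 0..35. The remaining 28 values in the dump each take 13 bytes (the sequences starting with \377), and 140 + 28 * 13 = 504, which matches the CUR reported.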

        Update: First para added.