
First, I misspoke a bit. Perl uses utf8 internally, a Perl-specific derivative of UTF-8. UTF-8 can only encode values up to 0x10FFFF and is really meant for Unicode characters, while utf8 can encode any UV (Perl's unsigned integer type).

use Devel::Peek qw( Dump );

my $array = '';
for my $bit (0..63) {
    $array .= chr( 1 << $bit );
}

Dump($array);
SV = PV(0x511ae0) at 0x5118b0
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
  PV = 0x531200 "\1\2\4\10\20 @\302\200\304\200\310\200\320\200\340\240\200\341\200\200\342\200\200\344\200\200\350\200\200\360\220\200\200\360\240\200\200\361\200\200\200\362\200\200\200\364\200\200\200\370\210\200\200\200\370\220\200\200\200\370\240\200\200\200\371\200\200\200\200\372\200\200\200\200\374\204\200\200\200\200\374\210\200\200\200\200\374\220\200\200\200\200\374\240\200\200\200\200\375\200\200\200\200\200\376\202\200\200\200\200\200\376\204\200\200\200\200\200\376\210\200\200\200\200\200\376\220\200\200\200\200\200\376\240\200\200\200\200\200\377\200\200\200\200\200\201\200\200\200\200\200\200\377\200\200\200\200\200\202\200\200\200\200\200\200\377\200\200\200\200\200\204\200\200\200\200\200\200\377\200\200\200\200\200\210\200\200\200\200\200\200\377\200\200\200\200\200\220\200\200\200\200\200\200\377\200\200\200\200\200\240\200\200\200\200\200\200\377\200\200\200\200\201\200\200\200\200\200\200\200\377\200\200\200\200\202\200\200\200\200\200\200\200\377\200\200\200\200\204\200\200\200\200\200\200\200\377\200\200\200\200\210\200\200\200\200\200\200\200\377\200\200\200\200\220\200\200\200\200\200\200\200\377\200\200\200\200\240\200\200\200\200\200\200\200\377\200\200\200\201\200\200\200\200\200\200\200\200\377\200\200\200\202\200\200\200\200\200\200\200\200\377\200\200\200\204\200\200\200\200\200\200\200\200\377\200\200\200\210\200\200\200\200\200\200\200\200\377\200\200\200\220\200\200\200\200\200\200\200\200\377\200\200\200\240\200\200\200\200\200\200\200\200\377\200\200\201\200\200\200\200\200\200\200\200\200\377\200\200\202\200\200\200\200\200\200\200\200\200\377\200\200\204\200\200\200\200\200\200\200\200\200\377\200\200\210\200\200\200\200\200\200\200\200\200\377\200\200\220\200\200\200\200\200\200\200\200\200\377\200\200\240\200\200\200\200\200\200\200\200\200\377\200\201\200\200\200\200\200\200\200\200\200\200\377\200\202\200\200\200\200\200\200\200\200\200\200\377\200\204\200\200\200\200\200\200\200\200\200\200\377\200\210\200\200\200\200\200\200\200\200\200\200"\0 [UTF8 "\x{1}\x{2}\x{4}\x{8}\x{10} @\x{80}\x{100}\x{200}\x{400}\x{800}\x{1000}\x{2000}\x{4000}\x{8000}\x{10000}\x{20000}\x{40000}\x{80000}\x{100000}\x{200000}\x{400000}\x{800000}\x{1000000}\x{2000000}\x{4000000}\x{8000000}\x{10000000}\x{20000000}\x{40000000}\x{80000000}\x{100000000}\x{200000000}\x{400000000}\x{800000000}\x{1000000000}\x{2000000000}\x{4000000000}\x{8000000000}\x{10000000000}\x{20000000000}\x{40000000000}\x{80000000000}\x{100000000000}\x{200000000000}\x{400000000000}\x{800000000000}\x{1000000000000}\x{2000000000000}..."]
  CUR = 504
  LEN = 512
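
To make the utf8 versus UTF-8 distinction a bit more concrete, here is a minimal sketch. It assumes a reasonably modern perl with the core Encode module; the helper name utf8_byte_length and the sample values are only illustrative and are not from the thread. It counts how many bytes the internal encoding spends on a given value, then encodes one value just past 0x10FFFF with Encode's lax "utf8" and its strict "UTF-8".

```perl
use strict;
use warnings;
no warnings qw( utf8 portable );   # values above 0x10FFFF are intentional here
use Encode qw( encode );

# Hypothetical helper: number of bytes Perl's internal utf8 encoding
# needs for a single code point.
sub utf8_byte_length {
    my ($code_point) = @_;
    my $s = chr($code_point);
    utf8::encode($s);              # in place: characters -> utf8-encoded bytes
    return length($s);
}

for my $cp (0x41, 0x80, 0x800, 0x10000, 0x110000, 0x200000, 0x4000000) {
    printf "0x%X -> %d byte(s)\n", $cp, utf8_byte_length($cp);
}

# Lax "utf8" versus strict "UTF-8" for a value just past the Unicode range.
my $beyond = chr(0x110000);
my $lax    = encode('utf8',  $beyond);
my $strict = encode('UTF-8', $beyond);

printf "utf8:  %s\n", join ' ', map { sprintf '%02X', ord } split //, $lax;
printf "UTF-8: %s\n", join ' ', map { sprintf '%02X', ord } split //, $strict;
```

The byte counts grow in the same steps visible in the dump above. For the last two lines, the lax encoder should keep the value as the four bytes F4 90 80 80, while the strict one is expected to refuse it (with the default CHECK it substitutes a replacement character instead).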

Update: First para added.

Re^33: Interleaving bytes in a string quickly
by BrowserUk (Patriarch) on Mar 01, 2010 at 17:57 UTC

    Even with your misspeak, that's just a bug in Perl.

    use Devel::Peek;;
    $a = '';;
    $a = chr( 65 );;
    Dump $a;;
    SV = PV(0x11cfc0) at 0x11f248
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x3d6ccc8 "A"\0
      CUR = 1
      LEN = 8
    $a .= chr( 2**32 );;
    Dump $a;;
    SV = PV(0x11cfc0) at 0x11f248
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x3d6cbd8 "A\376\204\200\200\200\200\200"\0Malformed UTF-8 character (byte 0xfe) in subroutine entry [UTF8 "A\x{0}"]
      CUR = 8
      LEN = 16

    It allows you to construct a malformed utf-8 (unicode) string. It shouldn't.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      As for that warning: it's meant as a tool for when you're dealing with Unicode characters (although a buggy one at the moment), and one that you can turn off when you're dealing with strings of numbers.

      that's just a bug in Perl.

      No, it's quite intentional.

      If anything, the warning is seen as the bug.
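
      A rough sketch of what turning the warning off can look like. This is an assumption-laden illustration, not the exact case from the thread: the warning quoted above came out of Devel::Peek's Dump on an older perl, whereas this sketch triggers a different warning from the same utf8 category (printing a non-Unicode value to a :utf8 handle on a modern perl) and then silences it lexically. The variable names are made up for the example.

      ```perl
      use strict;
      use warnings;

      # Collect warnings instead of printing them, so they can be counted.
      my @warnings;
      local $SIG{__WARN__} = sub { push @warnings, @_ };

      # In-memory handle with a :utf8 layer, so nothing reaches the terminal.
      open my $fh, '>:utf8', \my $buffer or die "open failed: $!";

      print {$fh} chr(0x110000);        # value is outside the Unicode range
      my $with_warnings = @warnings;

      {
          no warnings 'utf8';           # lexically scoped: only this block is silenced
          print {$fh} chr(0x110000);
      }
      my $after_disabling = @warnings;
      close $fh;

      print "warnings while enabled: $with_warnings\n";
      print "warnings added while disabled: ", $after_disabling - $with_warnings, "\n";
      ```

      Because the pragma is lexical, code that treats strings as sequences of numbers can switch the check off in just the block that needs it and leave it on everywhere else.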

        No, it's quite intentional.

        For what purpose?

