in reply to Re^40: Interleaving bytes in a string quickly
in thread Interleaving bytes in a string quickly

then every other operation that you can apply to a string, contains a bug. That's lots of bugs.

Not even close to every other.

chomp      works
chop       Doesn't work (returns undef and changes nothing)
chr        works
index      works
lc         works
lcfirst    works
length     works
m//        works
ord        works
qq//       works
reverse    works
rindex     works
s///       works
sprintf    works
substr     works
tr///      works
uc         works
ucfirst    works
.          works
Hash key   works

By your definition /\N{}/ and /i aren't features since they can segfault Perl and return bad values for all ops that use the regex engine.

Update: Oops, chomp does work. Fixed.

Replies are listed 'Best First'.
Re^42: Interleaving bytes in a string quickly
by BrowserUk (Patriarch) on Mar 03, 2010 at 00:17 UTC
    use Data::Dumper;; $a=''; $a .= chr 1<<$_ for 0 .. 63;; print Dumper $a;;
    Malformed UTF-8 character (byte 0xfe) in subroutine entry at C:/Perl64/lib/Data/Dumper.pm line 190, <STDIN> line 18.
    [the warning above appears 5 times, followed by 28 copies of the same warning with (byte 0xff); this run of 33 warnings is then printed a second time]
    $VAR1 = "☺► \@\x{80}\x{100}\x{200}\x{400}\x{800}\x{1000}\x{2000}\x{4000}\x{8000}\x{10000}\x{20000}\x{40000}\x{80000}\x{100000}\x{200000}\x{400000}\x{800000}\x{1000000}\x{2000000}\x{4000000}\x{8000000}\x{10000000}\x{20000000}\x{40000000}

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^42: Interleaving bytes in a string quickly
by BrowserUk (Patriarch) on Mar 03, 2010 at 00:43 UTC
    $a = join '', map ord, 65536, 65, 2**32, 65, 2**48, 65;; print length $a;;
    12
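For readers tracing where the 12 comes from: ord() stringifies its argument and returns the code point of the first character, so each number in that list becomes a two-digit value such as 54 (the digit "6"). The sketch below reproduces that and, as an assumption about what may have been intended, contrasts it with chr(). The helper names ord_join and chr_join are made up for illustration.

```perl
use strict;
use warnings;

# ord() looks only at the first character of the stringified argument,
# so ord(65536) is ord("6") == 54, not a wide character.
sub ord_join {
    my @numbers = @_;
    return join '', map { ord } @numbers;
}

# Hypothetical alternative (an assumption, not the poster's code):
# build the string from code points with chr().
sub chr_join {
    my @code_points = @_;
    return join '', map { chr } @code_points;
}

my $via_ord = ord_join(65536, 65, 2**32, 65, 2**48, 65);
print "$via_ord\n";              # six two-digit numbers run together
print length($via_ord), "\n";    # 12

my $via_chr = chr_join(65536, 65, 2**16 + 1, 65);
print length($via_chr), "\n";    # 4
```

Read this way, the 12 is the length of a string of decimal digits rather than a measurement of a wide-character string.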
Re^42: Interleaving bytes in a string quickly
by BrowserUk (Patriarch) on Mar 03, 2010 at 02:00 UTC

    $a=''; $a.=chr 1<<$_ for 0 .. 63;; print ord unpack 'x[U]x[U]x[U]x[U]x[U]x[U]x[U]x[U]x[U]x[U] U1', $a;;
    48
Re^42: Interleaving bytes in a string quickly
by BrowserUk (Patriarch) on Mar 03, 2010 at 00:18 UTC
    use Data::Dump qw[ pp ];; pp $a;;
    [Malformed UTF-8 character (fatal) at C:/Perl64/lib/Data/Dump.pm line 458, <STDIN> line 20.
Re^42: Interleaving bytes in a string quickly
by BrowserUk (Patriarch) on Mar 03, 2010 at 00:27 UTC
    use String::LCSS_XS qw[ lcss ];; $a=''; $a .= chr 1<<$_ for 0 .. 63;; print lcss( substr( $a, 10, 10 ), $a );;
    ðÇÓáÇßÇÇÔÇÇõÇÇÞÇÇ­ÉÇÇ­áÇDZÇÇÇ‗ÇÇÇ 0 13
      I'm not sure what you're expecting from printing non-characters. Switching to Dump shows the right output is returned:
      SV = PV(0x98d16d0) at 0x98d4760
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK,UTF8)
        PV = 0x99243c0 "\320\200\340\240\200\341\200\200\342\200\200\344\200\200\350\200\200\360\220\200\200\360\240\200\200\361\200\200\200\362\200\200\200"\0 [UTF8 "\x{400}\x{800}\x{1000}\x{2000}\x{4000}\x{8000}\x{10000}\x{20000}\x{40000}\x{80000}"]
        CUR = 33
        LEN = 36

      Note that you need the latest version. 1.0 only supported strings of bytes in the 8-bit string format.

        I'm expecting the offset in the second string to be 10, not 13!
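One possible reading of the 13, offered as an assumption rather than a statement about how String::LCSS_XS works internally: the first ten characters of $a occupy 13 bytes once encoded as UTF-8 (seven one-byte characters plus three two-byte ones), so 13 is what a byte offset would look like where the character offset is 10. The sketch below checks that arithmetic using only core functions. utf8_byte_length is a hypothetical helper name.

```perl
use strict;
use warnings;

# Number of bytes a character string occupies once encoded as UTF-8.
sub utf8_byte_length {
    my ($chars) = @_;
    my $bytes = $chars;      # copy, so the caller's string is untouched
    utf8::encode($bytes);    # core function; no module needs loading
    return length $bytes;
}

# Shifts 0 .. 19 are enough to cover the ten-character prefix and the needle.
my $s      = join '', map { chr(1 << $_) } 0 .. 19;
my $needle = substr($s, 10, 10);

print index($s, $needle), "\n";                     # character offset
print utf8_byte_length(substr($s, 0, 10)), "\n";    # bytes before the needle
print utf8_byte_length($needle), "\n";              # bytes in the needle
```

On this reading the three numbers are 10, 13 and 33, and the last one agrees with the CUR = 33 in the dump above.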

        8-bit string format

        BTW: I'm not sure where you got it from, or if it is just a language thing, but that is a nonsensical term. An "8-bit string" would be 1 byte long.

        As is "the 32/64-bit string format". 4 or 8 bytes respectively.

        An '8-bit character string format' maybe. More usually known simply as "a byte string".

        And "32-bit/64-bit character string", though that's still not right because the characters can be "up to nn-bits". But, of course, you can't have a character whose width isn't a multiple of 8 bits.

        So, "variable length character string", but that sounds like the string is variable length rather than the characters. Which I guess is why they are usually referred to as "Unicode strings" or "Wide character strings". But neither of those is quite right for these peculiar, useless beasties.

        So, how about "Variable width character strings". A quick google shows a few others have hit upon that.


Re^42: Interleaving bytes in a string quickly
by BrowserUk (Patriarch) on Mar 03, 2010 at 00:11 UTC

    length works

    reverse works

    Funny definition of "works":

    $a=''; $a .= chr 1<<$_ for 0 .. 63;; print length $a;;
    64
    $a = reverse $a;;
    Malformed UTF-8 character (byte 0xfe) in reverse
    print length $a;;
    75

      I don't get that.
      $ perl -le'
      > $a=""; $a .= chr 1<<$_ for 0 .. 63;
      > print length $a;
      > $a = reverse $a;
      > print length $a;
      > '
      64
      64
      Maybe it's because I'm on a 32-bit build? There, $a contains chr(1<<$_) for 0..31 twice over, and the reverse of that once reversed.
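A quick way to check the build-width guess on a given perl is sketched below. It uses the core Config module to report the integer size and counts how many distinct values 1 << $_ yields for 0 .. 63. distinct_shift_values is a hypothetical helper name, and the result of shifting by more than the integer width depends on the perl version and platform, so the second number is only predictable on a build with 8-byte integers.

```perl
use strict;
use warnings;
use Config;

# Count how many distinct values 1 << $_ produces for the given shift counts.
sub distinct_shift_values {
    my @shifts = @_;
    my %seen;
    $seen{ 1 << $_ }++ for @shifts;
    return scalar keys %seen;
}

printf "ivsize: %d bytes\n", $Config{ivsize};
printf "distinct values of 1 << 0..63: %d\n", distinct_shift_values(0 .. 63);
```

With 8-byte integers all 64 shifts give different code points, some of them far above 0x7FFFFFFF, which is where the 0xfe and 0xff lead bytes in the warnings come from. With 4-byte integers no shift can produce such a value, which would be consistent with the two runs printing 64 and 64.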