in reply to Re^41: Interleaving bytes in a string quickly
in thread Interleaving bytes in a string quickly

use String::LCSS_XS qw[ lcss ];;
$a=''; $a .= chr 1<<$_ for 0 .. 63;;
print lcss( substr( $a, 10, 10 ), $a );;
ðÇÓáÇßÇÇÔÇÇõÇÇÞÇÇ­ÉÇÇ­áÇDZÇÇÇ‗ÇÇÇ 0 13

Re^43: Interleaving bytes in a string quickly
by ikegami (Patriarch) on Mar 03, 2010 at 00:45 UTC
    I'm not sure what you're expecting from printing non-characters. Switching to Dump shows the right output is returned:
    SV = PV(0x98d16d0) at 0x98d4760
      REFCNT = 1
      FLAGS = (TEMP,POK,pPOK,UTF8)
      PV = 0x99243c0 "\320\200\340\240\200\341\200\200\342\200\200\344\200\200\350\200\200\360\220\200\200\360\240\200\200\361\200\200\200\362\200\200\200"\0 [UTF8 "\x{400}\x{800}\x{1000}\x{2000}\x{4000}\x{8000}\x{10000}\x{20000}\x{40000}\x{80000}"]
      CUR = 33
      LEN = 36

    Note that you need the latest version. Version 1.0 only supported strings of bytes in the 8-bit string format.

      I'm expecting the offset in the second string to be 10, not 13!

        2^10 = 0x400, so you got the right char.
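
One possible explanation for the 13, which the thread does not confirm, is that it is a byte offset into the UTF-8 encoding rather than a character offset. The sketch below uses only core Perl (no String::LCSS_XS) and stops at 2**19 so that every code point is accepted by current perls. The variable names are illustrative.

```perl
use strict;
use warnings;

# Same construction as the original post, limited to the first 20 characters.
my $str    = join '', map { chr( 1 << $_ ) } 0 .. 19;
my $needle = substr( $str, 10, 10 );

# Offset counted in characters.
my $char_offset = index( $str, $needle );

# Offset counted in bytes of the UTF-8 encoding of everything before the match.
my $prefix = substr( $str, 0, $char_offset );
utf8::encode($prefix);                 # converts in place to UTF-8 bytes
my $byte_offset = length $prefix;

print "$char_offset $byte_offset\n";   # prints "10 13"
```

The first seven characters (1 through 64) take one byte each in UTF-8, and the next three (128, 256, 512) take two bytes each, giving 7 + 6 = 13 bytes before the character 0x400.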
      8-bit string format

      BTW: I'm not sure where you got it from, or if it is just a language thing, but that is a nonsensical term. An "8-bit string" would be 1 byte long.

      As is "the 32/64-bit string format". 4 or 8 bytes respectively.

      An '8-bit character string format' maybe. More usually known simply as "a byte string".

      And "32-bit/64-bit character string", though that's still not right because the characters can be "up to nn bits". But, of course, you can't have a character whose width is not a multiple of 8 bits.

      So, "variable length character string", but that sounds like the string is variable length rather than the characters. Which I guess is why they are usually referred to as "Unicode strings" or "Wide character strings". But neither of those is quite right for these peculiar, useless beasties.

      So, how about "Variable width character strings". A quick google shows a few others have hit upon that.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        More usually known simply as "a byte string".

        A "byte string" means the same as a "string of bytes" to me (flour box = box of flour), and that's not what I meant by "8-bit string format". I was referring to one of Perl's string formats, not the value of the string.
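
The distinction between a string's storage format and its value can be sketched with the core utf8:: functions. This is a minimal illustration and assumes nothing about how String::LCSS_XS handles either format. The variable names are hypothetical.

```perl
use strict;
use warnings;

# A literal with no code point above 0xFF is stored one byte per character.
my $eight_bit = "caf\xE9";

# Copy it and switch the copy to the UTF-8 storage format.
my $upgraded = $eight_bit;
utf8::upgrade($upgraded);

# The value is unchanged: same characters, same length, string-equal.
print "equal\n" if $eight_bit eq $upgraded;
print length($eight_bit), " ", length($upgraded), "\n";

# Only the internal format differs.
print utf8::is_utf8($eight_bit) ? "utf8" : "8-bit", "\n";
print utf8::is_utf8($upgraded)  ? "utf8" : "8-bit", "\n";
```

Both scalars hold the same four characters. An XS function that reads the raw buffer without checking the UTF8 flag would see four bytes in one case and five in the other.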