in reply to Re^21: Interleaving bytes in a string quickly
in thread Interleaving bytes in a string quickly

What is a PV w/ UTF8=0 or a PV w/ UTF8=1, if it is not "Perl ... assigning a meaning"?

uint8 i = 2; uint16 j = i;

Does the second assignment change the meaning of 2? No. It's still the number of cats my friend has.

my $i = "\x82"; utf8::upgrade( my $j = $i );

Does the second assignment change the meaning of "\x82"? No. It's still the HDMI code for something or other.
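This can be checked from Perl space. A minimal sketch, assuming only the core utf8:: functions; the flag variables and the printed labels are illustrative and not part of the original post:

```perl
use strict;
use warnings;

# Same two statements as above: $i stays in the 8-bit format,
# $j is a copy that is then switched to the upgraded (UTF8=1) format.
my $i = "\x82";
utf8::upgrade( my $j = $i );

# utf8::is_utf8() reports only the internal storage flag.
my $i_flag = utf8::is_utf8($i) ? 1 : 0;
my $j_flag = utf8::is_utf8($j) ? 1 : 0;

printf "flags:  %d vs %d\n", $i_flag, $j_flag;
printf "equal:  %s\n", ( $i eq $j ? "yes" : "no" );
printf "length: %d vs %d\n", length($i), length($j);
printf "ord:    0x%02X vs 0x%02X\n", ord($i), ord($j);
```

The two scalars differ in storage format, yet compare equal and report the same length and the same ordinal.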

because I know I'm never going to call upgrade(), or anything else that might cause perl to assign any other meaning than bytes to my data.

Presumably you mean "anything else that might cause perl to use the 32/64-bit string format". Then why didn't you say so?

It's a very weird assumption to make, but you're free to make all the assumptions you want.

Re^23: Interleaving bytes in a string quickly
by BrowserUk (Patriarch) on Mar 01, 2010 at 14:20 UTC
    uint8  i = 2; uint16 j = i; Does the second assignment change the meaning of 2?

    Cheap theatrics. Cos now int i = -1; uint j = i; it does.

    Presumably you mean "anything else that might cause perl to use the 32/64-bit string format",

    Nope! I meant exactly what I said.

    Paraphrase: If I populate the PV of a perl scalar with byte values, and later call SvPVX() to gain access to those byte values, I will get back exactly what I put in them...unless in the interim, I explicitly do something to change them. No presumptions or assumptions. Simply fact.

    And using SvPVX() will NEVER cause it to "silently encode your bytes using UTF-8".

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Cheap theatrics. Cos now int i = -1; uint j = i; it does.

      I never said or implied that the types are equivalent. Different types have different ranges, precision and costs.

      32/64-bit strings have a wider range, but they are more costly in every respect. (That's why we have 8-bit strings at all.) The range of 32/64-bit strings encompasses the range of 8-bit strings, so your signed vs unsigned analogy is broken.

      Nope! I meant exactly what I said.

      If you meant what you said, you haven't ruled out doing something like $pascal_bytes = chr(length($bytes)) . $bytes;. For long enough strings, it upgrades without calling upgrade and without Perl assigning any meaning to the string.
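      A minimal sketch of that case, assuming only core Perl; to_pascal() and the variable names are hypothetical and simply wrap the expression above:

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper wrapping the expression from the post.
      sub to_pascal {
          my ($bytes) = @_;
          return chr( length $bytes ) . $bytes;
      }

      my $short = to_pascal( 'x' x 255 );    # prefix is chr(255), fits in 8 bits
      my $long  = to_pascal( 'x' x 256 );    # prefix is chr(256), does not fit

      # utf8::is_utf8() reports only the internal storage flag.
      my $short_flag = utf8::is_utf8($short) ? 1 : 0;
      my $long_flag  = utf8::is_utf8($long)  ? 1 : 0;

      printf "255-byte payload: UTF8 flag %d, length %d\n", $short_flag, length $short;
      printf "256-byte payload: UTF8 flag %d, length %d\n", $long_flag,  length $long;
      ```

      No upgrade() call appears anywhere; the flag changes only because the length prefix no longer fits in one byte.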

        I never said or implied that the types are equivalent.

        No. You suggested that assigning from one type to another doesn't change the meaning of the value contained. Except, it often does.

        The rest is just another diversion from the subject of the discussion:

        Using SvPVX() will NEVER cause it to silently encode your bytes using UTF-8.

        And now to your belated and unannounced update.

        you haven't ruled out doing something like $pascal_bytes = chr(length($bytes)) . $bytes;. For long enough strings, it upgrades without calling upgrade and without Perl assigning any meaning to the string.

        That is wrong in so many ways.

        First, I conclude that you are referring to this:

        C:\test>p1
        use Devel::Peek;;
        sub toPascal{ return chr( length $_[0] ) . $_[0] };;
        Dump toPascal( 'x' x $_ ) for 255, 256;;

        SV = PV(0x28d280) at 0x3c599e0
          REFCNT = 1
          FLAGS = (TEMP,POK,pPOK)
          PV = 0x3cb28f8 "\377xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +xxxxxxxxxxxxx
          CUR = 256
          LEN = 264
        SV = PV(0x3c3aa30) at 0x3c599c8
          REFCNT = 1
          FLAGS = (TEMP,POK,pPOK,UTF8)
          PV = 0x3cb2a28 "\304\200xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"]
          CUR = 258
          LEN = 264

        Meaning has been assigned. Note the UTF8 flag is set in the second example.

        But then, it makes no sense to try and create Pascal strings longer than 255 bytes, because P-strings are defined as being prefixed with a length byte.

        But even if that were not the case--if length could be of any size--using SvPVbytes() wouldn't help any, because unless the XS routine was specifically designed to work with this non-native encoding, it's never going to do the right thing:

        SV *func( SV *first, SV *second ) { sv_catsv( first, second ); return first; }

        Just another red herring.
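        The SvPVbytes() point can be approximated from Perl space. A minimal sketch, assuming utf8::downgrade() with a true FAIL_OK argument as a stand-in for the downgrade that SvPVbytes() attempts (SvPVbytes() itself croaks where this returns false); the variable names are illustrative:

        ```perl
        use strict;
        use warnings;

        my $fits   = "\xFF" . 'x' x 3;      # every character is below 256
        my $toobig = chr(256) . 'x' x 3;    # first character is 256

        utf8::upgrade($fits);               # force the upgraded format for comparison

        # With FAIL_OK true, utf8::downgrade() returns false instead of dying.
        my $fits_ok   = utf8::downgrade( $fits,   1 ) ? 1 : 0;
        my $toobig_ok = utf8::downgrade( $toobig, 1 ) ? 1 : 0;

        print "chars all < 256:   downgrade ", ( $fits_ok   ? "succeeded" : "failed" ), "\n";
        print "contains chr(256): downgrade ", ( $toobig_ok ? "succeeded" : "failed" ), "\n";
        ```

        A string whose length prefix is chr(256) or higher has no 8-bit form to fall back to, so a byte-oriented routine has nothing sensible to read.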

