in reply to Re^17: Interleaving bytes in a string quickly
in thread Interleaving bytes in a string quickly

SvPVX() performs NO COERCIONS WHATSOEVER

I know. It just doesn't necessarily return a pointer to the bytes of a byte string.

I therefore invite you to prove your assertion with code!

Well, I don't know of anything that omits the \0, so the remaining two can be shown using:

    my $byte_string = "\x80\x81";
    dump_sv_pvx($byte_string);
    utf8::upgrade($byte_string);
    dump_sv_pvx($byte_string);
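
dump_sv_pvx isn't defined in this post; presumably it is an XS helper that prints the raw PV buffer. As a rough pure-Perl stand-in, here is a sketch under that assumption. The name pv_hex is hypothetical, and it reconstructs the buffer contents from the UTF8 flag rather than reading them through SvPVX.

```perl
use strict;
use warnings;

# Hypothetical stand-in for dump_sv_pvx(): returns the internal buffer as hex.
# Assumption: for an upgraded string the buffer holds the UTF-8 encoding of
# its characters, so encoding a copy reproduces what SvPVX would point at.
sub pv_hex {
    my ($s) = @_;                            # the copy keeps the UTF8 flag
    utf8::encode($s) if utf8::is_utf8($s);   # upgraded: expose the UTF-8 bytes
    return unpack 'H*', $s;
}

my $byte_string = "\x80\x81";
my $before = pv_hex($byte_string);           # UTF8=0 storage
utf8::upgrade($byte_string);
my $after  = pv_hex($byte_string);           # UTF8=1 storage

print "UTF8=0 buffer: $before\n";            # 8081
print "UTF8=1 buffer: $after\n";             # c280c281
print "still eq \"\\x80\\x81\": ",
      ($byte_string eq "\x80\x81" ? "yes" : "no"), "\n";   # yes
```

The Perl-level value and length are unchanged after the upgrade; only the buffer that SvPVX would hand back differs.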

Any function or PerlIO layer is free to switch the internal format, even if the string only contains bytes. Doing so doesn't change the string at all. It's still the same string of bytes.
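
For illustration only, here is one way a string holding nothing but byte-sized values can end up in the UTF8=1 format without utf8::upgrade ever being called. This is a sketch of my own, not the example referred to elsewhere in the thread, and it says nothing about whether any given program actually does this.

```perl
use strict;
use warnings;

my $s = "\x80\x81\x{100}";   # the wide character forces UTF8=1 storage
chop $s;                     # remove it again; two byte-sized values remain

my $same_value = $s eq "\x80\x81" ? 1 : 0;   # compares equal to the byte string
my $flag_on    = utf8::is_utf8($s) ? 1 : 0;  # but storage is still UTF8=1

print "equal to \"\\x80\\x81\": $same_value, UTF8 flag: $flag_on\n";
```

chop does not downgrade the string, so the value is the same two bytes while the internal buffer holds their UTF-8 encoding.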

You didn't specify where the string came from. Maybe it came from lcss, for example, which can switch the internal format. (You discussed using lcss recently, IIRC.) If you need a specific format (and you do), SvPVX without a preceding format check is buggy.
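
A format check of the kind meant here could look like the following at the Perl level. This is a minimal sketch: as_bytes is a hypothetical helper name, and inside XS the analogous step would be to go through SvPVbyte (or downgrade first) rather than calling SvPVX directly.

```perl
use strict;
use warnings;

# Hypothetical guard: returns a copy that is guaranteed to be stored UTF8=0,
# or dies if the string holds a character that does not fit in one byte.
sub as_bytes {
    my ($s) = @_;                       # work on a copy of the argument
    utf8::downgrade($s, 1)              # FAIL_OK: return false instead of dying
        or die "string contains a character above 0xFF\n";
    return $s;
}

my $upgraded = "\x80\x81";
utf8::upgrade($upgraded);               # same value, UTF8=1 storage

my $safe    = as_bytes($upgraded);      # same value, UTF8=0 storage
my $wide_ok = eval { as_bytes("\x{100}"); 1 } ? 1 : 0;   # 0: cannot be bytes

print "safe copy flagged: ", (utf8::is_utf8($safe) ? 1 : 0), "\n";
print "wide string accepted: $wide_ok\n";
```

If the caller really never produces an upgraded string, the guard only costs the copy and a flag test.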

Re^19: Interleaving bytes in a string quickly
by BrowserUk (Patriarch) on Mar 01, 2010 at 00:34 UTC

    Gotcha! (at last!).

    After utf8::upgrade($byte_string);, $byte_string is no longer a byte string. It's a character string, or a codepoint string. But not a "byte string". I was (as a result of your previous pedantry), very specific in my choice of title for this thread.

    And, as I said back up there somewhere, "Data either originates from within my program, or from without. And in either case, Perl will treat it as bytes unless I do something explicit to indicate that it should do otherwise. And since I know I'm not going to do that, I do not have to consider it.".

    And, despite your continued attempts to defend it, your assertion that my posted code "...can silently encode your bytes using UTF-8.", is just plain fiction. Any encoding has to be done, explicitly, by the programmer. It cannot occur "silently".

    And, "Magic isn't handled if any is present." is irrelevant!

    So that brings us back to "It can segfault ...". Guess what:

    Never mind that your attempt to correct the deficiencies you perceive in my code contains

    if (!sv_in || !sv_pad) croak("usage");

    which is a redundant code path that will never be exercised. And this:

    { STRLEN i = l_in;

    which is never used.

    It can also segfault! Try this

    print interleave_bytes( undef, 0 );

    And that's not the only failure mode it displays.

    So, if you're gonna stand on that high horse throwing stones, you really ought to make sure that your mount doesn't have a glass jaw!


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Gotcha! (at last!).

      Gotcha what? I've said the same thing from the beginning.

      After utf8::upgrade($byte_string);, $byte_string is no longer a byte string. It's a character string, or a codepoint string.

      No, it's not. Just like a number can be stored in an IV, a UV, an NV or a PV, a string of bytes is a string of bytes whether it is stored in a PV w/ UTF8=0 or a PV w/ UTF8=1. It's the same bytes no matter what internal format is used. Perl does not assign meaning to the values inside the string*.

      If you guarantee that you give a string in UTF8=0 format (say by calling utf8::downgrade before calling interleave) you won't have a problem. That hadn't been specified until now. Instead of trying to gotcha me, maybe you should have said what you want to say. Your games aren't fun.

      "...can silently encode your bytes using UTF-8.", is just plain fiction. Any encoding has to be done, explicitly, by the programmer.

      No, it doesn't. I even gave an example.

      if (!sv_in || !sv_pad) croak("usage"); is a redundant code path

      Ah good. I wasn't sure, so I erred on the safe side. In many places in the core, SV* can be NULL.

      STRLEN i = l_in; is never used.

      Thanks, Fixed.

      It can also segfault! Try this print interleave_bytes( undef, 0 );

      Thanks, Fixed by restoring my initial approach on which I had tested that. (The problem is that SvPV(0) is special.)

      So, if you're gonna stand on that high horse throwing stones, you really ought to make sure that your mount doesn't have a glass jaw!

      Pointing out a problem and explaining it when asked does not fit that description.

      And your suggestion that I should refuse to say anything until I can produce perfect code every time is just ridiculous. I have no problem addressing my problems.

      * Unless you ask it to. For example, uc will treat the values as Unicode characters (regardless of the storage format), and vec will treat the values as bit strings (regardless of the storage format).

      Update: Additions made to address the second half of the parent. I hadn't read it initially, figuring it was just "gotcha!". But there were useful comments.

        ...can silently encode your bytes using UTF-8.", .... Any encoding has to be done, explicitly, by the programmer.

        No, it doesn't.

        Yes. It does!

        I even gave an example.

        No. You did not! And you still have not given an example of encoding occurring "silently".

        Perl does not assign meaning to the values inside the string*.

        What is "a PV w/ UTF8=0 or a PV w/ UTF8=1", if it is not "Perl ... assigning a meaning"?

        At the C-level, there is no difference; but at the Perl level there most definitely is. And it is at the C-level I am calling SvPVX(), for the very reason that I do not want to make any such distinction. The same would not be true if I used SvPVbyte() as you suggested.

        If you guarantee that you give a string in UTF8=0 format (say by calling utf8::downgrade before calling interleave) you won't have a problem.

        You are still getting it arse backward. I don't need to call downgrade(), because I know I'm never going to call upgrade(), or anything else that might cause perl to assign any other meaning than bytes to my data.

        That hadn't been specified until now.

        Oh, but it has. Over and over:

        1. I could use it. I know the data I'd be passing are byte strings. Nothing else makes sense given the purpose of the code.
        2. But given I'm reading the file (binary data) in raw mode, how would they get "silently encoded"?
        3. Data either originates from within my program, or from without. And in either case, Perl will treat it as bytes unless I do something explicit to indicate that it should do otherwise. And since I know I'm not going to do that, I do not have to consider it.

        Your games aren't fun.

        I'm not playing games. Far from it.

        You asserted: "It can silently encode your bytes using UTF-8.". Since you have demonstrably more knowledge of Unicode than I, despite my strong belief that this was impossible, I was unsure enough to ask: Okay. How could it "silently encode my bytes as utf"? Because if it was true, I wanted to know how.

        But, despite your protestations to the contrary above, you have still not provided an example of how this can happen. I now know this is because you cannot do so. Because it is impossible.

