
Thanks for the useful benchmarking info vs Perl's vec(), and my apologies for lack of clarity in the OP. In answer to your queries:

>>If the real code is so complex, why are you asking us to make judgements based on such a trivial example that can never meet its stated goal of greater efficiency?<< I was trying to present a 'minimal case' to illustrate the particular aspects I'm unsure about.

>>The way you've asked the question suggests that you are unsure about the parameter handling rather than the actual internal logic. Where exactly do your doubts lie?<< What I'm unsure about is two things:

(a) What is the correct method for directly accessing the bytes of a Perl variable from Inline C? My example works, but I'm not entirely sure it's the "right way". Is casting the return of SvPV to an unsigned char* a sensible thing to be doing?

(b) Whether directly changing the bytes of a Perl variable in C runs the risk of 'breaking' Perl internals in some scenarios?

Re^3: Bit vector fiddling with Inline C
by BrowserUk (Patriarch) on May 09, 2011 at 13:47 UTC
    1. Is casting the return of SvPV to an unsigned char* a sensible thing to be doing?

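      The union element returned by SvPV is defined as char *, so casting it to unsigned char * only changes how the same bytes are interpreted, which is a sensible thing to do for bit manipulation: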

      #define _SV_HEAD_UNION \
          union { \
              char* svu_pv; /* pointer to malloced string */ \
    2. Whether directly changing the bytes of a Perl variable in C runs the risk of 'breaking' Perl internals in some scenarios?

      What you are doing in the sample code--neither lengthening nor shortening the perl allocated memory, only modifying the bits within it--should be about as safe as it gets.

      There may be some risk of confusing Perl if you performed bitwise operations upon a string that was currently marked as other than bytes, e.g. some form of utf.

      Perl might subsequently try to perform some operation upon the PV assuming it contains a valid utf string, which might confuse it, but I wouldn't expect any dire consequences. I suspect that you could create the same situation by passing a utf encoded scalar to vec.

      The simple answer is don't use utf. (That is, at least don't pass utf strings to the function.)

      Ideally, it would be possible to define a typemap that rejected attempts to pass utf encoded strings, but since perl (along with the rest of the world) has chosen to conflate multiple data formats as a single type, there's not much that can be done in that regard.

      Neither C nor any other language has a mechanism for typing arrays of variable length entities, so the world is stuck with this mess until the powers that be see the problems it creates and do something sensible about it.
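
A minimal plain-C sketch (not XS, and not the OP's code) of the two points above: the buffer arrives as a char *, it is viewed through an unsigned char *, and bits are changed in place without the length ever changing. The helper names bit_set, bit_clear and bit_test are hypothetical, and the low-order-bit-first numbering within each byte is an assumption made for the example.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helpers, named for illustration only. Bit i lives in
   byte i / CHAR_BIT, counted from the low-order bit of that byte. */

/* Turn bit i on. Only an existing byte is modified; nothing is resized. */
void bit_set(unsigned char *v, size_t i) {
    v[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

/* Turn bit i off. */
void bit_clear(unsigned char *v, size_t i) {
    v[i / CHAR_BIT] &= (unsigned char)~(1u << (i % CHAR_BIT));
}

/* Return 0 or 1. Because v is unsigned, the right shift can never
   drag a sign bit down, which is the reason for the cast. */
int bit_test(const unsigned char *v, size_t i) {
    return (v[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}
```

Usage would look like `char buf[4] = {0}; unsigned char *v = (unsigned char *)buf; bit_set(v, 15);`, after which `buf` itself holds the changed byte, since `v` and `buf` point at the same memory. The caller is responsible for keeping `i` inside the buffer.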


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thanks so much - that's incredibly helpful, and exactly the kind of feedback I was looking for. It's reassuring to know that my approach isn't likely to break anything (noting your caveat about UTF data).

      I see one of the earlier responses above (also very helpful) raises the question of pass-by-value/reference. My understanding is that with a scalar parameter, Perl normally passes by value (i.e. a copy goes onto the stack) whereas in C, strings are always passed around by reference.

      I would assume from my example in the OP that the C world 'wins out' here and the $vector is passed to the C function by reference (even though it's called in Perl with $vector rather than \$vector)? I'm assuming that because changing it in C also changes the original scalar back in Perl.

      That becomes important if $vector happens to be huge - it would otherwise be memcopied as part of the call (which I think is one of the points anonymized user 468275 raises about efficiency).

      I don't suppose your knowledge of the internals can confirm that the example in the OP is indeed just passing a pointer to $vector, and NOT copying the entire byte sequence somewhere else at the same time (even just as a side-effect)?

      Update: After some further tests, this is a duff question. Even in pure Perl, calling a function with a very large scalar as a parameter does not immediately take up twice the memory by copying the variable. (I think that's because an alias to the variable is put onto @_, although I may be wrong?) The 'double the memory' effect only happens if, inside your function, you then assign it to another variable with something like 'my $var = shift'.

        I don't suppose your knowledge of the internals can confirm that the example in the OP is indeed just passing a pointer to $vector, and NOT copying the entire byte sequence somewhere else at the same time (even just as a side-effect)?

        Yes. I can confirm that. No copying is done.

        When you define an XS argument as SV* sv_vec, you are asking for a pointer to the SV. When you operate via that pointer, you are changing the original SV.
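
As a rough plain-C analogy only (toy_sv below is a hypothetical stand-in and NOT Perl's real SV layout), this sketch contrasts working through a pointer to the caller's structure with working on an explicit copy of the bytes:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scalar; Perl's actual SV is more involved. */
typedef struct {
    char  *pv;   /* pointer to the string buffer */
    size_t len;  /* number of bytes in use */
} toy_sv;

/* Receives a pointer to the caller's struct: no bytes are copied, and
   the change lands in the caller's own buffer. */
void flip_first_bit(toy_sv *sv) {
    if (sv->len > 0)
        ((unsigned char *)sv->pv)[0] ^= 1u;
}

/* Copies the bytes into caller-supplied scratch space first, roughly the
   analogue of assigning the argument to a new variable. The original
   buffer is left untouched, at the cost of a memcpy of len bytes.
   scratch must have room for at least sv->len bytes. */
void flip_first_bit_in_copy(const toy_sv *sv, char *scratch) {
    memcpy(scratch, sv->pv, sv->len);
    if (sv->len > 0)
        ((unsigned char *)scratch)[0] ^= 1u;
}
```

Only the second function touches memory proportional to the size of the data, which is why the pointer form matters when the vector is huge.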

        As with ordinary perl subs, the subroutine receives aliases to the actual variables passed:

        sub x{ ++$_ for @_ };;
        ( $a, $b, $c) = 12345..12347;;
        x( $a, $b, $c );;
        print $a, $b, $c;;
        12346 12347 12348

        No copying occurs unless the programmer assigns them to local vars:

        sub x{ my( $a, $b, $c ) = @_; ++$_ for $a, $b, $c; }

        If I am defining perl subs to operate upon large scalars, and they are more complex than a couple of lines--at which point the $_[0], $_[1] nomenclature can become awkward--then I will use scalar refs:

        sub xyz (\$) { my $rStr = shift; substr $$rStr, ...; vec $$rStr, ...; ... }

        Which achieves the benefit of named variables without the cost of copying.

        BTW: You still haven't mentioned what the "complex processing" you are performing in XS is?

        I ask because my instinctive reaction is that if you are performing boolean operations on whole pairs or more of large bit vectors, it is almost certainly quicker doing it in Perl.

