in reply to Bit vector fiddling with Inline C

(Warning: very limited expertise.)

Nothing obviously wrong leaps out from what you've posted.

But that isn't going to be any quicker to use than the equivalent Perl code:

vec( $vector, $bit, 1 ) ||= 1;

A quick test shows that it is considerably slower:

    use strict;
    use warnings;
    use Benchmark qw[ cmpthese ];
    use Inline C => 'DATA';

    my( $vec1, $vec2 ) = ( ( chr(0) x 125000 ) x 2 );

    cmpthese -3, {
        inline => sub {
            mytest( $vec1, $_ ) for 0 .. 1e6-1;
        },
        vec => sub {
            vec( $vec2, $_, 1 ) ||= 1 for 0 .. 1e6-1;
        },
    };

    warn "Different results" unless $vec1 eq $vec2;

    __DATA__
    __C__

    int mytest(SV* sv_vec, unsigned int bit) {
        STRLEN vecbytes;                             // Length of vector in bytes
        unsigned char *myvec = (unsigned char *) SvPV(sv_vec, vecbytes);
        if (bit/8 >= vecbytes) return 0;             // Check in range
        if (myvec[bit/8] & 1U<<(bit%8)) return 1;    // Test if a bit is set
        myvec[bit/8] |= 1U<<(bit%8);                 // Set bit (CHANGES $vector)
        return 1;
    }

Results:

    C:\test>903727.pl
             Rate inline    vec
    inline 3.11/s     --   -60%
    vec    7.77/s   150%     --

    C:\test>903727.pl
             Rate inline    vec
    inline 3.10/s     --   -60%
    vec    7.77/s   151%     --

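You mention in a later post that "The real case applies some fairly complex logic to multiple large bit vectors (each 1m+ bits) which runs a lot faster in C." That raises a couple of questions:

1. If the real code is so complex, why are you asking us to make judgements based on such a trivial example that can never meet its stated goal of greater efficiency?

2. The way you've asked the question suggests that you are unsure about the parameter handling rather than the actual internal logic. Where exactly do your doubts lie?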

Update: There is also the question of why you are setting the bit conditionally. Your function returns 1 both when the bit was previously unset and when it was already set, so your calling code will never be able to tell the two situations apart.

So why bother testing whether it is set, rather than just setting it?
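If the test is meant to be useful, one option is to hand the bit's previous state back to the caller. The following is only a minimal sketch in plain C, working on a raw byte buffer rather than an SV; the name `test_and_set_bit` and its return convention (-1 for out of range, otherwise the old bit value) are assumptions made for illustration and are not part of the posted code.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): set bit `bit` in the
 * `nbytes`-long buffer `buf` and report what the bit was beforehand.
 * Returns -1 if `bit` is out of range, otherwise the previous value (0 or 1).
 * Bit numbering matches the posted code: bit 0 is the low bit of byte 0. */
int test_and_set_bit(unsigned char *buf, size_t nbytes, size_t bit)
{
    size_t byte = bit / 8;
    unsigned char mask = (unsigned char)(1U << (bit % 8));
    int was_set;

    if (byte >= nbytes)
        return -1;                 /* out of range: nothing is modified */

    was_set = (buf[byte] & mask) != 0;
    buf[byte] |= mask;             /* unconditional set, no branch needed */
    return was_set;
}
```

With this shape the caller can distinguish "newly set" (0) from "already set" (1). If that distinction is never needed, the test can simply be dropped and the bit set unconditionally.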


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Bit vector fiddling with Inline C
by oxone (Friar) on May 09, 2011 at 12:54 UTC

    Thanks for the useful benchmarking info vs Perl's vec(), and my apologies for lack of clarity in the OP. In answer to your queries:

    >>If the real code is so complex, why are you asking us to make judgements based on such a trivial example that can never meet its stated goal of greater efficiency?<< I was trying to present a 'minimal case' to illustrate the particular aspects I'm unsure about.

    >>The way you've asked the question suggests that you are unsure about the parameter handling rather than the actual internal logic. Where exactly do your doubts lie?<< What I'm unsure about is two things:

    (a) Correct method for directly accessing the bytes of a Perl variable from Inline C? My example works but I'm not entirely sure it's the "right way". Is casting the return of SvPV to an unsigned char* a sensible thing to be doing?

    (b) Whether directly changing the bytes of a Perl variable in C runs the risk of 'breaking' Perl internals in some scenarios?

      1. Is casting the return of SvPV to an unsigned char* a sensible thing to be doing?

        The union element returned by SvPV is defined as char *:

            #define _SV_HEAD_UNION \
                union { \
                    char*   svu_pv;   /* pointer to malloced string */ \

      2. Whether directly changing the bytes of a Perl variable in C runs the risk of 'breaking' Perl internals in some scenarios?

        What you are doing in the sample code--neither lengthening nor shortening the perl allocated memory, only modifying the bits within it--should be about as safe as it gets.

        There may be some risk of confusing Perl if you performed bitwise operations upon a string that was currently marked as other than bytes, e.g. some form of utf.

        Perl might subsequently try to perform some operation upon the PV assuming it contains a valid utf string, which might confuse it, but I wouldn't expect any dire consequences. I suspect that you could create the same situation by passing a utf encoded scalar to vec.

        The simple answer is don't use utf. (That is, at least don't pass utf strings to the function.)

        Ideally, it would be possible to define a typemap that rejected attempts to pass utf encoded strings, but since perl (along with the rest of the world) has chosen to conflate multiple data formats as a single type, there's not much that can be done in that regard.

        Neither C nor any other language has a mechanism for typing arrays of variable-length entities, so the world is stuck with this mess until the powers that be see the problems it creates and do something sensible about it.
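To make the utf caveat concrete, here is a rough sketch in plain C of how setting a single bit can turn well-formed UTF-8 into a malformed sequence. The function `looks_like_utf8` is a hypothetical, deliberately simplified structural check written for this illustration (it ignores overlong three- and four-byte forms and surrogates); it is not how Perl itself validates strings.

```c
#include <stddef.h>

/* Hypothetical, simplified UTF-8 structure check (illustration only).
 * Returns 1 if every lead byte is followed by the right number of
 * continuation bytes (10xxxxxx), 0 otherwise. */
int looks_like_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0, k, extra;

    while (i < n) {
        unsigned char c = s[i];

        if (c < 0x80)                    extra = 0;  /* ASCII            */
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;  /* 2-byte lead      */
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;  /* 3-byte lead      */
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;  /* 4-byte lead      */
        else return 0;          /* stray continuation or invalid lead byte */

        if (extra > n - i - 1)
            return 0;                                /* truncated sequence */

        for (k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                            /* not a continuation */

        i += extra + 1;
    }
    return 1;
}
```

For example, the bytes "ABC" pass the check, but setting bit 7 of the middle byte turns 0x42 into 0xC2, a two-byte lead that is then followed by a plain 'C', so the buffer no longer reads as UTF-8. On the Perl side, perlapi documents SvUTF8 for detecting flagged scalars and SvPVbyte for obtaining a byte representation; whether either suits the real code is a judgement call.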



        Thanks so much - that's incredibly helpful, and exactly the kind of feedback I was looking for. It's reassuring to know that my approach isn't likely to break anything (noting your caveat about UTF data).

        I see one of the earlier responses above (also very helpful) raises the question of pass-by-value/reference. My understanding is that with a scalar parameter, Perl normally passes by value (i.e. a copy goes onto the stack), whereas in C, strings are always passed around by reference.

        I would assume from my example in the OP that the C world 'wins out' here and the $vector is passed to the C function by reference (even though it's called in Perl with $vector rather than \$vector)? I'm assuming that because changing it in C also changes the original scalar back in Perl.

        That becomes important if $vector happens to be huge - it would otherwise be memcopied as part of the call (which I think is one of the points anonymized user 468275 raises about efficiency).

        I don't suppose your knowledge of the internals can confirm that the example in the OP is indeed just passing a pointer to $vector, and NOT copying the entire byte sequence somewhere else at the same time (even just as a side-effect)?

        Update: After some further tests, this is a duff question. Even in pure Perl, calling a function with a very large scalar as a parameter does not immediately take up twice the memory by copying the variable. (I think that's because an alias to the variable is put onto @_, although I may be wrong?) The 'double the memory' effect only happens if, inside your function, you then assign it to another variable with something like 'my $var = shift'.
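The distinction described in the update can be sketched by analogy in plain C. This is only an analogy under stated assumptions, not a description of Perl's internals: `set_bit_in_place` stands in for working through the aliased argument, and `set_bit_in_copy` stands in for the `my $var = shift` case where the data is duplicated first. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Analogue of working on the aliased argument: the caller's buffer is
 * modified directly and no extra memory is used.
 * The caller must guarantee that bit / 8 is within the buffer. */
void set_bit_in_place(unsigned char *buf, size_t bit)
{
    buf[bit / 8] |= (unsigned char)(1U << (bit % 8));
}

/* Analogue of `my $var = shift`: the whole buffer is duplicated before
 * the bit is set, so memory use doubles and the original is untouched.
 * Returns NULL on allocation failure or out-of-range bit; otherwise the
 * caller owns the result and must free() it. */
unsigned char *set_bit_in_copy(const unsigned char *buf, size_t nbytes, size_t bit)
{
    unsigned char *copy;

    if (bit / 8 >= nbytes)
        return NULL;

    copy = malloc(nbytes);
    if (copy == NULL)
        return NULL;

    memcpy(copy, buf, nbytes);                       /* the expensive part */
    copy[bit / 8] |= (unsigned char)(1U << (bit % 8));
    return copy;
}
```

For a vector of a million bits or more, only the second form pays for a full memcpy per call, which matches the observation that the memory cost appears at the assignment rather than at the call itself.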