in reply to Re^6: Bit vector fiddling with Inline C
in thread Bit vector fiddling with Inline C

One real function is to find every set bit in vector A, then set the corresponding bit(s) in vector B

I deduce, therefore, that A is smaller than (or equal to) B.

I would expect you would observe a further significant increase in speed if you used the gmp library for that function - either accessing that library via Inline::C, or using one of the existing perl extensions (Math::GMP, Math::GMPz). Using the gmp library in C, the code to perform the above task would be something like:
size = mpz_sizeinbase(A, 2);
for(i = 0; i < size; i++) {
    if(mpz_tstbit(A, i)) mpz_setbit(B, i);
}
(and the same approach when using the aforementioned perl modules.)
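
Here's a minimal standalone sketch of that loop compiled directly against gmp - the values of A and B are invented purely for illustration:

#include <stdio.h>
#include <gmp.h>

int main(void) {
    mpz_t A, B;
    size_t i, size;

    mpz_init_set_str(A, "10110", 2);    /* bits 1, 2 and 4 set */
    mpz_init_set_str(B, "00001", 2);    /* bit 0 set */

    size = mpz_sizeinbase(A, 2);
    for(i = 0; i < size; i++) {
        if(mpz_tstbit(A, i)) mpz_setbit(B, i);
    }

    gmp_printf("B = %Zd\n", B);         /* prints 23, ie binary 10111 */

    mpz_clear(A);
    mpz_clear(B);
    return 0;
}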

Of course, you might consider the involvement of the gmp library to be cheating - depending upon the extent to which you want to handle things yourself :-)

Cheers,
Rob

Re^8: Bit vector fiddling with Inline C
by BrowserUk (Patriarch) on May 10, 2011 at 07:06 UTC

    Two questions:

    1. Why test the bit and then set it?

      If you are only going to set the bit if it is unset, then just setting it achieves the same thing.

    2. Why would calling a function to set a bit be quicker than setting the bit directly?

      I guess the compiler might inline the function and optimise away the overheads, but I still don't see why it would end up being any quicker. As quick, possibly.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Why test the bit and then set it?

      The "tstbit(i)" is being done on vector A, and the "setbit(i)" is being done on vector B - but only when the "tstbit(i)" returns true.
      This is in accordance with the OP's description of finding "every set bit in vector A, then set the corresponding bit(s) in vector B".
      Of course, this is a rather naive and inefficient approach. A smarter approach would be to do B = A|B, or in gmp parlance mpz_ior(B, A, B);
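
      As a sketch of that equivalence (assuming initialised mpz_t values with b1 == b2 on entry), the two routes can be checked against each other:

      #include <assert.h>
      #include <gmp.h>

      void or_two_ways(const mpz_t a, mpz_t b1, mpz_t b2) {
          size_t i, size = mpz_sizeinbase(a, 2);
          for(i = 0; i < size; i++)       /* bit-by-bit: O(size) library calls */
              if(mpz_tstbit(a, i)) mpz_setbit(b1, i);
          mpz_ior(b2, a, b2);             /* one call, working limb-wise */
          assert(mpz_cmp(b1, b2) == 0);   /* both routes agree */
      }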

      Why would calling a function to set a bit be quicker than setting the bit directly?

      As you suggest, they might not be any quicker, and could even be slower. For single bit operations, you're probably right. (The assembler code on which gmp is generally built could count as something in its favour - but it's not always going to come into play.)
      Personally however, at least where *big* vectors are involved, I'd want to see benchmarks done before ruling out gmp.

      Cheers,
      Rob
        in accordance with the OP's description of finding "every set bit in vector A, then set the corresponding bit(s) in vector B".

        Okay. I missed that. Makes sense now.

        A smarter approach would be to do B = A|B, or in gmp parlance mpz_ior(B, A, B);

        'cept that the OP also said that the two vectors aren't in the same order (or even the same size) and the mapping isn't one-to-one. Rather, it is driven by pairs of numbers, so a better description might be: if( vecA[ 123 ] ) then vecB[ 456 ] = vecB[ 012 ] = 1;
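
        For illustration only, here's a sketch of that pair-driven shape in plain C - the BitPair struct, the byte-wise addressing and the function name are all assumptions, not the OP's actual layout:

        #include <stddef.h>

        typedef struct { unsigned from; unsigned to; } BitPair;

        /* For each (from, to) pair: if bit `from` is set in vecA, set bit `to`
           in vecB. The same `from` may appear in several pairs, which covers
           the one-to-many case described above. */
        void map_bits(const unsigned char *vecA, unsigned char *vecB,
                      const BitPair *pairs, size_t npairs) {
            size_t i;
            for(i = 0; i < npairs; i++) {
                if( vecA[ pairs[i].from / 8 ] & ( 1u << ( pairs[i].from % 8 ) ) )
                    vecB[ pairs[i].to / 8 ] |= ( 1u << ( pairs[i].to % 8 ) );
            }
        }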

        I'd want to see benchmarks done before ruling out gmp.

        Me too. Though I don't think the size of the vectors will affect the results. Something like:

        int mytest3(SV* sv_vec, unsigned int bit) {
            STRLEN vecbytes;                              // Length of vector in bytes
            unsigned __int64 *vec = (unsigned __int64 *) SvPV(sv_vec, vecbytes);
            if( bit / 8 >= vecbytes ) return 0;           // Check in range
            vec[ bit / 64 ] |= ( 1ULL << ( bit % 64 ) );  // Set bit (CHANGES $vector)
            return 1;
        }

        will handle vectors up to 2 Exabytes with linear efficiency--assuming you had enough memory to hold it :)

        Mind you, if I was going to be processing a list of pairs, then I'd move the acquisition of the vector addresses & lengths out of the loop and use macros for the bit testing and setting:

        #define REGSIZE 64
        #define TSTVEC( v, b ) ( v[ b / REGSIZE ] &  ( 1ULL << ( b % REGSIZE ) ) )
        #define SETVEC( v, b ) ( v[ b / REGSIZE ] |= ( 1ULL << ( b % REGSIZE ) ) )

        In theory at least, the compiler is more likely to be able to optimise common sub-expressions if they do not cross function boundaries - even in-lined function boundaries.
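
        For instance, a sketch of such a pair-processing function - hypothetical, not code from this thread: it assumes the pairs arrive as a flat Perl array of alternating from/to bit indices, and uses the macros above:

        int mapPairs(SV *sv_vecA, SV *sv_vecB, AV *av_pairs) {
            STRLEN lenA, lenB;            // Vector lengths in bytes
            unsigned __int64 *vecA = (unsigned __int64 *) SvPV(sv_vecA, lenA);
            unsigned __int64 *vecB = (unsigned __int64 *) SvPV(sv_vecB, lenB);
            I32 i, n = av_len(av_pairs) + 1;

            for(i = 0; i + 1 < n; i += 2) {
                SV **pFrom = av_fetch(av_pairs, i,     0);
                SV **pTo   = av_fetch(av_pairs, i + 1, 0);
                unsigned int from, to;
                if( !pFrom || !pTo ) continue;
                from = (unsigned int) SvUV(*pFrom);
                to   = (unsigned int) SvUV(*pTo);
                if( from / 8 >= lenA || to / 8 >= lenB ) continue;  // Range checks
                if( TSTVEC( vecA, from ) ) SETVEC( vecB, to );      // Test A, set B
            }
            return 1;
        }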


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      On (1) I think he's just illustrating how to iterate/test/set bits via that library. On (2) I see what you mean and will do some tests to compare. Just guessing, but maybe the library method might prove to be quicker if it operates on words rather than bytes.
        Just guessing, but maybe the library method might prove to be quicker if it operates on words rather than bytes.

        I thought that using bigger, and particularly register sized chunks, might make some difference, given that loading/using sub-register sized operands is generally considered to be more expensive. However, I tried addressing the string as an array of both 32-bit and 64-bit ints:

        int mytest2(SV* sv_vec, unsigned int bit) {
            STRLEN vecbytes;                              // Length of vector in bytes
            unsigned int *vec = (unsigned int*) SvPV(sv_vec, vecbytes);
            if( bit / 8 >= vecbytes ) return 0;           // Check in range
            vec[ bit / 32 ] |= ( 1U << ( bit % 32 ) );    // Set bit (CHANGES $vector)
            return 1;
        }

        int mytest3(SV* sv_vec, unsigned int bit) {
            STRLEN vecbytes;                              // Length of vector in bytes
            unsigned __int64 *vec = (unsigned __int64 *) SvPV(sv_vec, vecbytes);
            if( bit / 8 >= vecbytes ) return 0;           // Check in range
            vec[ bit / 64 ] |= ( 1ULL << ( bit % 64 ) );  // Set bit (CHANGES $vector)
            return 1;
        }

        And whatever difference it made, if any, was entirely lost in the noise of benchmarking. The relative ordering of bytes/dwords/qwords interchanges randomly with every run:

        C:\test>903727.pl
                   Rate qwords  bytes dwords    vec
        qwords   3.05/s     --    -2%    -2%   -25%
        bytes    3.13/s     2%     --    -0%   -23%
        dwords   3.13/s     2%     0%     --   -23%
        vec      7.70/s   151%   149%   147%     --

        C:\test>903727.pl
                   Rate dwords  bytes qwords    vec
        dwords   3.10/s     --    -0%    -1%   -60%
        bytes    3.11/s     0%     --    -0%   -60%
        qwords   3.13/s     1%     0%     --   -59%
        vec      7.69/s   148%   147%   146%     --

        Of course, for best possible performance you would need to ensure that the pointer was register-size aligned. Perl does allocate strings with such alignment, but the SvOOK optimisation can mean that the pointer you receive from SvPV isn't so aligned if the scalar has been fiddled with after allocation. That is not the case in this benchmark.
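
        (A trivial way to check for that - just a sketch, the helper name is made up:)

        #include <stdint.h>

        int is_qword_aligned(const void *p) {
            return ( (uintptr_t) p & 7 ) == 0;   // True iff 8-byte aligned
        }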

        My guess is that optimising compilers already generate code to "unroll the loop" for such accesses, and so this attempt at manual optimisation is unnecessary. I'd like to verify that by having the compiler produce the assembler, but every attempt I've made to pass additional compiler options to Inline C causes it to fail to build. Even with CCFLAGS => '', it still fails, when without that option it succeeds :(


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^8: Bit vector fiddling with Inline C
by oxone (Friar) on May 10, 2011 at 05:56 UTC
    Great tip, thanks. Wasn't aware of that, but will certainly check it out. Currently getting/setting bits byte-wise as in the OP example, which I doubt is the best way. Will definitely test this out as an alternative.