in reply to bit pattern to bit position mapping

I take it you've profiled your application, which is experiencing noticeable performance issues, and determined that it's spending a lot of its time in this subroutine? Of course you have, and it's silly of me to even ask. Obviously you profiled your application, because you wouldn't otherwise be asking for a faster way to do this little thing, because you know full well that premature optimization is a root of all kinds of evil, for which some, having strayed, have pierced themselves through with many sorrows. You know this, and therefore you are only asking about a faster way to code this particular subroutine because you have established that its speed actually matters. Right? Right?


Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. You can just call me "Mister Sanity". Why, I've got so much sanity it's driving me crazy.
  • Comment on Re: bit pattern to bit position mapping

Replies are listed 'Best First'.
Re^2: bit pattern to bit position mapping
by Anonymous Monk on Nov 19, 2006 at 13:51 UTC

    Yes. Yes. Yes!

    The application calls this routine 270 million times and it is currently consuming 14% of the total runtime.

    This isn't premature.

      it is currently consuming 14% of the total runtime

      oh, well why didn't you say so in the first place? Rewrite the hot function in C, and use array references, to stop slinging memory around.

      use strict; use Inline C => <<'__C_CODE__'; SV* b2i(int value) { AV *out; int i; out = (AV *)sv_2mortal((SV *)newAV()); i = 0; while (value > 0) { if (value & 1) av_push(out, newSViv(i)); ++i; value >>= 1; } return((SV *)newRV((SV *)out)); } __C_CODE__ my $ridx = b2i(12345); print "@$ridx\n";

      What does that do to the run time?

      • another intruder with the mooring in the heart of the Perl

        Thanks for this. It prompted me to investigate using Inline::C and was the starting point for my chosen solution.

        The results of benchmarking various implementations in Perl and Inline::C are

        # a few bits grouped in the LSBs of the unsigned long Rate 1 4 3 2 4b 1b 4a 1a 7 6 5 7a 1 349/s -- -17% -47% -50% -52% -53% -53% -55% -75% -76% -81% -86% 4 420/s 20% -- -36% -40% -42% -43% -44% -46% -70% -71% -77% -83% 3 657/s 88% 56% -- -6% -9% -11% -12% -15% -53% -55% -64% -74% 2 696/s 99% 66% 6% -- -4% -6% -6% -10% -50% -52% -62% -72% 4b 724/s 108% 73% 10% 4% -- -2% -3% -6% -48% -50% -61% -71% 1b 740/s 112% 76% 13% 6% 2% -- -1% -4% -47% -49% -60% -71% 4a 745/s 113% 77% 13% 7% 3% 1% -- -4% -46% -49% -59% -71% 1a 772/s 121% 84% 18% 11% 7% 4% 4% -- -44% -47% -58% -70% 7 1386/s 297% 230% 111% 99% 91% 87% 86% 80% -- -4% -24% -45% 6 1449/s 315% 245% 121% 108% 100% 96% 95% 88% 5% -- -21% -43% 5 1834/s 425% 337% 179% 163% 153% 148% 146% 138% 32% 27% -- -28% 7a 2532/s 625% 503% 285% 264% 249% 242% 240% 228% 83% 75% 38% -- # a few bits grouped in the MSBs of the unsigned long Rate 1b 1a 4b 4a 2 1 4 5 3 6 7 7a 1b 256/s -- -2% -4% -7% -9% -24% -39% -60% -61% -81% -81% -90% 1a 260/s 2% -- -2% -5% -8% -23% -38% -59% -61% -81% -81% -90% 4b 266/s 4% 2% -- -3% -5% -21% -36% -58% -60% -80% -80% -90% 4a 274/s 7% 5% 3% -- -3% -19% -34% -57% -59% -80% -80% -89% 2 282/s 10% 8% 6% 3% -- -17% -33% -56% -57% -79% -79% -89% 1 338/s 32% 30% 27% 23% 20% -- -19% -47% -49% -75% -75% -87% 4 417/s 63% 60% 57% 52% 48% 23% -- -35% -37% -69% -69% -84% 5 638/s 149% 146% 140% 133% 127% 89% 53% -- -4% -53% -53% -75% 3 662/s 159% 155% 149% 142% 135% 96% 59% 4% -- -51% -51% -74% 6 1353/s 429% 421% 408% 394% 381% 300% 224% 112% 104% -- -1% -47% 7 1361/s 432% 424% 411% 397% 383% 303% 226% 113% 105% 1% -- -47% 7a 2576/s 906% 891% 867% 841% 815% 662% 517% 304% 289% 90% 89% --
        1. is the original grep solution I posted.

          1a) & 1b) are minor variations.

        2. is a Perl implementation of your C algorithm.
        3. is 1 with the loop unrolled.
        4. (& 4a & 4b) use grep with pack and vec.
        5. is a custom evaled Perl subroutine that unrolls the grep loop and uses log2( value ) to avoid testing the higher bits.

          This performs surprisingly well for small numbers of bits in the LSBs of the scalar.

        6. is your Inline::C implementation.
        7. is an Inline::C version that uses (a custom C) log2( value ) to skip over the high bits.

          7a is the above, but pushing the results on the XS stack rather than creating and filling a AV* which has to be dereferenced in the calling code.

        The affect of that 6 to 9 times improvement in the performance of this relatively simple code is that 14% has dropped to 1% to 2%, and so cut 2 1/2 to 3 hours off the overall runtime. Many thanks.

        My chosen solution:

        use Inline C => <<'__C_CODE__'; static const char LogTable256[] = { 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7 }; unsigned log2i( unsigned int v ) { // 32-bit word to find the log of register unsigned int t, tt; // temporaries if (tt = v >> 16) { return( ( t = tt >> 8 ) ? 24 + LogTable256[ t ] : 16 + LogTable256[ tt ] ); } else { return( ( t = v >> 8 ) ? 8 + LogTable256[ t ] : LogTable256[ v ] ); } } void b2i7a( unsigned int value ) { register int i = log2i( (value-1) ^ value ); Inline_Stack_Vars; Inline_Stack_Reset; for( value >>= i; value; i++, value >>= 1 ) if( value & 1 ) Inline_Stack_Push( sv_2mortal( newSVuv( i ) ) ); Inline_Stack_Done; return; } __C_CODE__

        Comments, improvements & warnings welcomed.

      This isn't premature.

      I'd like to apologise for the tone of my earlier comment. Indeed, this does sound like it would be worth significant optimization effort.


      Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. You can just call me "Mister Sanity". Why, I've got so much sanity it's driving me crazy.

      s/270/27/.