coldr3ality has asked for the wisdom of the Perl Monks concerning the following question:

Blessed Monks, I humbly prithee guidance bestow. I'm looking to benchmark for a variant of the binary search algorithm, but before I begin I'm trying to bake it down to the most efficient Perl code I possibly can. I'm looking for the most efficient way to find the Most Significant Set Bit of a given integer value. I am unaware of a Perl function that does this. I have almost no experience in C, but I read that GCC does have an algorithm for this. Up to now, I have been using the following code to find the most significant set bit of a given 64-bit integer. I did my best to format it legibly here; though it looks like puke, it's a single statement. Is there a better performing solution?
$m is assigned the most significant set bit of 64-bit integer $z, for +the cost of 6 ANDs and 6 truth tests. my $m=-4294967296&$z? -281474976710656&$z? -72057594037927936&$z? -1152921504606846976&$z? -4611686018427387904&$z? -9223372036854775 +808&$z? 4611686018427387904: 2305843009213693952 : -2305843009213693952&$z? 1152921504606846976: 576460752303423488 : -288230376151711744&$z? -576460752303423488&$z? 288230376151711744: 144115188075855872 : -144115188075855872&$z? 72057594037927936: 36028797018963968 : -4503599627370496&$z? -18014398509481984&$z? -36028797018963968& +$z? 18014398509481984: 9007199254740992 : -9007199254740992&$z? 4503599627370496: 2251799813685248 : -1125899906842624&$z? -2251799813685248&$z? 1125899906842624: 562949953421312 : -562949953421312&$z? 281474976710656: 140737488355328 : -1099511627776&$z? -17592186044416&$z? -70368744177664&$z? -1407374 +88355328&$z? 70368744177664: 35184372088832 : -35184372088832&$z? 17592186044416: 8796093022208 : -4398046511104&$z? -8796093022208&$z? 4398046511104: 2199023255552 : -2199023255552&$z? 1099511627776: 549755813888 : -68719476736&$z? -274877906944&$z? -549755813888&$z? 274877906944: 137438953472 : -137438953472&$z? 68719476736: 34359738368 : -17179869184&$z? -34359738368&$z? 17179869184: 8589934592 : -8589934592&$z? 4294967296: 2147483648 : -65536&$z? -16777216&$z? -268435456&$z? -1073741824&$z? -2147483648& +$z? 1073741824: 536870912 : -536870912&$z? 268435456: 134217728 : -67108864&$z? -134217728&$z? 67108864: 33554432 : -33554432&$z? 16777216: 8388608 : -1048576&$z? -4194304&$z? -8388608&$z? 4194304: 2097152 : -2097152&$z? 1048576: 524288 : -262144&$z? -524288&$z? 262144: 131072 : -131072&$z? 65536: 32768 : -256&$z? -4096&$z? -16384&$z? -32768&$z? 16384: 8192 : -8192&$z? 4096: 2048 : -1024&$z? -2048&$z? 1024: 512 : -512&$z? 256: 128 : -16&$z? -64&$z? -128&$z? 64: 32 : -32&$z? 16: 8 : -4&$z? -8&$z? 4: 2 : 2 ;
EDIT 2024/3/25: Updating for completeness.

As bliako generously published GCC::Builtins in response to this question, and a benchmark in the documentation refers back to this code for comparison, I've posted an updated version below which is more directly comparable. The code I originally posted above is exactly as slow, but the output constants represented the answer in different terms to clz(). The new code is just for comparing apples to apples (and heeding some legibility advice from hv).

I was surprised that GCC::Builtins::clz is only twice as fast, considering the essential part is just one CPU instruction (LZCNT) which should take practically no time at all. I guess most of that time is spent on the call stack and related.

my $z=17; my $m; { no warnings; ## to suppress warning ## "Hexadecimal number > 0xffffffff non-portable" $m= 0xFFFFFFFF80000000&$z? 0xFFFF800000000000&$z? 0xFF80000000000000&$z? 0xF800000000000000&$z? 0xE000000000000000&$z? 0xC000000000000000&$z? 0x8000000000000000&$z? 0 : 1: 2 : 0xF000000000000000&$z? 3: 4 : 0xFE00000000000000&$z? 0xFC00000000000000&$z? 5: 6 : 0xFF00000000000000&$z? 7: 8 : 0xFFF8000000000000&$z? 0xFFE0000000000000&$z? 0xFFC0000000000000&$z? 9: 10 : 0xFFF0000000000000&$z? 11: 12 : 0xFFFE000000000000&$z? 0xFFFC000000000000&$z? 13: 14 : 0xFFFF000000000000&$z? 15: 16 : 0xFFFFFF8000000000&$z? 0xFFFFF80000000000&$z? 0xFFFFE00000000000&$z? 0xFFFFC00000000000&$z? 17: 18 : 0xFFFFF00000000000&$z? 19: 20 : 0xFFFFFE0000000000&$z? 0xFFFFFC0000000000&$z? 21: 22 : 0xFFFFFF0000000000&$z? 23: 24 : 0xFFFFFFF800000000&$z? 0xFFFFFFE000000000&$z? 0xFFFFFFC000000000&$z? 25: 26 : 0xFFFFFFF000000000&$z? 27: 28 : 0xFFFFFFFE00000000&$z? 0xFFFFFFFC00000000&$z? 29: 30 : 0xFFFFFFFF00000000&$z? 31: 32 : 0xFFFFFFFFFFFF8000&$z? 0xFFFFFFFFFF800000&$z? 0xFFFFFFFFF8000000&$z? 0xFFFFFFFFE0000000&$z? 0xFFFFFFFFC0000000&$z? 33: 34 : 0xFFFFFFFFF0000000&$z? 35: 36 : 0xFFFFFFFFFE000000&$z? 0xFFFFFFFFFC000000&$z? 37: 38 : 0xFFFFFFFFFF000000&$z? 39: 40 : 0xFFFFFFFFFFF80000&$z? 0xFFFFFFFFFFE00000&$z? 0xFFFFFFFFFFC00000&$z? 41: 42 : 0xFFFFFFFFFFF00000&$z? 43: 44 : 0xFFFFFFFFFFFE0000&$z? 0xFFFFFFFFFFFC0000&$z? 45: 46 : 0xFFFFFFFFFFFF0000&$z? 47: 48 : 0xFFFFFFFFFFFFFF80&$z? 0xFFFFFFFFFFFFF800&$z? 0xFFFFFFFFFFFFE000&$z? 0xFFFFFFFFFFFFC000&$z? 49: 50 : 0xFFFFFFFFFFFFF000&$z? 51: 52 : 0xFFFFFFFFFFFFFE00&$z? 0xFFFFFFFFFFFFFC00&$z? 53: 54 : 0xFFFFFFFFFFFFFF00&$z? 55: 56 : 0xFFFFFFFFFFFFFFF8&$z? 0xFFFFFFFFFFFFFFE0&$z? 0xFFFFFFFFFFFFFFC0&$z? 57: 58 : 0xFFFFFFFFFFFFFFF0&$z? 59: 60 : 0xFFFFFFFFFFFFFFFE&$z? 0xFFFFFFFFFFFFFFFC&$z? 61: 62 : 0xFFFFFFFFFFFFFFFF&$z? 63: 64; }


32-bit version:
my $m=0xFFFF8000&$z? 0xFF800000&$z? 0xF8000000&$z? 0xE0000000&$z? 0xC0000000&$z? 0x80000000&$z? 0 : 1: 2 : 0xF0000000&$z? 3: 4 : 0xFE000000&$z? 0xFC000000&$z? 5: 6 : 0xFF000000&$z? 7: 8 : 0xFFF80000&$z? 0xFFE00000&$z? 0xFFC00000&$z? 9: 10 : 0xFFF00000&$z? 11: 12 : 0xFFFE0000&$z? 0xFFFC0000&$z? 13: 14 : 0xFFFF0000&$z? 15: 16 : 0xFFFFFF80&$z? 0xFFFFF800&$z? 0xFFFFE000&$z? 0xFFFFC000&$z? 17: 18 : 0xFFFFF000&$z? 19: 20 : 0xFFFFFE00&$z? 0xFFFFFC00&$z? 21: 22 : 0xFFFFFF00&$z? 23: 24 : 0xFFFFFFF8&$z? 0xFFFFFFE0&$z? 0xFFFFFFC0&$z? 25: 26 : 0xFFFFFFF0&$z? 27: 28 : 0xFFFFFFFE&$z? 0xFFFFFFFC&$z? 29: 30 : 0xFFFFFFFF&$z? 31: 32;


There really isn't any good reason to use this, as bliako's GCC::Builtins::clz is about twice as fast (and about 64 times simpler) - unless, of course, you're modifying the constants to return some other pre-computed result for which there is no builtin alternative, to eliminate a few instructions. Personally, I'm going to see if I can write the whole encompassing piece of work in Perl+Assembly, as suggested. Straight down the rabbit hole at terminal velocity.

Replies are listed 'Best First'.
Re: Most Significant Set Bit
by talexb (Chancellor) on Mar 15, 2024 at 13:46 UTC

    From a strictly Math point of view, that sounds like you want the log base 2 (log2) of the value. So, the log2 of 64, for example, is 6, which means the 6th bit (starting at bit 0) is the MSB. 64, is, of course, 1_000_000 in binary. And you can get that by doing log($value)/log(2):

    DB<4> x ( log 64 ) / ( log 2 ) 0 6 DB<5>

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Most Significant Set Bit (Bit Twiddling References)
by eyepopslikeamosquito (Archbishop) on Mar 15, 2024 at 22:59 UTC
      Nit: while I'm the most recent releaser of Inline::C, it's not really "by" me, but by Ingy (with quite a few contributors).

        Thanks. I've updated the list of References.

        👁️🍾👍🦟
Re: Most Significant Set Bit
by hv (Prior) on Mar 15, 2024 at 17:09 UTC

    I think your example would be a lot more readable if you expressed the constants in binary or hex.

    If each result is approximately equally likely, I think it would be more efficient to do a binary chop: a 64-bit input should take no more than 6 tests. For accuracy you should probably write some code to generate this:

    my $m = $z > 0x00000000ffffffff ? $z > 0x0000ffffffffffff ? $z > 0x00ffffffffffffff ? $z > 0x0fffffffffffffff ? $z > 0x3fffffffffffffff ? $z > 0x7fffffffffffffff ? 0x8000000000000000 : 0x40000 +00000000000 : $z > 0x1fffffffffffffff ? 0x2000000000000000 : 0x10000 +00000000000 ... etc.

    I don't know much about Bloom filters, but it's possible you could construct one to do it faster. Another possibility would be some XS (or Inline C) - some processors have a single instruction to return this, and C compilers often provide functions that use such instructions where available, and fall back to an alternative implementation otherwise. In this case I think you could do it using __builtin_clz() if using gcc.

      I took your advice and wrote some code to generate that. The exercise was not futile. I had mistakenly thought I couldn't set the sign bit using hex notation, but now I know. I must have misinterpreted the warning about 64-bit hex being non-portable.
Re: Most Significant Set Bit
by hippo (Archbishop) on Mar 15, 2024 at 17:03 UTC
    though it looks like puke, it's a single statement. Is there a better performing solution?

    I doubt there will be a massively better performing solution but unless you are doing this literally billions of times I'd caution against this sort of code because it is such a maintenance nightmare.

    Here's a nice, clean bitwise shift version. Likely much slower but so much more maintainable.

    my $z = 256; # input my $m = 0; $m++ while $z >> $m; say $m;

    Update: See also Bit::Util::bu_first by salva.


    🦛

Re: Most Significant Set Bit
by ikegami (Patriarch) on Mar 16, 2024 at 03:55 UTC

    Many CPUs have ways of counting leading zeroes. In gcc, these can be access using __builtin_clz, __builtin_clzl and __builtin_clzll.

    So you could use Inline::C to run C code that uses that.

    Or you could use XS or FFI::Platypus to access a C library that provides a function that uses that.

Re: Most Significant Set Bit
by davido (Cardinal) on Mar 15, 2024 at 21:02 UTC

    A binary search should be the 3rd alternative here. First alternative, if the problem space is only 64 bits, is a linear search through 64 bits. Second alternative should probably be a mathematical solution. And the third would be a binary search. It's really hard to make the complexity of a binary search beat the speed of a linear search for a small search set.

    It's true that a binary search in a 64 bit unsigned integer range is going to take, at worst, 64 comparisons. And on average, a lot less than 64 comparisons, in a randomized data set of 64 bit numbers. This is O(log(n)). But a linear search through a 64-bit vector to find the most significant bit for an integer, is also O(log(n)); you have at most 64 comparisons with a binary search through 2**64 numbers, or you have at most 64 comparisons if you do a linear search through the bits of a number that fits within 64 bits. And a linear search will be very fast for such a small problem space.

    But a mathematical solution to the problem is that the most significant bit of any unsigned integer will be found at index int(log2(n)). So if n is 1, int(log2(n)) is 0 (the zero-index bit; the right-most bit). The int(log2(32767)) is 15. You cannot represent 32767 in fewer than 16 bits. And the most significant bit will be the left-most bit, or the bit at index (offset) 15.

    Therefore, a solution that requires NO iteration at all could be:

    sub most_significant_ix { my $n = shift; return if $_[0] == 0; return int(log($_[0])/log(2)); }

    Unfortunately the log function isn't inexpensive, so the linear search through 64 bits will probably still win, though on paper this solution is O(1), whereas the linear search through bits of an integer, and the binary search through the integer range, will both be O(log(n)) time complexity.

    A linear search through the bits solution would look like this:

    sub most_significant_ix { my $n = shift; return undef if $n == 0; my $bits = 64; while (--$bits >= 0) { return $bits if (2**$bits) & $n; } }

    Dave

      Unfortunately the log function isn't inexpensive

      True, but you can halve the expense for large numbers of evaluations by storing log(2), which is just a constant.


      🦛

        Strangely using a stored $log2 = log(2) in the above test did not improve the log function speed. It was actually slightly slower.

        EDIT: similar also with use constant log2 => log(2)

        EDIT2: The interpreter must compile such constant expressions into a constant before running.

      Some inaccuracies:
      It's true that a binary search in a 64 bit unsigned integer range is going to take, at worst, 64 comparisons. ... This is O(log(n))

      The initial proposal is a binary search on the bits which (as stated in the original post) takes at most 6 comparisons, not 64. It would be O(log(log(n)))

      But a linear search through a 64-bit vector to find the most significant bit for an integer, is also O(log(n)); .... And a linear search will be very fast for such a small problem space.

      Yes, linear on the number of bits, which is a loop of 64 comparisons, vs. the 6 proposed by OP.
      So, slower.

      Therefore, a solution that requires NO iteration at all could be ... though on paper this solution is O(1)

      Hiding a loop inside a function doesn't make it not-a-loop. Since the conversation is about bits, you can't just assume it as a constant like when you're assuming a 64-bit hardware op. If this were an arbitrary precision number like Math::BigInt, 'log' is definitely not a constant operation.

        Good point on the calculation of log not being O(1).

        How do you do a binary search on '0111110100111001011101010000111101000100111011000000000000000011'? I'm not quite sure what is meant by doing a binary search on the bits. What does the comparator look like? When I suggested that the binary search must be on the integer range, it was because I couldn't envision how a binary search would be applied to efficiently discover the first non-zero bit in a bit field directly. I could see it working fairly well on an integer range, though.


        Dave

      I timed those two functions along with the following:
      sub by_string { my $n = shift; return length(sprintf "%b", $n) - 1; }
      I tested with high bits using int rand(2**64), medium bits int rand(2**32) and lower bits int rand(2**16) for 1e6 iterations.
      log linear sprintf 0.2917 0.4427 0.3370 avg_sig=62 0.2786 2.8430 0.3056 avg_sig=30 0.2826 4.1060 0.2986 avg_sig=14
Re: Most Significant Set Bit
by jwkrahn (Abbot) on Mar 15, 2024 at 22:09 UTC

    In assembler: store the number in a 64-bit register, shift/rotate left until the overflow bit is true and the loop count is the correct bit number.

    Probably can't get much faster than that.

    Naked blocks are fun! -- Randal L. Schwartz, Perl hacker
Re: Most Significant Set Bit
by coldr3ality (Acolyte) on Mar 18, 2024 at 13:52 UTC
    Thank you all for your time and insight on this interesting nugget. My approach is indeed a 6-step binary chop as hv suggested, and after taking in and considering everything you guys have said, it seems clear that while this is the most efficient logical concept, I am hard pressed to do it justice in pure Perl. A proper C implementation would bring very significant gains in speed and could even take advantage of possible hardware support ikegami spoke of.

    If gcc has a builtin MSSB function, I have a mind to suggest that a future version of Perl could rock one. When we justifiably resort to bit twiddling, we are either trying to save memory, or optimize tight loops which iterate billions of times. In these cases, the best I can do in Perl (or might do in XS) would be glaringly inferior to a builtin function in both performance and maintainability.

    I would be curious about the recommended path to go about delivering such a thing to the community, not to be an entitled freeloader or to lack initiative. I have to admit that if there were ever any significant interest in this function among Perl users I'd see evidence of that as a module that provides it. For whatever reason, some ideas just aren't very popular, and I accept that.

    I understand there are many Perl functions which simply wrap C functions, and this seems almost like that. I would submit that while it may not be the most widely used feature, it may be a great addition, if it's no more difficult to implement.
Re: Most Significant Set Bit (Perl+Assembly)
by bliako (Abbot) on Mar 21, 2024 at 12:07 UTC

    Here is something to get you started with using assembly with Perl. This example is using Inline::C which makes things clearer to post, but I assume it can also be done with XS (untested). It relies on GCC's C compiler's inline assembly facility. The keyword asm for inlining assembly with GCC C and its syntax may differ for other compilers.

    use Inline C; use strict; use warnings; # Assembly code via Inline::C to return the # 1. number of leading zeros of the input integer # 2. a number with only bit set where the MSSB is located # # by bliako # for https://perlmonks.org/?node_id=11158279 # 21/03/2024 my $z = 17; my $res = mssb($z); print "Leading zeros for $z : ".$res->[0]."\n"; print "MSSB for $z : ".sprintf("%032b\n", $res->[1])."\n"; # result: # Leading zeros for 17 : 27 # MSSB for 17 : 00000000000000000000000000010000 __END__ __C__ #include <stdio.h> AV * mssb(int input){ int num_leading_zeros; int mssb; asm volatile( /* note: lzcnt inp, out mov src, dst add what, dst # set bit of value in dst at zero-based bitposition: btsl bitposition, dst (it modifies dst) */ "lzcnt %[input], %[num_leading_zeros] \n\t\ mov $32, %%eax \n\t\ sub %[num_leading_zeros], %%eax \n\t\ sub $1, %%eax \n\t\ xor %[mssb], %[mssb] \n\t\ bts %%eax, %[mssb] \n\t\ " /* outputs */ : [num_leading_zeros] "=r" (num_leading_zeros) , [mssb] "=r" (mssb) /* inputs */ : [input] "mr" (input) /* clobbers: we are messing with these registers */ : "eax" ); // return an arrayref of the two outputs AV* ret = newAV(); sv_2mortal((SV*)ret); av_push(ret, newSViv(num_leading_zeros)); av_push(ret, newSViv(mssb)); return ret; }

    Of course you have the easy way of using the module I just published GCC:Builtins which provides GCC's builtin clz() (count leading zeros) which then you need to slightly adjust, as you posted: Re^2: Most Significant Set Bit, or just do the whole operation with assembly as above. For both methods you need a gcc compiler. But one can easily modify the code for other compilers.

    bw, bliako

      Thank you for your help, I'm looking forward to working with this!
Re: Most Significant Set Bit
by bliako (Abbot) on Mar 19, 2024 at 13:41 UTC

    I hope I have beaten syphilis to publishing GCC::Builtins

    use GCC::Builtins qw/clz/; # or qw/:all/; $msb1 = clz(10); # 28

    Disclaimer: Builtins are implemented by GCC, I am merely interfacing them to Perl with XS, mechanistically. All questions to GCC please.

    bw, bliako

Re: Most Significant Set Bit
by bliako (Abbot) on Mar 20, 2024 at 20:44 UTC

    Perhaps I misunderstood what clz() does, but does the code you posted really work? It gives me 8 for $z=17. Shouldn't it be 27 for 32bit and 59 for 64bit? 17=10001 (with 59 leading zeros and MSB is at the zeros-side, left)

      It actually returns an integer with one bit set: the bit to the right of the MSSB. I overlooked this modification when I posted the code. Another modification is that it will not return less than 2. I may have to edit in an unmodified version to fix the original post. The modification is part of the aforementioned binary search variant I'm experimenting with, and I meant to leave that out.

      Given the leftmost / most significant bit is #0, the original code is equivalent to:
      my $z=17; my $m= $z<4? 2: 1<<( 63-( clz($z) +1) ); # 64-bit system integer my $m= $z<4? 2: 1<<( 31-( clz($z) +1) ); # 32-bit system integer




      EDIT 2024/3/27
      I know the original point is moot, but there is always the possibility that a future project will benefit from this idea somehow, and writing these binary chop statements by hand is precarious. The below code generator makes short work of it. Any power-of-2 from 2 to 64 will work as long as your system supports integers that size. 128 and 256 should work as well. Higher powers-of-2 will need a larger bitvector than the one I assigned to $BinFlow.



      EDIT 2024/3/30: more comprehensible variable names and comments
      use strict; use warnings; #### BINARY CHOP GENERATOR #### This script generates a single Perl statement #### in the form of a tree of inline conditionals #### to find the count of leading zeros in an integer $z #### with bit width P2 in log(P2)/log(2) steps. #### #### Disclaimer: for a better solution, use GCC::Builtins qw(clz). #configuration use constant Tab => ' 'x4; # tab width use constant P2 => 64; # bit width # (2, 4, 8, 16, 32, 64, 128, 256) use constant nSteps => (log(P2)/log(2) ); use constant MaxInt => (1<< P2)-1 ; use constant MASK_0 => 1<<(P2 -1); use constant CASE_0_LS => P2 -1; my $BinFlow= pack('C9', (0)x9); # bitvector of iterators carries # the binary composition forward # as the loop scales the tree, # printing it line by line. my $B=1; my $ls_=0; # supporting conditional's running left-shift my $ls=$ls_+(P2>>1); # "this" conditional's running left-shift print('my $m=',sprintf('0x%X', MaxInt&( -(1<<($ls-1)) )),'&$z? '); while($B>0){ my $if_else=vec($BinFlow, $B, 2)++; #advance conditional $B if($if_else==0){ #BEGIN NEW CONDITIONAL #(the "if" part) if($B<nSteps){ #not the final step yet? $ls_=vec( $BinFlow,++$B, 8)=$ls; #set left-shift as of $B $ls+= P2 >>$B; print("\r\n", Tab x$B); printf('0x%X&$z? ', MaxInt&( -(1<<($ls-1)) ) ); }else{ # To delineate all 65 unique conditional flow paths, # the 0th and 1st paths require an additional step. # These are treated as the most unlikely cases. printf("\r\n".(Tab x($B+1)). '0x%X&$z? 0'. "\r\n".( Tab x$B). ':'. (' 'x(8+((log(MASK_0)/log(16))>>0))), MASK_0 ) if $ls==CASE_0_LS; print(1+P2-(P2>>$B)-$ls, ': ', 1+P2-$ls, "\n"); vec( $BinFlow, $B, 2)=0; #reset step $B. $ls=$ls_; #revert left-shift and $ls_=vec( $BinFlow,--$B, 8); #fall back to previous step. } }elsif($if_else==1){ #RESUME CONDITIONAL #(the "else" part) print( Tab x$B); $ls_=$ls; ++$B; #climb back up a step $ls-= P2 >>$B; #negate shift component B printf(': 0x%X&$z? ', MaxInt&( -(1<<($ls-1)) ) ); }else{ #END CONDITIONAL vec( $BinFlow, $B, 2)=0; #reset step $B. $ls=$ls_; #revert left-shift and $ls_=vec( $BinFlow,--$B, 8); #fall back to previous step. } } print(";\r\n");