in reply to Bitwise operators on binary objects
Newer processors have a popcnt ("population count") instruction, which computes the number of 1-bits (for up to 64 bits) with a single fast CPU instruction. In other words, if you want real speed at the cost of portability, you could hack a little assembly1,2.
A quick test shows that this is roughly 10 times as fast as the above mentioned unpack "%32b*".
#!/usr/bin/perl -wl use strict; use Inline C => Config => LDDLFLAGS => '-shared ../../../popcount.o'; use Inline "C"; use Benchmark "cmpthese"; my $s = pack "C*", map { int rand 256 } 1 .. 2**18; # 256k #print unpack q/%32b*/, $s; #print count_bits($s); cmpthese (-2, { unpack => sub { my $n = unpack q/%32b*/, $s; }, popcnt => sub { my $n = count_bits($s); }, }); __END__ __C__ extern unsigned long popcount(char *, int64_t); // the asm routine unsigned long count_bits (SV* bitstr) { // XS wrapper STRLEN l; char *s = SvPV(bitstr, l); return popcount(s, l); }
Rate unpack popcnt unpack 1252/s -- -91% popcnt 13581/s 984% --
Tested with
$ cat /proc/cpuinfo | grep name | head -1 model name : AMD Phenom(tm) 9600 Quad-Core Processor
___
1 The assembly code (nasm syntax; 64 bit; gcc register subroutine calling convention — compile with nasm -f elf64 popcount.asm ):
bits 64 global popcount section .text ; prototype: unsigned long popcount(char*, int64_t); popcount: push rbp mov rbp, rsp mov rcx, rsi shr rcx, 3 mov rsi, rdi xor rax, rax .cnt popcnt rbx, [rsi] add rax, rbx add rsi, 8 loop .cnt mov rsp, rbp pop rbp ret
(Caveat: as this is just proof of concept code, the string length is assumed to be a multiple of 8 byte/64 bit, so you might need to zero-pad it)
2 In theory, there is also the gcc __builtin_popcount function, but in practice it turns out to be less portable than a piece of assembly code.
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^2: Bitwise operators on binary objects
by PeterCordes (Initiate) on Sep 22, 2017 at 09:31 UTC | |
Re^2: Bitwise operators on binary objects
by albert (Monk) on Feb 01, 2011 at 11:35 UTC |