in reply to Re^9: [OT] The interesting problem of comparing (long) bit-strings.
in thread [OT] The interesting problem of comparing bit-strings.
How is your data?
There's no way to say as it it intended for general use. Some bitsets may be sparse; some very dense.
The current use-case is characterised by substantial chunks of zero data interspersed with longish runs of fairly dense stuff.
B-M can potentially be several orders of magnitude faster than the brute-force approach
You keep claiming that, but for all the reasons I've outlined elsewhere, I do not believe it.
I'd be happy to be proved wrong, because in the end, I just want the fastest search I can code, but I can't even see how you would adapt B-M to bit-string search.
I'm doing my comparisons in 64-bit chunks; but building delta tables with 64-bit indices is obviously not on.
So, you look to doing byte sized compares in order to keep the table sizes reasonable.
BUT:
Not only does it require 8 cmp instructions instead of 1, it also requires 8 counter increments and 8 jumps.
Even if the compiler unwound the loop -- which it doesn't -- or I coded it in assembler, which I won't, it would still take substantially more than 8 times longer because loading a 64-bit register with 8-bit units, means the microcode has to shuffle 7 of the 8-bytes into the low-8 bits of the register. And it has to do that for both comparands.
So, the 8 x 8-bit compares versus 1 64-bit is more than 8 times slower.
But don't forget that for each n-bits, you need to do n comparisons with one of the comparands shift 1-bit each time.
So now the delta between 1 x 64-bit comparison and 8 x (unaligned) 8-bit comparisons, becomes 64 x 64-bit comparison versus 64 x 8-bit comparisons.
And that's not to mention that the 8-bit values from which bits need to be shifted in will also need to be shuffled by the microcode, adding further overheads.
For a modest-size 8192-bit needle, you're looking at 16*4*1024 = 64k of table space, that needs to be randomly accessed and thus wiping out my 32k L1 cache in the process.
I don't know for sure, because I haven't tried it, because I don't believe it would be beneficial. I'd be happy to be proved wrong, but I don't think I will be.
|
|---|