-- It will depend on whether you have longish sequences of contiguous ones or zeros.

The one set of indexes (of 25) that I've analyzed consists of 88 x 31MB vectors. They vary between 86% and 98% sparse (measured by zero bytes rather than zero bits).

The largest 0 runs range between 8 and 12 million bits. The largest 1 run is 67 bits. By packing the run counts as 0/1 pairs into 32-bit words, 24 bits for the 0 run and 8 bits for the 1 run, I can reduce the size by more than two-thirds and am still able to perform boolean operations without decompressing first.
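
As a rough illustration of that packing (a sketch in Python, not the code I actually run), something like the following would do. All function names are hypothetical. Putting the 0-run count in the high 24 bits, and splitting a run that overflows its field across extra words, are assumptions of the sketch. `and_packed` shows one way to AND two packed vectors of equal bit length run by run, without expanding either to bits.

```python
ZERO_BITS, ONE_BITS = 24, 8
MAX_ZEROS = (1 << ZERO_BITS) - 1   # longest 0 run one word can hold
MAX_ONES = (1 << ONE_BITS) - 1     # longest 1 run one word can hold

def bit_runs(bits):
    """Yield (zeros, ones) pairs: a run of 0s followed by a run of 1s."""
    zeros = ones = 0
    for b in bits:
        if b:
            ones += 1
        else:
            if ones:               # a 0 after some 1s closes the current pair
                yield zeros, ones
                zeros = ones = 0
            zeros += 1
    if zeros or ones:
        yield zeros, ones

def pack_runs(pairs):
    """Pack (zeros, ones) pairs into 32-bit words (assumed layout: zeros << 8 | ones)."""
    words = []
    for zeros, ones in pairs:
        while zeros > MAX_ZEROS:   # oversized 0 run: emit full-zero words
            words.append(MAX_ZEROS << ONE_BITS)
            zeros -= MAX_ZEROS
        while ones > MAX_ONES:     # oversized 1 run: continue in the next word
            words.append((zeros << ONE_BITS) | MAX_ONES)
            zeros, ones = 0, ones - MAX_ONES
        words.append((zeros << ONE_BITS) | ones)
    return words

def pack(bits):
    return pack_runs(bit_runs(bits))

def unpack(words):
    bits = []
    for w in words:
        bits.extend([0] * (w >> ONE_BITS))
        bits.extend([1] * (w & MAX_ONES))
    return bits

def segments(words):
    """Yield (bit_value, length) segments from packed words, skipping empty runs."""
    for w in words:
        zeros, ones = w >> ONE_BITS, w & MAX_ONES
        if zeros:
            yield 0, zeros
        if ones:
            yield 1, ones

def and_packed(a, b):
    """AND two packed vectors of equal bit length without expanding to bits."""
    pairs = []
    zeros = ones = 0
    sa, sb = segments(a), segments(b)
    va, la = next(sa, (0, 0))
    vb, lb = next(sb, (0, 0))
    while la and lb:
        n = min(la, lb)            # overlap of the two current segments
        if va & vb:
            ones += n
        else:
            if ones:               # result switches back to 0: close the pair
                pairs.append((zeros, ones))
                zeros = ones = 0
            zeros += n
        la -= n
        lb -= n
        if la == 0:
            va, la = next(sa, (0, 0))
        if lb == 0:
            vb, lb = next(sb, (0, 0))
    if zeros or ones:
        pairs.append((zeros, ones))
    return pack_runs(pairs)
```

With this layout, a 0 run of up to 16,777,215 bits and a 1 run of up to 255 bits fit in a single word, which comfortably covers the 8 to 12 million and 67 bit maxima above.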

For the underlying principles see http://crd.lbl.gov/~kewu/ps/LBNL-49626.pdf.