in reply to Locating a specified number of contiguous 0 bits within a large bitstring efficiently.

some observations, mainly about the N>=8 case and if you're going to do this with regexps:
  1. regexps are designed for searching forward; while there are backtracking/"look-behind" operators, they're rather ad hoc and sometimes a bad idea to use,

  2. the byte-classes you need for for "ends with a sequence of exactly n zero bits", that, because of (1), you'll want to use for anchoring your regexps, are much easier to construct if your bytes are taken to be little-endian (i.e., the high order bits of byte k are taken to be adjacent to the low order bits of byte k+1).
    E.g., the range for "ends with a sequence of exactly 1 zero bit" is 0x40-0x7f, "...2 zero bits" is 0x20-0x3f and so on.
    And while this messes up the corresponding expressions/classes for "begins with at least n zero bits", the "at least" vs. "exactly" makes a difference in that you now only have to check the zeroness of the first (N - n) mod 8 bits, so I think things are still easier in the little-endian world.
Putting this all together we need a |-join of 8 regexps (n = 0..7 and m = N - n mod 8) of the form
[2^(7-n) .. (2^(7-n+1)-1)] \0{N/8} (?: ($ch mod 2^m == 0) | "\0+" )
modulo this not being quite correct perl-regexp code but you know how to fix that.
  • Comment on Re: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
  • Select or Download Code

Replies are listed 'Best First'.
Re^2: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
by wrog (Friar) on Jun 07, 2013 at 01:43 UTC
    Oh wait, this is simpler than I thought. We don't even need that last \0+

    Fleshing this out a bit more:

    # $pre[n] matches any character that ends with exactly n zero bits @pre = map { qr{[@{[chr(1<<(7-$_))]}-@{[chr((1<<(8-$_))-1)]}]} } 0 .. +7; $pre[0] = qr(^|$pre[0]); # $suf[m] matches if we have at least m zero bits following my @suf = ('', #done map { my $m=$_; qr{[@{[join '', map { chr($_<<$m) } 0..((1<<(8-$m))-1)]}]} } 1..7 ); sub zero_bit_regexp { my $N = shift; my $p = join '|', map { qr{$pre[$_]\0{@{[($N-$_)>>3]}}$suf[($N-$_)&0 +x7]} } 0..7; return qr{$p}; }
      except for the small matter that regexp quoting issues are turning out to be deadly, so better go with this instead
      # $pre[n] matches any character that ends with exactly n zero bits my @pre = map { sprintf "[\\x%02x-\\x%02x]", 1<<(7-$_), (1<<(8-$_))-1 +} 0 .. 7; $pre[0] = "(?:^|$pre[0])"; # $suf[m] matches any character that begins with at least m zero bits my @suf = ('', #m=0 -> match anything map { my $m=$_; '[' . join('', map { sprintf "\\x%02x", $_<<$m } 0..((1<<(8-$m))-1)) . ']' } 1..7); sub zero_bit_regexp { my $N = shift; my $p = join '|', map { "$pre[$_]\\0{@{[($N-$_)>>3]}}$suf[($N-$_)&0x +7]" } 0..7; return qr{$p}; }

        Thank you for all the work you've put into this.

        But whilst this finds a match (if it exists), I'm having the same problem with this as I am with my attempts to implement Dave_the_M's suggestion, that of converting the match to a bit position at which it was found?

        $-[0] tells me the byte position, but then I need to work the byte at than position back to the pattern than matched and then the bit offset within it. Thoughts?


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.