in reply to Re^2: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
in thread Locating a specified number of contiguous 0 bits within a large bitstring efficiently.

except for the small matter that regexp quoting issues are turning out to be deadly, so better go with this instead
# $pre[n] matches any character that ends with exactly n zero bits my @pre = map { sprintf "[\\x%02x-\\x%02x]", 1<<(7-$_), (1<<(8-$_))-1 +} 0 .. 7; $pre[0] = "(?:^|$pre[0])"; # $suf[m] matches any character that begins with at least m zero bits my @suf = ('', #m=0 -> match anything map { my $m=$_; '[' . join('', map { sprintf "\\x%02x", $_<<$m } 0..((1<<(8-$m))-1)) . ']' } 1..7); sub zero_bit_regexp { my $N = shift; my $p = join '|', map { "$pre[$_]\\0{@{[($N-$_)>>3]}}$suf[($N-$_)&0x +7]" } 0..7; return qr{$p}; }
  • Comment on Re^3: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
  • Download Code

Replies are listed 'Best First'.
Re^4: Locating a specified number of contiguous 0 bits within a large bitstring efficiently.
by BrowserUk (Patriarch) on Jun 07, 2013 at 03:19 UTC

    Thank you for all the work you've put into this.

    But whilst this finds a match (if it exists), I'm having the same problem with this as I am with my attempts to implement Dave_the_M's suggestion, that of converting the match to a bit position at which it was found?

    $-[0] tells me the byte position, but then I need to work the byte at than position back to the pattern than matched and then the bit offset within it. Thoughts?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Put in match groups, i.e., replace
      my $p = join '|', map { "$pre[$_]\\0{@{[($N-$_)>>3]}}$suf[($N-$_)&0x7] +" } 0..7;
      with
      my $p = join '|', map { "$pre[$_](\\0{@{[($N-$_)>>3]}}$suf[($N-$_)&0x7 +])" } 0..7;
      By construction, only one will match and $#- will tell you which one. We place the leftparen so that the group always starts at the first byte that's beginning with a zero bit. Then you just have to subtract the number of zero bits at the end of the byte that $pre[$_] is matching, but that's just $#- - 1. So the answer is
      $-[$#-]*8 - $#- + 1
      Still needs a bit more work to handle the $N < 8 case.

        Unfortunately, the bitpatterns you generate -- in common with those used by grimey's code -- are not compatible with vec.

        By way of example, this generates a vector with a single length 1x0 run, and a single length 2x0 run; and single length 3x0 run; and so on:

        $vec = ''; $n=-1; vec( $vec, $n+=$_, 1 ) = 1 for 2 .. 17; print unpack 'b*', $vec;; 0100100010000100000100000010000000100000000100000000010000000000100000 +000000100000000000010000000000000100000000000000100000000000000010000 +0000000000001000... # 2 5 9 14 20 27 35 44 54 65 + 77 90 104 119 135 + 152

        Thus, a search for

        • 1 x 0 should be found at position 0;
        • 2 x 0 should be found at position 2;
        • 3@5; 4@9; 5@14; ...

        Running your code looking for these runs against that vector produces:

        wrog 1: 0 wrog 2: 14 wrog 3: 5 wrog 4: 14 wrog 5: 14 wrog 6: 20 wrog 7: 27 wrog 8: 35 wrog 9: 44 wrog 10: 54 wrog 11: 65 wrog 12: 77 wrog 13: 90 wrog 14: 90 wrog 15: 90 wrog 16: 90 wrog 17: 90

        I believe this is because little-endian bytes are bytes swapped and word swapped relative to big-endian.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.