Given a bitstring $vec upto 64MB long, how to quickly locate a run of (at least) N contiguous, unset bits regardless of alignment?

N might be any number from 1 to 100s

My current thinking is that for N > 8, B = int( N /8 ), b = N % 8; Search for B contiguous 0 bytes, then check the preceding and following bytes for at least b adjacent 0 bits.

For N < 8; for each N, only a subset of the 256 bit patterns in a byte have N unset contiguous bits. Ie.

With a lookup table mapping the number of bits required to the bit patterns that can provide them, I have a way to locate these smaller runs.

Except that:

  1. Any one of the run lengths > 2 can be provided by many combinations of 2 adjacent bytes.
  2. The whole process is becoming both decision rich and search heavy.

So then I thought about having 8 more bit strings, where each bit represents a byte in the primary bitstring, one for each of the 8 sets above, where a set bit indicates that this byte in the primary bitstring can provide the requisite number of bits.

And when those bits (in the primary bitstring) are utilised, the one bit is toggled (off) in the secondary bitstring where is was found; and one bit is set in the secondary bitstring that represents the maximum number of unset bits it can now provide.

This reduces the searching to picking the appropriate secondary bitstring and looking for a non-zero byte. (It doesn't yet deal with the adjacent bytes with adjacent, contiguous bits problem.).

It also helps (??) with the large N problem, in that instead of needing to search the primary bitstring for N/8 contiguous 0 bytes; I can search the '7-bits' secondary bitstring for N/8 contiguous bits, potentially cutting the search time to 1/8th.

But of course, the same alignment problem that existed with the primary bit string for which the secondary bitstrings are used to solve, now manifests itself again.

  1. Do I implement a tertiary set of bitstrings (See A.)

Thoughts, comments, better solutions?

(For background and because someone is going to ask though it doesn't affect the problem or possible solutions. The primary bitstring represents freespace in an allocator. (Could be disk or memory.))


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Locating a specified number of contiguous 0 bits within a large bitstring efficiently. by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.