The previous sub-thread suggests that building bitmaps as record "fingerprints" (binary patterns based on a two-step/abstracting schema) and, on lookup, comparing them against a query fingerprint might be an efficient solution, since it relies only on binary operators and (core::) functions. A low-level one, though. (Implementing this might be reinventing the wheel.. search CPAN first.)
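To make the idea concrete, here is a minimal sketch (in Python rather than Perl, for brevity) of what such a fingerprint lookup could look like. The property names and the property-to-bit schema are made up for illustration; the point is just that matching reduces to a bitwise AND plus an equality test.

```python
# Hypothetical sketch: each record gets an integer bitmap "fingerprint" where
# bit i is set iff the record has property i, per a fixed property->bit schema.
# A record matches a query iff the query's bits are a subset of the record's:
#   record_fp & query_fp == query_fp

SCHEMA = {"red": 0, "round": 1, "metal": 2, "cheap": 3}  # made-up schema

def fingerprint(props):
    """Fold a set of property names into one integer bitmap."""
    bits = 0
    for p in props:
        bits |= 1 << SCHEMA[p]
    return bits

# Made-up records, stored only as their fingerprints.
records = {
    "widget": fingerprint({"red", "round", "metal"}),
    "gadget": fingerprint({"round", "cheap"}),
}

def matches(query_props):
    q = fingerprint(query_props)
    return [name for name, fp in records.items() if fp & q == q]

print(matches({"red", "round"}))  # only "widget" carries both bits
```

The comparison per record is O(1) on machine words (or O(n/64) for wider bitmaps), which is what makes the approach attractive at a low level.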
Stemming from that, and looking at the tools available: from what I know, inverted-index engines like Lucy, Plucene or Sphinx do exactly that, no? They look up postings/terms, assign each an id, and store those ids with the help of bitmaps. On lookup, they do the aforementioned comparisons and apply boolean operations (e.g. intersection).
So, would it be a sane setup to use a search-engine backend as an off-the-shelf bit-vector DB? The only problem I see is that applying too many filters would result in an empty set. I don't know whether any of the mentioned backends offers something like a "cancel this property/filter and you'll get these additional results" kind of similarity option...
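I don't know of a backend feature that does this out of the box, but the relaxation itself is cheap to emulate client-side: when the full conjunction comes back empty, re-run the query with each filter dropped in turn and report which single relaxation recovers results. A hypothetical sketch (records and property names are made up):

```python
# Hypothetical "cancel this filter and you'll get more results" sketch.
# If the exact conjunctive query is empty, drop one filter at a time and
# report which relaxations bring back hits.

records = {
    "widget": {"red", "round", "metal"},
    "gadget": {"round", "cheap"},
}

def matches(query):
    # query <= props: the query's properties are a subset of the record's.
    return sorted(name for name, props in records.items() if query <= props)

def relax(query):
    hits = matches(query)
    if hits:
        return {"<exact>": hits}
    # One relaxed query per dropped filter; keep only non-empty results.
    return {p: m for p in sorted(query) if (m := matches(query - {p}))}

print(relax({"round", "cheap", "metal"}))
# empty as a conjunction, but dropping "cheap" or "metal" each yields hits
```

With n filters this costs at most n extra lookups, so it stays practical even against a search-engine backend that only supports plain boolean queries.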