in reply to Re^2: What is the best way to store and look-up a "similarity vector"?
in thread What is the best way to store and look-up a "similarity vector" (correlating, similar, high-dimensional vectors)?
I might receive more and more properties I haven't planned into my schema/bitmap/bitvector layout.
Use a two-step schema.
Using your example above, 'color:grey', 'material:wood', & 'surface:rough' are your attribute/value pairs (A/V pairs).
The auto-incremented 'id' value becomes that A/V pair's bit position in the data records.
As the number of A/V pairs increases, so does the length of (new) records, but the bit positions of existing A/V pairs retain their original meanings, so the schema does not need to change.
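A rough sketch of that lookup step, in Python for brevity. The `AVRegistry` class and its method names are made up for illustration, the extra `weight:heavy` pair is invented, and numbering the ids from 0 is an assumption (a database auto-increment column would typically start at 1):

```python
class AVRegistry:
    """Hypothetical lookup table: A/V pair -> auto-incremented bit position."""

    def __init__(self):
        self._ids = {}

    def bit_for(self, pair):
        # The first time a pair is seen it gets the next free position;
        # pairs seen before keep the position they were originally given.
        return self._ids.setdefault(pair, len(self._ids))

    def encode(self, pairs):
        # A record is an integer with one bit set per A/V pair it carries.
        record = 0
        for pair in pairs:
            record |= 1 << self.bit_for(pair)
        return record


registry = AVRegistry()
old_record = registry.encode(["color:grey", "material:wood", "surface:rough"])

# A property nobody planned for turns up later and just takes the next bit.
new_record = registry.encode(["color:grey", "weight:heavy"])

# 'color:grey' still means bit 0 in both records, so old_record stays valid.
shared = old_record & new_record
```

Python integers grow as needed, which stands in for the "longer (new) records" here; with fixed-width storage the shorter, older records would be treated as zero-padded.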
To select records, you build up a bit mask with bits set for the sought A/V pairs, and then AND it with each record. For similarity, count the bits left set in the result.
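The mask-and-count step might look something like this. Again a sketch in Python rather than a definitive implementation: the `bit_of` table, the record names and the helper functions are all invented for the example, and records are held as plain integers.

```python
def build_mask(bit_of, pairs):
    # Set one bit per sought A/V pair.
    mask = 0
    for pair in pairs:
        mask |= 1 << bit_of[pair]
    return mask


def popcount(n):
    # Number of set bits in a non-negative integer.
    return bin(n).count("1")


# Assumed bit positions for the three A/V pairs from the example.
bit_of = {"color:grey": 0, "material:wood": 1, "surface:rough": 2}

# Hypothetical data records, encoded the same way as the mask.
records = {
    "bench": build_mask(bit_of, ["color:grey", "material:wood", "surface:rough"]),
    "table": build_mask(bit_of, ["material:wood"]),
    "stone": build_mask(bit_of, ["color:grey", "surface:rough"]),
}

mask = build_mask(bit_of, ["color:grey", "material:wood"])

# Selection: keep records that have every sought bit set.
exact = [name for name, rec in records.items() if rec & mask == mask]

# Similarity: how many of the sought bits each record shares with the mask.
scores = {name: popcount(rec & mask) for name, rec in records.items()}
```

Here only "bench" carries both sought pairs, so it is the sole exact match and scores 2, while "table" and "stone" each share one pair with the mask and score 1.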
Re^4: What is the best way to store and look-up a "similarity vector"?
by isync (Hermit) on Nov 14, 2013 at 18:54 UTC | |
by BrowserUk (Patriarch) on Nov 14, 2013 at 19:50 UTC | |
by isync (Hermit) on Nov 14, 2013 at 20:48 UTC |