But is that how a project like Panopticlick calculates "similarity" across bit-vectors?
I've no idea about pana-pano, that thing, but ostensibly it can be as simple as counting the bits (properties) that are set in both vectors:
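```perl
my $needle = getBits();                  # the vector to compare against
my $nBits  = unpack '%32b*', $needle;    # number of set bits in it
for my $straw ( @haystack ) {
    # count the bits set in both vectors
    my $similarity = unpack '%32b*', $straw & $needle;
    printf "Percentage similarity %f\n", $similarity / $nBits * 100;
}
```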
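For anyone wanting to play with it, here is a self-contained sketch of the same idea. The property names, `make_vector`, `popcount`, `similarity` and `matching_bits` are all made up for illustration; it says nothing about how Panopticlick actually does it. `vec` builds the bit-vectors, and `unpack '%32b*'` counts set bits as above. It also shows the other reading of "matching bits": properties that agree whether set or clear, counted via XOR.

```perl
use strict;
use warnings;

# Hypothetical fingerprint properties; each one owns a single bit position.
my @properties = qw(cookies js flash java canvas webgl dnt touch);
my %bit_of;
@bit_of{@properties} = 0 .. $#properties;

# Build a bit-vector (a packed string) with the named properties set.
sub make_vector {
    my $vec = '';
    for my $prop (@_) {
        die "Unknown property '$prop'\n" unless exists $bit_of{$prop};
        vec( $vec, $bit_of{$prop}, 1 ) = 1;
    }
    return $vec;
}

# Number of set bits: the '%32b*' template sums the bits of the string.
sub popcount { return unpack '%32b*', $_[0] }

# Percentage of the needle's set bits that are also set in the straw.
sub similarity {
    my ( $needle, $straw ) = @_;
    my $n = popcount($needle) or return 0;
    return popcount( $straw & $needle ) / $n * 100;
}

# Alternative: properties that agree, whether both set or both clear.
# XOR marks the disagreeing bits, so agreements = total - popcount(xor).
sub matching_bits {
    my ( $x, $y ) = @_;
    return @properties - popcount( $x ^ $y );
}

my $needle   = make_vector(qw(cookies js canvas dnt));
my @haystack = (
    make_vector(qw(cookies js canvas dnt touch)),
    make_vector(qw(cookies flash)),
);
for my $straw (@haystack) {
    printf "shared: %.1f%%, agreeing: %d of %d\n",
        similarity( $needle, $straw ), matching_bits( $needle, $straw ),
        scalar @properties;
}
```

This prints `shared: 100.0%, agreeing: 7 of 8` and then `shared: 25.0%, agreeing: 4 of 8`. Which measure suits depends on whether an absent property should count as evidence of similarity.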
For parts/product catalogue type applications, bit-mapped data records can be very effective and efficient, because their attributes tend to have a fixed (and limited) number of values, i.e. half a dozen colors, half a dozen sizes, 2 or 3 finishes, etc. That means each choice can be represented by a bit position in a vector, and a query is just another vector to AND against each record, so selection can be extremely fast.
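A rough sketch of that catalogue case, assuming one bit per (attribute, value) choice. The attribute values, part numbers, `to_bits` and `select_parts` are all hypothetical. A record is selected when every bit in the query is also set in the record, tested by counting the set bits of `record & query`:

```perl
use strict;
use warnings;

# Hypothetical catalogue choices: each (attribute, value) pair gets one bit.
my @choices = (
    ( map { "color=$_" }  qw(red green blue black white grey) ),
    ( map { "size=$_" }   qw(XS S M L XL XXL) ),
    ( map { "finish=$_" } qw(matt gloss satin) ),
);
my %bit_of;
@bit_of{@choices} = 0 .. $#choices;

# Pack a list of choices into a bit-vector string.
sub to_bits {
    my $bits = '';
    for my $choice (@_) {
        die "Unknown choice '$choice'\n" unless exists $bit_of{$choice};
        vec( $bits, $bit_of{$choice}, 1 ) = 1;
    }
    return $bits;
}

# Hypothetical parts records, each stored as a bit-vector.
my %catalogue = (
    'P-100' => to_bits(qw(color=red  size=M finish=gloss)),
    'P-101' => to_bits(qw(color=red  size=L finish=matt)),
    'P-102' => to_bits(qw(color=blue size=M finish=gloss)),
);

# A record matches when (record AND query) has as many set bits
# as the query itself, i.e. every requested choice is present.
sub select_parts {
    my $query  = to_bits(@_);
    my $wanted = unpack '%32b*', $query;
    my @hits   = grep {
        ( unpack '%32b*', $catalogue{$_} & $query ) == $wanted
    } keys %catalogue;
    return sort @hits;
}

print join( ', ', select_parts(qw(color=red)) ), "\n";            # P-100, P-101
print join( ', ', select_parts(qw(size=M finish=gloss)) ), "\n";  # P-100, P-102
```

Each record test is one string AND plus a bit count, with no per-attribute comparisons, which is where the speed comes from.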
Re^2: What is the best way to store and look-up a "similarity vector"?
by isync (Hermit) on Nov 14, 2013 at 17:38 UTC
by BrowserUk (Patriarch) on Nov 14, 2013 at 18:07 UTC
by isync (Hermit) on Nov 14, 2013 at 18:54 UTC
by BrowserUk (Patriarch) on Nov 14, 2013 at 19:50 UTC
by isync (Hermit) on Nov 14, 2013 at 20:48 UTC