in reply to Re^3: What is the best way to store and look-up a "similarity vector"?
in thread What is the best way to store and look-up a "similarity vector" (correlating, similar, high-dimensional vectors)?
Properties:

id|A/V-pairs
-----------------
0|colour:red
1|colour:green
2|material:metal
3|colour:blue
4|material:wood
5|surface:rough

And then our thing db:
things:

bitmap|thing
012345|name
-----------------
101011|red-metal-wood-rough-Thing
001101|metal-blue-rough-Thing
01    |green-Thing
Questions:

sub getBits {
    # lookup: colour:red    -> is id/bitposition:0
    # lookup: material:wood -> is id/bitposition:4
}

my $bits  = getBits('red-wood');    # $bits is 10001 (as a packed bit string, e.g. pack 'b*', '10001')
my $nBits = unpack '%32b*', $bits;  # http://docstore.mik.ua/orelly/perl/prog/ch03_182.htm : "efficiently counts the number of set bits in a bit vector"

for my $straw ( @haystack ){    # loop over all records and compare
    my $similarity = unpack '%32b*', $straw & $bits;    # compute a delta: bits shared by record and needle
    printf "Percentage similarity %f\n", $similarity / $nBits * 100;    # delta in relation to nbits benchmark ("distance")
}
# then, sort by similarity
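To make the idea above concrete, here is a minimal runnable sketch. It assumes the bitmaps are stored as packed bit strings (pack 'b*') and that the property table fits in a plain hash. The names %bit_for, get_bits, similarity and %things are hypothetical and only serve the illustration; a real lookup would go against the properties table.

```perl
use strict;
use warnings;

# Hypothetical in-memory copy of the properties table:
# attribute/value pair -> bit position.
my %bit_for = (
    'colour:red'     => 0,
    'colour:green'   => 1,
    'material:metal' => 2,
    'colour:blue'    => 3,
    'material:wood'  => 4,
    'surface:rough'  => 5,
);

# Build a packed bit string with one bit set per given property.
# vec() with a width of 1 uses the same bit order as pack 'b*'.
sub get_bits {
    my @props = @_;
    my $bits = "\0";    # one byte is enough for the six properties above
    for my $prop (@props) {
        die "unknown property '$prop'" unless exists $bit_for{$prop};
        vec( $bits, $bit_for{$prop}, 1 ) = 1;
    }
    return $bits;
}

# Percentage of the needle's set bits that are also set in the straw.
sub similarity {
    my ( $needle, $straw ) = @_;
    my $n_bits = unpack '%32b*', $needle;    # number of set bits in the needle
    return 0 unless $n_bits;
    my $shared = unpack '%32b*', $needle & $straw;    # string-wise bitwise AND
    return 100 * $shared / $n_bits;
}

# The thing db from above, with each bitmap packed into a bit string.
my %things = (
    'red-metal-wood-rough-Thing' => pack( 'b*', '101011' ),
    'metal-blue-rough-Thing'     => pack( 'b*', '001101' ),
    'green-Thing'                => pack( 'b*', '01' ),
);

my $needle = get_bits( 'colour:red', 'material:wood' );

# Score every record, then list them with the best match first.
my %score = map { $_ => similarity( $needle, $things{$_} ) } keys %things;
for my $name ( sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score ) {
    printf "%-28s %5.1f%%\n", $name, $score{$name};
}
```

With this sample data the red-metal-wood-rough-Thing matches both needle bits (100%), while the other two share no bits with the needle (0%). Note that this only measures how much of the needle is covered by a record; it does not penalise extra bits set in the record, so a different measure would be needed if those should count as distance.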
Re^5: What is the best way to store and look-up a "similarity vector"?
by BrowserUk (Patriarch) on Nov 14, 2013 at 19:50 UTC
by isync (Hermit) on Nov 14, 2013 at 20:48 UTC