
As I understand it, the author merlyn defines a match as follows:

my $FUZZ = 5; # permitted average deviation in the vector elements
...

BUCKET: for my $bucket (@buckets) {
  my $error = 0;
  INDEX: for my $index (0..$#vector) {
    $error += abs($bucket->[0][$index] - $vector[$index]);
    next BUCKET if $error > $FUZZ * @vector;
  }
...

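IMHO the above set of matches is a subset of all matches found by comparing only the sums of the vectors, i.e. where the bucket's sum differs from the pattern's sum by at most $FUZZ times the number of elements: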

use List::Util qw(sum);

# sum() adds up the element values (an array in numeric context is only its element count)
my $pattern_sum = sum(@pattern_vector);
my $upper_bound = $pattern_sum + $FUZZ * @pattern_vector;
my $lower_bound = $pattern_sum - $FUZZ * @pattern_vector;

BUCKET: for my $bucket (@buckets) {
   my $bucket_sum = sum(@{$bucket->[0]});
   next BUCKET
       if ($bucket_sum > $upper_bound || $bucket_sum < $lower_bound);
   # found, do something
}
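The subset claim follows from the triangle inequality. Writing b_i for the elements of a bucket vector, v_i for the elements of the pattern vector and n for their number (notation introduced here, not taken from the original code), every bucket accepted by the original method also passes the sum test:

```latex
\left| \sum_{i=1}^{n} b_i - \sum_{i=1}^{n} v_i \right|
  = \left| \sum_{i=1}^{n} (b_i - v_i) \right|
  \le \sum_{i=1}^{n} \left| b_i - v_i \right|
  \le \mathit{FUZZ} \cdot n
```

The converse does not hold: (10, 20, 30, 40) and (40, 30, 20, 10) have the same sum but differ a lot element by element, so the sum test can only pre-select candidates, never confirm a match.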

Depending on how the vector values are distributed, this roughly doubles the number of matches compared with the original method. For example, with $FUZZ=5 and a vector of 48 elements, each an 8-bit integer, the sum of the vector values can take 48*255+1=12_241 different values. The original method allows at most 48*5=240 as the sum of the absolute differences, so the set of all possible sums is reduced by a factor of 12_241/240=51. The sum filter accepts an interval of +/- 5 per element, i.e. +/- 240 around the pattern's sum, which is 48*5*2=480 wide, so the reduction is only about 25. Assuming the sums are spread evenly, this means about 1_000 images found out of a total of 25_000 images.
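The estimate can be reproduced with a few lines of Perl. This is only a sketch under the same simplifying assumption as the text, namely that the sums are spread evenly over their possible range; the sub estimate_candidates and its named arguments are made up for illustration.

```perl
use strict;
use warnings;

# Back-of-the-envelope estimate, assuming the vector sums are spread evenly
# over their possible range (a simplification, not a measurement).
sub estimate_candidates {
    my (%arg) = @_;
    my $n      = $arg{elements};     # number of vector elements, 48 in the example
    my $max    = $arg{max_value};    # largest element value, 255 for 8-bit integers
    my $fuzz   = $arg{fuzz};         # $FUZZ, permitted average deviation per element
    my $images = $arg{images};       # total number of images in the database

    my $possible_sums = $n * $max + 1;     # sums range over 0 .. $n * $max
    my $l1_budget     = $n * $fuzz;        # original method: max sum of absolute differences
    my $sum_window    = 2 * $n * $fuzz;    # sum filter: +/- $n * $fuzz around the pattern sum

    my $sum_reduction = $possible_sums / $sum_window;
    return {
        possible_sums => $possible_sums,
        l1_reduction  => $possible_sums / $l1_budget,
        sum_reduction => $sum_reduction,
        candidates    => $images / $sum_reduction,
    };
}

my $est = estimate_candidates(elements => 48, max_value => 255, fuzz => 5, images => 25_000);
printf "%d possible sums, reduction %.1f (original) vs %.1f (sum filter), about %d candidates\n",
    @{$est}{qw(possible_sums l1_reduction sum_reduction candidates)};
```

Unrounded, the two reduction factors come out at about 51.0 and 25.5, and the candidate count at about 980, which is the "about 1_000 images" of the rough estimate.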

But if we precompute the sum of each image's vector, we can store it as an integer field in the database and select the candidates with an SQL range comparison instead of looping over every vector in Perl.
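A sketch of the database side, assuming a hypothetical table images with an id column and an integer column vector_sum holding each image's precomputed sum (none of these names come from the thread). The sub candidate_query, also made up for illustration, only builds the statement and its bind values, so it runs without a database:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical schema (not from the thread): a table "images" with an "id"
# column and an integer column "vector_sum" holding each image's vector sum.
# An index on that column typically lets the database answer the range query
# without reading every row, e.g.
#   CREATE INDEX images_vector_sum ON images (vector_sum);

# Build the pre-selection query: all images whose stored sum lies within
# +/- $fuzz * (number of elements) of the pattern's sum. Returns the SQL
# statement followed by its two bind values (lower and upper bound).
sub candidate_query {
    my ($fuzz, @pattern_vector) = @_;
    my $pattern_sum = sum(@pattern_vector);
    my $tolerance   = $fuzz * @pattern_vector;    # array in numeric context = element count
    my $sql = 'SELECT id FROM images WHERE vector_sum BETWEEN ? AND ?';
    return ($sql, $pattern_sum - $tolerance, $pattern_sum + $tolerance);
}

# Example: a flat 48-element pattern where every element is 100.
my ($sql, @bind) = candidate_query(5, (100) x 48);
print "$sql\n";    # SELECT id FROM images WHERE vector_sum BETWEEN ? AND ?
print "@bind\n";   # 4560 5040
```

With DBI, the pair can be passed to $dbh->selectcol_arrayref($sql, undef, @bind), which returns a reference to the list of matching ids. BETWEEN is inclusive on both ends, which matches the > and < tests in the Perl loop above.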

The query result could still be refined using the original method, or something better, e.g. cosine similarity, which should be fast enough for ~1_000 vectors.
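A sketch of the refinement step in plain Perl, using only List::Util: within_fuzz repeats the original method's check, and cosine_similarity is the standard formula (dot product divided by the product of the vector lengths). Both sub names, the candidate vectors, their ids and the 0.99 threshold are made up for illustration:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# The original method's check: reject a candidate as soon as the running sum
# of absolute differences exceeds $fuzz times the number of elements.
sub within_fuzz {
    my ($fuzz, $pattern, $candidate) = @_;
    my $limit = $fuzz * @$pattern;
    my $error = 0;
    for my $i (0 .. $#$pattern) {
        $error += abs($candidate->[$i] - $pattern->[$i]);
        return 0 if $error > $limit;
    }
    return 1;
}

# Cosine similarity: dot product divided by the product of the vector lengths.
# Assumes both vectors have the same number of elements.
sub cosine_similarity {
    my ($x, $y) = @_;
    my $dot    = sum(map { $x->[$_] * $y->[$_] } 0 .. $#$x);
    my $norm_x = sqrt(sum(map { $_ * $_ } @$x));
    my $norm_y = sqrt(sum(map { $_ * $_ } @$y));
    return 0 if $norm_x == 0 || $norm_y == 0;    # an all-zero vector has no direction
    return $dot / ($norm_x * $norm_y);
}

# Hypothetical candidates as they might come back from the SQL pre-selection.
my @pattern    = (10, 20, 30, 40);
my %candidates = (
    a => [11, 19, 31, 40],    # close to the pattern element by element
    b => [40, 30, 20, 10],    # same sum as the pattern, very different shape
);

my @by_fuzz   = grep { within_fuzz(5, \@pattern, $candidates{$_}) } sort keys %candidates;
my @by_cosine = grep { cosine_similarity(\@pattern, $candidates{$_}) > 0.99 } sort keys %candidates;
print "original method: @by_fuzz\n";     # original method: a
print "cosine > 0.99:   @by_cosine\n";   # cosine > 0.99:   a
```

Candidate b has the same sum as the pattern, so it would pass the SQL pre-selection, but both refinements reject it. Cosine similarity compares only the direction of the vectors, so an image that is uniformly brighter or darker than the pattern can still score close to 1; whether that is wanted depends on the application.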

In reply to Re: Comparing images to find similar images in a database by wollmers
in thread Comparing images to find similar images in a database by walkingthecow
