The following code and output attempts to demonstrate that it is possible to generate a single numerical "signature" or digest for your sets of points that will cause similar sets of points to sort close together. By generating these digests for each set in your database and then generating one for your new film as you receive it you can use a binary search/tree to locate the insertion point into the database, and so find those that have a similar signature. You then do a more thorough spacial comparison of a few either side to locate the best matches.

This is not an exact science as the following output shows.

2760.43800 => 8 33 21 55 51 48 6 20 52 63 2766.50111 => 59 55 4 88 74 63 20 33 28 34 3 93 9 19 10 22 42 44 3587.61500 => 13 69 53 24 61 90 56 28 61 82 17 85 12 59 14 55 3714.33857 => 27 32 54 19 46 20 38 93 61 19 9 53 25 1 4359.73000 => 91 84 9 40 13 98 88 60 17 83 4400.39667 => 13 42 79 34 5 47 25 20 78 35 59 75 89 42 0 26 48 36 4420.68800 => 43 83 41 83 16 69 99 49 22 60 4437.52375 => 33 44 1 71 35 71 97 26 39 78 31 3 32 65 87 61 4642.57000 => 70 98 20 25 79 19 18 49 48 78 50 77 40 53 4680.39600 => 55 90 88 6 9 1 56 45 26 56 4737.51125 => 98 33 3 22 75 40 64 18 91 67 8 65 29 68 11 96 4820.32000 => 46 57 45 2 64 46 46 1 40 54 4844.49444 => 20 43 25 36 33 84 63 11 15 2 70 69 47 77 67 91 96 32 4849.23333 => 26 21 25 46 46 14 79 11 85 39 30 9 4885.40571 => 37 18 90 23 38 13 97 63 17 26 41 56 22 85 4900.49444 => 67 66 7 45 77 19 4 84 74 53 0 65 37 12 91 65 84 36 5080.38000 => 51 38 31 65 62 4 37 35 73 48 5266.59167 => 84 26 93 43 11 80 16 50 66 80 46 76 5350.28000 => 67 0 57 24 46 50 44 38 5511.37333 => 72 27 30 52 37 35 7 76 23 45 71 25 75 17 86 29 95 30 5912.56625 => 53 23 72 95 27 29 94 59 18 22 89 33 57 97 63 95 5977.42111 => 82 24 88 16 49 9 63 19 84 77 25 90 48 11 69 45 30 88 6599.53250 => 92 44 27 30 94 43 51 96 6899.52714 => 45 32 65 80 58 22 95 44 51 36 93 61 76 94 7100.49500 => 76 20 41 22 93 64 74 92 7250.45167 => 70 26 38 71 78 36 84 58 84 5 81 75 7599.52250 => 60 73 51 45 95 63 98 28 7950.34500 => 92 32 91 28 80 76 55 2 8100.23000 => 94 40 90 13 57 36 83 3 8660.23400 => 98 42 79 0 72 26 95 11 89 38 ------ 2760.43800 => 8 33 21 55 51 48 6 20 52 63 2766.50111 => 59 55 4 88 74 63 20 33 28 34 3 93 9 19 10 22 42 44 3000.54000 => 59 55 4 88 74 63 20 33 28 34 3 93 10 22 42 44 3300.49750 => 8 33 21 55 51 48 52 63 3512.61375 => 15 76 53 28 52 90 49 19 69 79 13 85 16 59 14 55 3587.61500 => 15 76 53 28 52 90 49 19 69 79 13 85 16 59 14 55 3714.33857 => 21 32 62 19 51 20 30 92 69 9 13 53 25 8 3871.33286 => 21 32 62 19 51 20 30 92 69 9 13 53 25 8 4224.53625 => 25 41 1 69 40 77 97 26 38 78 26 3 32 74 79 61 4239.70000 => 91 77 9 32 13 98 88 63 11 80 4250.50167 => 20 25 79 19 18 49 48 78 50 77 40 53 4359.73000 => 91 77 9 32 13 98 88 63 11 80 4400.39667 => 13 42 79 34 5 47 25 20 78 35 59 75 89 42 0 26 48 36 4420.68800 => 43 83 41 83 16 69 99 49 22 60 4437.52375 => 25 41 1 69 40 77 97 26 38 78 26 3 32 74 79 61 4600.23333 => 26 21 25 46 37 14 79 11 85 39 24 9 4642.57000 => 70 98 20 25 79 19 18 49 48 78 50 77 40 53 4657.39286 => 37 18 90 14 31 13 91 63 14 26 41 56 22 85 4680.39600 => 55 90 88 6 9 1 56 45 26 56 4700.52875 => 98 32 -6 22 75 48 64 18 91 67 8 70 29 77 17 89 4737.51125 => 98 32 -6 22 75 48 64 18 91 67 8 70 29 77 17 89 4740.30600 => 46 50 41 2 64 46 46 1 40 54 4820.32000 => 46 50 41 2 64 46 46 1 40 54 4844.49444 => 20 43 25 36 33 84 63 11 15 2 70 69 47 77 67 91 96 32 4849.23333 => 26 21 25 46 37 14 79 11 85 39 24 9 4880.39200 => 41 38 31 65 62 -5 37 41 73 57 4885.40571 => 37 18 90 14 31 13 91 63 14 26 41 56 22 85 4887.38750 => 13 42 79 34 25 20 78 35 59 75 89 42 0 26 48 36 4900.37333 => 57 24 46 50 44 38 4900.49444 => 67 74 7 38 77 19 4 91 74 53 0 65 42 18 100 57 84 30 5055.49444 => 67 74 7 38 77 19 4 91 74 53 0 65 42 18 100 57 84 30 5080.38000 => 41 38 31 65 62 -5 37 41 73 57 5125.38375 => 72 27 30 52 37 35 7 76 23 45 71 25 75 17 95 30 5125.68750 => 43 83 41 83 99 49 22 60 5137.51125 => 20 43 33 84 63 11 15 2 70 69 47 77 67 91 96 32 5200.35500 => 55 90 88 6 9 1 56 45 5266.59167 => 84 26 93 43 11 80 16 50 66 80 46 76 5350.28000 => 67 0 57 24 46 50 44 38 5511.37333 => 72 27 30 52 37 35 7 76 23 45 71 25 75 17 86 29 95 30 5912.56625 => 53 23 72 95 27 29 94 59 18 22 89 33 57 97 63 95 5977.42111 => 82 24 88 16 49 9 63 19 84 77 25 90 48 11 69 45 30 88 6000.61000 => 84 26 93 43 11 80 66 80 46 76 6124.46000 => 82 24 88 16 49 9 63 19 84 77 25 90 69 45 30 88 6500.61571 => 53 23 72 95 27 29 94 59 89 33 57 97 63 95 6599.53250 => 92 44 27 30 94 43 51 96 6899.52714 => 45 32 74 80 58 22 95 50 55 36 93 58 74 94 7000.35333 => 76 20 41 22 93 64 7057.53143 => 45 32 74 80 58 22 95 50 55 36 93 58 74 94 7100.39000 => 92 44 27 30 94 43 7100.49500 => 76 20 41 22 93 64 74 92 7139.47000 => 70 26 38 71 84 58 84 5 81 75 7250.45167 => 70 26 38 71 78 36 84 58 84 5 81 75 7599.52250 => 52 73 58 37 95 54 103 28 7700.48000 => 52 73 58 37 95 54 103 28 7849.35000 => 88 39 91 29 80 70 55 2 7950.34500 => 88 39 91 29 80 70 55 2 8100.23000 => 94 40 90 13 57 36 83 3 8660.23400 => 98 47 88 0 72 16 103 11 89 38 8900.18667 => 94 40 90 13 83 3 9000.22400 => 98 47 88 0 72 16 103 11 89 38

The first set of data (above the ----- line) are the signatures and related datasets read from the DATA section of the code.

The second set of data includes all the first set again, plus the same datasets with modifications made, sorted by sugnature. In some I have deleted one or more pairs, other I've adjusted some values at random by +/-10.

The output show that the algorithm I'm using (the third I tried) does a fairly good job of grouping the modified dataset close or adjacent to the original it is derived from.

It's not perfect. With more tweaking and testing it should be possible to adjust it to do a better (more sensitive) job, but that requires a detailed knowledge (and sample datasets) to do properly.

The basic approach (see calcSig()) is :

  1. Produce normalised (0-1) value for each X & Y.
  2. Sum all the normalised Xs and divide by the number of pairs to produce a "normalised average X".
  3. Do the same for the Ys.
  4. Multiply the X average by a factor and int to produce an integer large enough to have enough significant digits to distribute the X signatures nicely.

    This should probably be increased slightly.

  5. Add the normalised average Y value to produce the floating point signature.

    These last two step are prime candidates for improvement as this causes the signature to favour the Xs. The idea is to find an algorithm that distributes the datasets over a wide hash space fairly evenly without wrapping. I have achieved that here yet.

Here's the code. It's not documented. If you are interested in pursuing the idea further and have questions, fell free to ask.

#! perl -slw use strict; use Data::Dumper; use List::Util qw[ reduce sum ]; use constant { MAXX => 100, MAXY => 100, }; sub calcSig { my $sum = reduce { $a->[ 0 ] += $b->[ 0 ] / MAXX; $a->[ 1 ] += $b->[ 1 ] / MAXY; $a; } [], @_; $sum->[ 0 ] /= @_; $sum->[ 1 ] /= @_; return int( $sum->[ 0 ] * MAXX * MAXY ) + $sum->[ 1 ] } my( @sigs, %sigs ); while( <DATA> ) { chomp; my @points; push @points, [ $1, $2 ] while m[\((\d+?),(\d+?)\)(?:,\s|$)]g; # print Dumper \@points; push @sigs, my $sig = calcSig @points; $sigs{ $sig } = \@points; } @sigs = sort{ $a <=> $b } @sigs; printf "%9.5f => %s\n", $_, join ' ', map{ @$_ } @{ $sigs{ $_ } } for +@sigs; print "\n ------ \n"; my @adjusted = map{ [ @$_ ] } values %sigs; for my $aref ( @adjusted ) { if( rand > .3 ) { ## delete a pair splice @$aref, rand( @$aref ), 1; } else { ## Adjust a few values; @$aref = map{ rand > .5 and $_->[ 0 ] += -10+int( rand 20 ); rand > .5 and $_->[ 1 ] += -10+int( rand 20 ); $_; } @$aref; } push @sigs, my $sig = calcSig @$aref; $sigs{ $sig } = $aref; } @sigs = sort{ $a <=> $b } @sigs; printf "%9.5f => %s\n", $_, join ' ', map{ @$_ } @{ $sigs{ $_ } } for +@sigs; __DATA__ (67,66), (7,45), (77,19), (4,84), (74,53), (0,65), (37,12), (91,65), ( +84,36) (92,44), (27,30), (94,43), (51,96) (46,57), (45,2), (64,46), (46,1), (40,54) (84,26), (93,43), (11,80), (16,50), (66,80), (46,76) (92,32), (91,28), (80,76), (55,2) (94,40), (90,13), (57,36), (83,3) (13,69), (53,24), (61,90), (56,28), (61,82), (17,85), (12,59), (14,55) (33,44), (1,71), (35,71), (97,26), (39,78), (31,3), (32,65), (87,61) (70,26), (38,71), (78,36), (84,58), (84,5), (81,75) (70,98), (20,25), (79,19), (18,49), (48,78), (50,77), (40,53) (27,32), (54,19), (46,20), (38,93), (61,19), (9,53), (25,1) (43,83), (41,83), (16,69), (99,49), (22,60) (98,42), (79,0), (72,26), (95,11), (89,38) (67,0), (57,24), (46,50), (44,38) (76,20), (41,22), (93,64), (74,92) (82,24), (88,16), (49,9), (63,19), (84,77), (25,90), (48,11), (69,45), + (30,88) (53,23), (72,95), (27,29), (94,59), (18,22), (89,33), (57,97), (63,95) (20,43), (25,36), (33,84), (63,11), (15,2), (70,69), (47,77), (67,91), + (96,32) (59,55), (4,88), (74,63), (20,33), (28,34), (3,93), (9,19), (10,22), ( +42,44) (8,33), (21,55), (51,48), (6,20), (52,63) (72,27), (30,52), (37,35), (7,76), (23,45), (71,25), (75,17), (86,29), + (95,30) (51,38), (31,65), (62,4), (37,35), (73,48) (98,33), (3,22), (75,40), (64,18), (91,67), (8,65), (29,68), (11,96) (45,32), (65,80), (58,22), (95,44), (51,36), (93,61), (76,94) (55,90), (88,6), (9,1), (56,45), (26,56) (26,21), (25,46), (46,14), (79,11), (85,39), (30,9) (37,18), (90,23), (38,13), (97,63), (17,26), (41,56), (22,85) (13,42), (79,34), (5,47), (25,20), (78,35), (59,75), (89,42), (0,26), +(48,36) (60,73), (51,45), (95,63), (98,28) (91,84), (9,40), (13,98), (88,60), (17,83)

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

In reply to Re: Interesting problem fitting points to a template (algorithm?) by BrowserUk
in thread Interesting problem fitting points to a template (algorithm?) by doowah2004

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.