The following code and output attempts to demonstrate that it is possible to generate a single numerical "signature" or digest for your sets of points that will cause similar sets of points to sort close together. By generating these digests for each set in your database and then generating one for your new film as you receive it you can use a binary search/tree to locate the insertion point into the database, and so find those that have a similar signature. You then do a more thorough spacial comparison of a few either side to locate the best matches.
This is not an exact science as the following output shows.
2760.43800 => 8 33 21 55 51 48 6 20 52 63 2766.50111 => 59 55 4 88 74 63 20 33 28 34 3 93 9 19 10 22 42 44 3587.61500 => 13 69 53 24 61 90 56 28 61 82 17 85 12 59 14 55 3714.33857 => 27 32 54 19 46 20 38 93 61 19 9 53 25 1 4359.73000 => 91 84 9 40 13 98 88 60 17 83 4400.39667 => 13 42 79 34 5 47 25 20 78 35 59 75 89 42 0 26 48 36 4420.68800 => 43 83 41 83 16 69 99 49 22 60 4437.52375 => 33 44 1 71 35 71 97 26 39 78 31 3 32 65 87 61 4642.57000 => 70 98 20 25 79 19 18 49 48 78 50 77 40 53 4680.39600 => 55 90 88 6 9 1 56 45 26 56 4737.51125 => 98 33 3 22 75 40 64 18 91 67 8 65 29 68 11 96 4820.32000 => 46 57 45 2 64 46 46 1 40 54 4844.49444 => 20 43 25 36 33 84 63 11 15 2 70 69 47 77 67 91 96 32 4849.23333 => 26 21 25 46 46 14 79 11 85 39 30 9 4885.40571 => 37 18 90 23 38 13 97 63 17 26 41 56 22 85 4900.49444 => 67 66 7 45 77 19 4 84 74 53 0 65 37 12 91 65 84 36 5080.38000 => 51 38 31 65 62 4 37 35 73 48 5266.59167 => 84 26 93 43 11 80 16 50 66 80 46 76 5350.28000 => 67 0 57 24 46 50 44 38 5511.37333 => 72 27 30 52 37 35 7 76 23 45 71 25 75 17 86 29 95 30 5912.56625 => 53 23 72 95 27 29 94 59 18 22 89 33 57 97 63 95 5977.42111 => 82 24 88 16 49 9 63 19 84 77 25 90 48 11 69 45 30 88 6599.53250 => 92 44 27 30 94 43 51 96 6899.52714 => 45 32 65 80 58 22 95 44 51 36 93 61 76 94 7100.49500 => 76 20 41 22 93 64 74 92 7250.45167 => 70 26 38 71 78 36 84 58 84 5 81 75 7599.52250 => 60 73 51 45 95 63 98 28 7950.34500 => 92 32 91 28 80 76 55 2 8100.23000 => 94 40 90 13 57 36 83 3 8660.23400 => 98 42 79 0 72 26 95 11 89 38 ------ 2760.43800 => 8 33 21 55 51 48 6 20 52 63 2766.50111 => 59 55 4 88 74 63 20 33 28 34 3 93 9 19 10 22 42 44 3000.54000 => 59 55 4 88 74 63 20 33 28 34 3 93 10 22 42 44 3300.49750 => 8 33 21 55 51 48 52 63 3512.61375 => 15 76 53 28 52 90 49 19 69 79 13 85 16 59 14 55 3587.61500 => 15 76 53 28 52 90 49 19 69 79 13 85 16 59 14 55 3714.33857 => 21 32 62 19 51 20 30 92 69 9 13 53 25 8 3871.33286 => 21 32 62 19 51 20 30 92 69 9 13 53 25 8 4224.53625 => 25 41 1 69 40 77 97 26 38 78 26 3 32 74 79 61 4239.70000 => 91 77 9 32 13 98 88 63 11 80 4250.50167 => 20 25 79 19 18 49 48 78 50 77 40 53 4359.73000 => 91 77 9 32 13 98 88 63 11 80 4400.39667 => 13 42 79 34 5 47 25 20 78 35 59 75 89 42 0 26 48 36 4420.68800 => 43 83 41 83 16 69 99 49 22 60 4437.52375 => 25 41 1 69 40 77 97 26 38 78 26 3 32 74 79 61 4600.23333 => 26 21 25 46 37 14 79 11 85 39 24 9 4642.57000 => 70 98 20 25 79 19 18 49 48 78 50 77 40 53 4657.39286 => 37 18 90 14 31 13 91 63 14 26 41 56 22 85 4680.39600 => 55 90 88 6 9 1 56 45 26 56 4700.52875 => 98 32 -6 22 75 48 64 18 91 67 8 70 29 77 17 89 4737.51125 => 98 32 -6 22 75 48 64 18 91 67 8 70 29 77 17 89 4740.30600 => 46 50 41 2 64 46 46 1 40 54 4820.32000 => 46 50 41 2 64 46 46 1 40 54 4844.49444 => 20 43 25 36 33 84 63 11 15 2 70 69 47 77 67 91 96 32 4849.23333 => 26 21 25 46 37 14 79 11 85 39 24 9 4880.39200 => 41 38 31 65 62 -5 37 41 73 57 4885.40571 => 37 18 90 14 31 13 91 63 14 26 41 56 22 85 4887.38750 => 13 42 79 34 25 20 78 35 59 75 89 42 0 26 48 36 4900.37333 => 57 24 46 50 44 38 4900.49444 => 67 74 7 38 77 19 4 91 74 53 0 65 42 18 100 57 84 30 5055.49444 => 67 74 7 38 77 19 4 91 74 53 0 65 42 18 100 57 84 30 5080.38000 => 41 38 31 65 62 -5 37 41 73 57 5125.38375 => 72 27 30 52 37 35 7 76 23 45 71 25 75 17 95 30 5125.68750 => 43 83 41 83 99 49 22 60 5137.51125 => 20 43 33 84 63 11 15 2 70 69 47 77 67 91 96 32 5200.35500 => 55 90 88 6 9 1 56 45 5266.59167 => 84 26 93 43 11 80 16 50 66 80 46 76 5350.28000 => 67 0 57 24 46 50 44 38 5511.37333 => 72 27 30 52 37 35 7 76 23 45 71 25 75 17 86 29 95 30 5912.56625 => 53 23 72 95 27 29 94 59 18 22 89 33 57 97 63 95 5977.42111 => 82 24 88 16 49 9 63 19 84 77 25 90 48 11 69 45 30 88 6000.61000 => 84 26 93 43 11 80 66 80 46 76 6124.46000 => 82 24 88 16 49 9 63 19 84 77 25 90 69 45 30 88 6500.61571 => 53 23 72 95 27 29 94 59 89 33 57 97 63 95 6599.53250 => 92 44 27 30 94 43 51 96 6899.52714 => 45 32 74 80 58 22 95 50 55 36 93 58 74 94 7000.35333 => 76 20 41 22 93 64 7057.53143 => 45 32 74 80 58 22 95 50 55 36 93 58 74 94 7100.39000 => 92 44 27 30 94 43 7100.49500 => 76 20 41 22 93 64 74 92 7139.47000 => 70 26 38 71 84 58 84 5 81 75 7250.45167 => 70 26 38 71 78 36 84 58 84 5 81 75 7599.52250 => 52 73 58 37 95 54 103 28 7700.48000 => 52 73 58 37 95 54 103 28 7849.35000 => 88 39 91 29 80 70 55 2 7950.34500 => 88 39 91 29 80 70 55 2 8100.23000 => 94 40 90 13 57 36 83 3 8660.23400 => 98 47 88 0 72 16 103 11 89 38 8900.18667 => 94 40 90 13 83 3 9000.22400 => 98 47 88 0 72 16 103 11 89 38
The first set of data (above the ----- line) are the signatures and related datasets read from the DATA section of the code.
The second set of data includes all the first set again, plus the same datasets with modifications made, sorted by sugnature. In some I have deleted one or more pairs, other I've adjusted some values at random by +/-10.
The output show that the algorithm I'm using (the third I tried) does a fairly good job of grouping the modified dataset close or adjacent to the original it is derived from.
It's not perfect. With more tweaking and testing it should be possible to adjust it to do a better (more sensitive) job, but that requires a detailed knowledge (and sample datasets) to do properly.
The basic approach (see calcSig()) is :
This should probably be increased slightly.
These last two step are prime candidates for improvement as this causes the signature to favour the Xs. The idea is to find an algorithm that distributes the datasets over a wide hash space fairly evenly without wrapping. I have achieved that here yet.
Here's the code. It's not documented. If you are interested in pursuing the idea further and have questions, fell free to ask.
#! perl -slw use strict; use Data::Dumper; use List::Util qw[ reduce sum ]; use constant { MAXX => 100, MAXY => 100, }; sub calcSig { my $sum = reduce { $a->[ 0 ] += $b->[ 0 ] / MAXX; $a->[ 1 ] += $b->[ 1 ] / MAXY; $a; } [], @_; $sum->[ 0 ] /= @_; $sum->[ 1 ] /= @_; return int( $sum->[ 0 ] * MAXX * MAXY ) + $sum->[ 1 ] } my( @sigs, %sigs ); while( <DATA> ) { chomp; my @points; push @points, [ $1, $2 ] while m[\((\d+?),(\d+?)\)(?:,\s|$)]g; # print Dumper \@points; push @sigs, my $sig = calcSig @points; $sigs{ $sig } = \@points; } @sigs = sort{ $a <=> $b } @sigs; printf "%9.5f => %s\n", $_, join ' ', map{ @$_ } @{ $sigs{ $_ } } for +@sigs; print "\n ------ \n"; my @adjusted = map{ [ @$_ ] } values %sigs; for my $aref ( @adjusted ) { if( rand > .3 ) { ## delete a pair splice @$aref, rand( @$aref ), 1; } else { ## Adjust a few values; @$aref = map{ rand > .5 and $_->[ 0 ] += -10+int( rand 20 ); rand > .5 and $_->[ 1 ] += -10+int( rand 20 ); $_; } @$aref; } push @sigs, my $sig = calcSig @$aref; $sigs{ $sig } = $aref; } @sigs = sort{ $a <=> $b } @sigs; printf "%9.5f => %s\n", $_, join ' ', map{ @$_ } @{ $sigs{ $_ } } for +@sigs; __DATA__ (67,66), (7,45), (77,19), (4,84), (74,53), (0,65), (37,12), (91,65), ( +84,36) (92,44), (27,30), (94,43), (51,96) (46,57), (45,2), (64,46), (46,1), (40,54) (84,26), (93,43), (11,80), (16,50), (66,80), (46,76) (92,32), (91,28), (80,76), (55,2) (94,40), (90,13), (57,36), (83,3) (13,69), (53,24), (61,90), (56,28), (61,82), (17,85), (12,59), (14,55) (33,44), (1,71), (35,71), (97,26), (39,78), (31,3), (32,65), (87,61) (70,26), (38,71), (78,36), (84,58), (84,5), (81,75) (70,98), (20,25), (79,19), (18,49), (48,78), (50,77), (40,53) (27,32), (54,19), (46,20), (38,93), (61,19), (9,53), (25,1) (43,83), (41,83), (16,69), (99,49), (22,60) (98,42), (79,0), (72,26), (95,11), (89,38) (67,0), (57,24), (46,50), (44,38) (76,20), (41,22), (93,64), (74,92) (82,24), (88,16), (49,9), (63,19), (84,77), (25,90), (48,11), (69,45), + (30,88) (53,23), (72,95), (27,29), (94,59), (18,22), (89,33), (57,97), (63,95) (20,43), (25,36), (33,84), (63,11), (15,2), (70,69), (47,77), (67,91), + (96,32) (59,55), (4,88), (74,63), (20,33), (28,34), (3,93), (9,19), (10,22), ( +42,44) (8,33), (21,55), (51,48), (6,20), (52,63) (72,27), (30,52), (37,35), (7,76), (23,45), (71,25), (75,17), (86,29), + (95,30) (51,38), (31,65), (62,4), (37,35), (73,48) (98,33), (3,22), (75,40), (64,18), (91,67), (8,65), (29,68), (11,96) (45,32), (65,80), (58,22), (95,44), (51,36), (93,61), (76,94) (55,90), (88,6), (9,1), (56,45), (26,56) (26,21), (25,46), (46,14), (79,11), (85,39), (30,9) (37,18), (90,23), (38,13), (97,63), (17,26), (41,56), (22,85) (13,42), (79,34), (5,47), (25,20), (78,35), (59,75), (89,42), (0,26), +(48,36) (60,73), (51,45), (95,63), (98,28) (91,84), (9,40), (13,98), (88,60), (17,83)
In reply to Re: Interesting problem fitting points to a template (algorithm?)
by BrowserUk
in thread Interesting problem fitting points to a template (algorithm?)
by doowah2004
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |