in reply to Interesting problem fitting points to a template (algorithm?)

The following code and output attempt to demonstrate that it is possible to generate a single numerical "signature" or digest for your sets of points that will cause similar sets of points to sort close together. By generating these digests for each set in your database, and then generating one for each new film as you receive it, you can use a binary search/tree to locate the insertion point into the database, and so find the entries that have a similar signature. You then do a more thorough spatial comparison of a few entries either side to locate the best matches.
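To make the lookup side of that concrete, here is a minimal sketch in Python (the `digest()` function is a simplified stand-in for the calcSig() given below, and the point sets are made up for illustration):

```python
import bisect

# Simplified digest: maps a point set to a float such that similar
# sets produce nearby values (see calcSig() in the Perl code below).
def digest(points):
    n = len(points)
    ax = sum(x for x, y in points) / n / 100.0   # normalised average X
    ay = sum(y for x, y in points) / n / 100.0   # normalised average Y
    return int(ax * 100 * 100) + ay

# Database: (signature, point set) pairs kept sorted by signature.
db = sorted((digest(ps), ps) for ps in [
    [(8, 33), (21, 55), (51, 48)],
    [(92, 44), (27, 30), (94, 43)],
    [(70, 98), (20, 25), (79, 19)],
])
sigs = [s for s, _ in db]

def candidates(new_points, k=2):
    """Return up to k database entries either side of the insertion
    point for the new set's signature."""
    i = bisect.bisect_left(sigs, digest(new_points))
    return db[max(0, i - k):i + k]
```

A slightly perturbed copy of the first set then lands next to its original, ready for the more thorough spatial comparison.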

This is not an exact science as the following output shows.

2760.43800 => 8 33 21 55 51 48 6 20 52 63
2766.50111 => 59 55 4 88 74 63 20 33 28 34 3 93 9 19 10 22 42 44
3587.61500 => 13 69 53 24 61 90 56 28 61 82 17 85 12 59 14 55
3714.33857 => 27 32 54 19 46 20 38 93 61 19 9 53 25 1
4359.73000 => 91 84 9 40 13 98 88 60 17 83
4400.39667 => 13 42 79 34 5 47 25 20 78 35 59 75 89 42 0 26 48 36
4420.68800 => 43 83 41 83 16 69 99 49 22 60
4437.52375 => 33 44 1 71 35 71 97 26 39 78 31 3 32 65 87 61
4642.57000 => 70 98 20 25 79 19 18 49 48 78 50 77 40 53
4680.39600 => 55 90 88 6 9 1 56 45 26 56
4737.51125 => 98 33 3 22 75 40 64 18 91 67 8 65 29 68 11 96
4820.32000 => 46 57 45 2 64 46 46 1 40 54
4844.49444 => 20 43 25 36 33 84 63 11 15 2 70 69 47 77 67 91 96 32
4849.23333 => 26 21 25 46 46 14 79 11 85 39 30 9
4885.40571 => 37 18 90 23 38 13 97 63 17 26 41 56 22 85
4900.49444 => 67 66 7 45 77 19 4 84 74 53 0 65 37 12 91 65 84 36
5080.38000 => 51 38 31 65 62 4 37 35 73 48
5266.59167 => 84 26 93 43 11 80 16 50 66 80 46 76
5350.28000 => 67 0 57 24 46 50 44 38
5511.37333 => 72 27 30 52 37 35 7 76 23 45 71 25 75 17 86 29 95 30
5912.56625 => 53 23 72 95 27 29 94 59 18 22 89 33 57 97 63 95
5977.42111 => 82 24 88 16 49 9 63 19 84 77 25 90 48 11 69 45 30 88
6599.53250 => 92 44 27 30 94 43 51 96
6899.52714 => 45 32 65 80 58 22 95 44 51 36 93 61 76 94
7100.49500 => 76 20 41 22 93 64 74 92
7250.45167 => 70 26 38 71 78 36 84 58 84 5 81 75
7599.52250 => 60 73 51 45 95 63 98 28
7950.34500 => 92 32 91 28 80 76 55 2
8100.23000 => 94 40 90 13 57 36 83 3
8660.23400 => 98 42 79 0 72 26 95 11 89 38

 ------

2760.43800 => 8 33 21 55 51 48 6 20 52 63
2766.50111 => 59 55 4 88 74 63 20 33 28 34 3 93 9 19 10 22 42 44
3000.54000 => 59 55 4 88 74 63 20 33 28 34 3 93 10 22 42 44
3300.49750 => 8 33 21 55 51 48 52 63
3512.61375 => 15 76 53 28 52 90 49 19 69 79 13 85 16 59 14 55
3587.61500 => 15 76 53 28 52 90 49 19 69 79 13 85 16 59 14 55
3714.33857 => 21 32 62 19 51 20 30 92 69 9 13 53 25 8
3871.33286 => 21 32 62 19 51 20 30 92 69 9 13 53 25 8
4224.53625 => 25 41 1 69 40 77 97 26 38 78 26 3 32 74 79 61
4239.70000 => 91 77 9 32 13 98 88 63 11 80
4250.50167 => 20 25 79 19 18 49 48 78 50 77 40 53
4359.73000 => 91 77 9 32 13 98 88 63 11 80
4400.39667 => 13 42 79 34 5 47 25 20 78 35 59 75 89 42 0 26 48 36
4420.68800 => 43 83 41 83 16 69 99 49 22 60
4437.52375 => 25 41 1 69 40 77 97 26 38 78 26 3 32 74 79 61
4600.23333 => 26 21 25 46 37 14 79 11 85 39 24 9
4642.57000 => 70 98 20 25 79 19 18 49 48 78 50 77 40 53
4657.39286 => 37 18 90 14 31 13 91 63 14 26 41 56 22 85
4680.39600 => 55 90 88 6 9 1 56 45 26 56
4700.52875 => 98 32 -6 22 75 48 64 18 91 67 8 70 29 77 17 89
4737.51125 => 98 32 -6 22 75 48 64 18 91 67 8 70 29 77 17 89
4740.30600 => 46 50 41 2 64 46 46 1 40 54
4820.32000 => 46 50 41 2 64 46 46 1 40 54
4844.49444 => 20 43 25 36 33 84 63 11 15 2 70 69 47 77 67 91 96 32
4849.23333 => 26 21 25 46 37 14 79 11 85 39 24 9
4880.39200 => 41 38 31 65 62 -5 37 41 73 57
4885.40571 => 37 18 90 14 31 13 91 63 14 26 41 56 22 85
4887.38750 => 13 42 79 34 25 20 78 35 59 75 89 42 0 26 48 36
4900.37333 => 57 24 46 50 44 38
4900.49444 => 67 74 7 38 77 19 4 91 74 53 0 65 42 18 100 57 84 30
5055.49444 => 67 74 7 38 77 19 4 91 74 53 0 65 42 18 100 57 84 30
5080.38000 => 41 38 31 65 62 -5 37 41 73 57
5125.38375 => 72 27 30 52 37 35 7 76 23 45 71 25 75 17 95 30
5125.68750 => 43 83 41 83 99 49 22 60
5137.51125 => 20 43 33 84 63 11 15 2 70 69 47 77 67 91 96 32
5200.35500 => 55 90 88 6 9 1 56 45
5266.59167 => 84 26 93 43 11 80 16 50 66 80 46 76
5350.28000 => 67 0 57 24 46 50 44 38
5511.37333 => 72 27 30 52 37 35 7 76 23 45 71 25 75 17 86 29 95 30
5912.56625 => 53 23 72 95 27 29 94 59 18 22 89 33 57 97 63 95
5977.42111 => 82 24 88 16 49 9 63 19 84 77 25 90 48 11 69 45 30 88
6000.61000 => 84 26 93 43 11 80 66 80 46 76
6124.46000 => 82 24 88 16 49 9 63 19 84 77 25 90 69 45 30 88
6500.61571 => 53 23 72 95 27 29 94 59 89 33 57 97 63 95
6599.53250 => 92 44 27 30 94 43 51 96
6899.52714 => 45 32 74 80 58 22 95 50 55 36 93 58 74 94
7000.35333 => 76 20 41 22 93 64
7057.53143 => 45 32 74 80 58 22 95 50 55 36 93 58 74 94
7100.39000 => 92 44 27 30 94 43
7100.49500 => 76 20 41 22 93 64 74 92
7139.47000 => 70 26 38 71 84 58 84 5 81 75
7250.45167 => 70 26 38 71 78 36 84 58 84 5 81 75
7599.52250 => 52 73 58 37 95 54 103 28
7700.48000 => 52 73 58 37 95 54 103 28
7849.35000 => 88 39 91 29 80 70 55 2
7950.34500 => 88 39 91 29 80 70 55 2
8100.23000 => 94 40 90 13 57 36 83 3
8660.23400 => 98 47 88 0 72 16 103 11 89 38
8900.18667 => 94 40 90 13 83 3
9000.22400 => 98 47 88 0 72 16 103 11 89 38

The first set of data (above the ------ line) comprises the signatures and associated datasets read from the DATA section of the code.

The second set of data includes all of the first set again, plus the same datasets with modifications made, sorted by signature. In some I have deleted one or more pairs; in others I've adjusted some values at random by +/-10.

The output shows that the algorithm I'm using (the third I tried) does a fairly good job of grouping each modified dataset close or adjacent to the original it is derived from.

It's not perfect. With more tweaking and testing it should be possible to adjust it to do a better (more sensitive) job, but that requires a detailed knowledge (and sample datasets) to do properly.

The basic approach (see calcSig()) is :

  1. Produce a normalised (0-1) value for each X and Y.
  2. Sum all the normalised Xs and divide by the number of pairs to produce a "normalised average X".
  3. Do the same for the Ys.
  4. Multiply the X average by a factor and truncate with int() to produce an integer with enough significant digits to distribute the X signatures nicely.

    This should probably be increased slightly.

  5. Add the normalised average Y value to produce the floating point signature.

    These last two steps are prime candidates for improvement, as this causes the signature to favour the Xs. The idea is to find an algorithm that distributes the datasets over a wide hash space fairly evenly without wrapping. I haven't quite achieved that here yet.
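In Python, the five steps above amount to something like this (a sketch assuming the same MAXX/MAXY of 100 as the Perl code below):

```python
MAXX, MAXY = 100, 100  # coordinate ranges, matching the Perl constants

def calc_sig(points):
    # Steps 1-3: normalise each coordinate into 0-1, then average.
    n = len(points)
    avg_x = sum(x / MAXX for x, y in points) / n
    avg_y = sum(y / MAXY for x, y in points) / n
    # Step 4: scale the X average up and truncate to an integer with
    # enough significant digits to spread the signatures out.
    # Step 5: add the Y average as the fractional part.
    return int(avg_x * MAXX * MAXY) + avg_y
```

For example, the first dataset in the output, (8,33), (21,55), (51,48), (6,20), (52,63), produces the signature 2760.43800 shown above.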

Here's the code. It's not documented. If you are interested in pursuing the idea further and have questions, feel free to ask.

#! perl -slw
use strict;
use Data::Dumper;
use List::Util qw[ reduce sum ];

use constant {
    MAXX => 100,
    MAXY => 100,
};

sub calcSig {
    my $sum = reduce {
        $a->[ 0 ] += $b->[ 0 ] / MAXX;
        $a->[ 1 ] += $b->[ 1 ] / MAXY;
        $a;
    } [], @_;

    $sum->[ 0 ] /= @_;
    $sum->[ 1 ] /= @_;

    return int( $sum->[ 0 ] * MAXX * MAXY ) + $sum->[ 1 ];
}

my( @sigs, %sigs );

while( <DATA> ) {
    chomp;
    my @points;
    push @points, [ $1, $2 ] while m[\((\d+?),(\d+?)\)(?:,\s|$)]g;
#    print Dumper \@points;
    push @sigs, my $sig = calcSig @points;
    $sigs{ $sig } = \@points;
}

@sigs = sort{ $a <=> $b } @sigs;

printf "%9.5f => %s\n", $_, join ' ', map{ @$_ } @{ $sigs{ $_ } } for @sigs;

print "\n ------ \n";

my @adjusted = map{ [ @$_ ] } values %sigs;

for my $aref ( @adjusted ) {
    if( rand > .3 ) {   ## delete a pair
        splice @$aref, rand( @$aref ), 1;
    }
    else {              ## Adjust a few values
        @$aref = map{
            rand > .5 and $_->[ 0 ] += -10 + int( rand 20 );
            rand > .5 and $_->[ 1 ] += -10 + int( rand 20 );
            $_;
        } @$aref;
    }
    push @sigs, my $sig = calcSig @$aref;
    $sigs{ $sig } = $aref;
}

@sigs = sort{ $a <=> $b } @sigs;

printf "%9.5f => %s\n", $_, join ' ', map{ @$_ } @{ $sigs{ $_ } } for @sigs;

__DATA__
(67,66), (7,45), (77,19), (4,84), (74,53), (0,65), (37,12), (91,65), (84,36)
(92,44), (27,30), (94,43), (51,96)
(46,57), (45,2), (64,46), (46,1), (40,54)
(84,26), (93,43), (11,80), (16,50), (66,80), (46,76)
(92,32), (91,28), (80,76), (55,2)
(94,40), (90,13), (57,36), (83,3)
(13,69), (53,24), (61,90), (56,28), (61,82), (17,85), (12,59), (14,55)
(33,44), (1,71), (35,71), (97,26), (39,78), (31,3), (32,65), (87,61)
(70,26), (38,71), (78,36), (84,58), (84,5), (81,75)
(70,98), (20,25), (79,19), (18,49), (48,78), (50,77), (40,53)
(27,32), (54,19), (46,20), (38,93), (61,19), (9,53), (25,1)
(43,83), (41,83), (16,69), (99,49), (22,60)
(98,42), (79,0), (72,26), (95,11), (89,38)
(67,0), (57,24), (46,50), (44,38)
(76,20), (41,22), (93,64), (74,92)
(82,24), (88,16), (49,9), (63,19), (84,77), (25,90), (48,11), (69,45), (30,88)
(53,23), (72,95), (27,29), (94,59), (18,22), (89,33), (57,97), (63,95)
(20,43), (25,36), (33,84), (63,11), (15,2), (70,69), (47,77), (67,91), (96,32)
(59,55), (4,88), (74,63), (20,33), (28,34), (3,93), (9,19), (10,22), (42,44)
(8,33), (21,55), (51,48), (6,20), (52,63)
(72,27), (30,52), (37,35), (7,76), (23,45), (71,25), (75,17), (86,29), (95,30)
(51,38), (31,65), (62,4), (37,35), (73,48)
(98,33), (3,22), (75,40), (64,18), (91,67), (8,65), (29,68), (11,96)
(45,32), (65,80), (58,22), (95,44), (51,36), (93,61), (76,94)
(55,90), (88,6), (9,1), (56,45), (26,56)
(26,21), (25,46), (46,14), (79,11), (85,39), (30,9)
(37,18), (90,23), (38,13), (97,63), (17,26), (41,56), (22,85)
(13,42), (79,34), (5,47), (25,20), (78,35), (59,75), (89,42), (0,26), (48,36)
(60,73), (51,45), (95,63), (98,28)
(91,84), (9,40), (13,98), (88,60), (17,83)

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Replies are listed 'Best First'.
Re^2: Interesting problem fitting points to a template (algorithm?)
by doowah2004 (Monk) on Oct 06, 2004 at 21:04 UTC
    Sorry to take so long to reply, I was working through the code and thinking about implementation. Nifty idea, but I do not believe it is practical for my situation. I have 7 ROIs; each could contain as many as 10 points, and among those ten the true point may or may not exist. So coming up with all of the possible combinations/permutations could make this solution prohibitive.

    I have been thinking about filtering by point instead of by collection (at least initially). For good registration, I need at least 4 points. Knowing the distance between the points, I can work through each of the points and check the distance between each of the other points. If I set some tolerance, then I can discriminate the points that are not within the tolerance for at least 4 of the 6 distances. This should be fairly quick { 6 distances per point max ~ 10 points per ROI => 420 checks, and if I splice the array for the ROI then this number should decrease as bad points are thrown out. }
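A rough Python sketch of that per-point distance filter (the template, tolerance, and match threshold here are all illustrative; the real values would come from the registration data):

```python
from itertools import combinations
from math import dist  # Python 3.8+

def filter_candidates(roi_points, template, tol=2.0, min_matches=3):
    """Keep only candidate points whose distances to the other
    candidates match enough of the template's pairwise distances
    within the given tolerance."""
    # The 4 template points give C(4,2) = 6 reference distances.
    template_dists = [dist(a, b) for a, b in combinations(template, 2)]
    kept = []
    for p in roi_points:
        matches = 0
        for q in roi_points:
            if p is q:
                continue
            d = dist(p, q)
            if any(abs(d - td) <= tol for td in template_dists):
                matches += 1
        if matches >= min_matches:
            kept.append(p)
    return kept

# Illustrative data: a square template plus one spurious point.
template = [(0, 0), (0, 10), (10, 0), (10, 10)]
roi = template + [(50, 50)]
kept = filter_candidates(roi, template, tol=2.0, min_matches=3)
```

Note each candidate can match at most 3 distances against a 4-point template (each template point participates in 3 of the 6 pairs), so the threshold needs choosing accordingly.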


    If I can run a few tests then I will see what we typically have left. If it is only 1-3 points per ROI then your solution is much more viable.


    Quick question (I have not looked for this yet, thought of it while writing), do you or does anyone know of a good module for doing the permutations I need? i.e. N arrays with M elements, creating new unique arrays containing only 1 element from each of the N arrays? I am sure that it is out there, I will look for it tonight.

    Thanks for your help,


    Cameron
      ...does anyone know of a good module for doing the permutations...

      Algorithm::Permute and NextPermute() from tye's Algorithm::Loops module are both very good implementations.
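      Note that picking one element from each of N arrays is really a Cartesian product rather than a permutation. A quick Python sketch of the shape of the result (the ROI data is made up):

```python
from itertools import product

rois = [  # hypothetical ROIs, each a list of candidate points
    [(1, 1), (2, 2)],
    [(3, 3)],
    [(4, 4), (5, 5)],
]

# One point from each ROI, in every combination: 2 * 1 * 2 = 4 tuples.
selections = list(product(*rois))
```

The number of selections is the product of the ROI sizes, which is why 7 ROIs of up to 10 points each can blow up to 10^7 combinations.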

      I think that the idea of generating a signature could be made to work well for your application. I saw the extra information that you posted in reply to tachyon, and I think I see the problems you're describing, and also some of a possible solution. The key to taking the thoughts further is having some test data to work with. If you felt like placing some on your scratchpad and /msg'ing me, I'd enjoy playing with it.

      That said, I'm still not entirely sure that I understand the full scope of the problem, so maybe I should just accept your more informed judgement on the matter :)

