http://qs1969.pair.com?node_id=182667


in reply to Sorting by geographical proximity / clumping groups of items based on X and Y

Questions like this tend to require some definition of "proximity", which is rarely as simple as "N points within a polygon or area M". Typically people want to find trends: 3 problems within 1 grid square is a trend and 1 is not, BUT 1 problem in each square of a 10x10 block of grid squares is.

An algorithm that seems like it would work well assumes that you have a range of "region sizes" that can overlap, and different thresholds for the number of problems that constitute a "hot spot region". In pseudocode...

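    my %grid_size_thresholds = (
        1 => 3,   # a 1x1 square grid is hot if it has 3 probs
        2 => 10,  # a 2x2 square grid is hot if it has 10 probs
        3 => 30,  # ...
        # ...
    );
    my @hotspots = ();
    # sort numerically; a plain "sort keys" would put 10 before 2
    foreach my $size (sort { $a <=> $b } keys %grid_size_thresholds) {
        # "<=" so that regions touching the far edge of the grid are scanned too
        # (a region is taken to cover x .. x+size-1 and y .. y+size-1)
        for (my $x = 0; $x + $size <= $GRID_WIDTH; $x++) {
            for (my $y = 0; $y + $size <= $GRID_HEIGHT; $y++) {
                # Region and get_num_probs_in_region() are placeholders
                my $region = Region->new($x, $x + $size, $y, $y + $size);
                my $prob_count = get_num_probs_in_region($region);
                if ($grid_size_thresholds{$size} <= $prob_count) {
                    push @hotspots, $region;
                }
            }
        }
    }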
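For anyone who wants something they can actually run, here is a minimal sketch of the same scan. It rests on assumptions that are not in the pseudocode above: problems are stored as [x, y] grid cells, a region is a plain [x0, x1, y0, y1] array ref with exclusive upper bounds, and the names @problems, count_in_region and find_hotspots, like the sample data and grid size, are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: each problem is an [x, y] grid cell.
my @problems = ([0, 0], [0, 0], [0, 0], [3, 3], [5, 1]);
my ($GRID_WIDTH, $GRID_HEIGHT) = (6, 6);

# Threshold per region size, as in the pseudocode above.
my %grid_size_thresholds = (1 => 3, 2 => 10);

# Count problems with x0 <= x < x1 and y0 <= y < y1.
sub count_in_region {
    my ($x0, $x1, $y0, $y1) = @_;
    return scalar grep {
        $_->[0] >= $x0 && $_->[0] < $x1 && $_->[1] >= $y0 && $_->[1] < $y1
    } @problems;
}

# Slide a size x size window over the grid for every configured size.
sub find_hotspots {
    my @hotspots;
    for my $size (sort { $a <=> $b } keys %grid_size_thresholds) {
        for my $x (0 .. $GRID_WIDTH - $size) {
            for my $y (0 .. $GRID_HEIGHT - $size) {
                my $n = count_in_region($x, $x + $size, $y, $y + $size);
                push @hotspots, [$x, $x + $size, $y, $y + $size]
                    if $n >= $grid_size_thresholds{$size};
            }
        }
    }
    return @hotspots;
}

my @hotspots = find_hotspots();
print join(',', @$_), "\n" for @hotspots;   # one "x0,x1,y0,y1" line per hotspot
```

Counting by rescanning every problem for every window is the simplest thing that works; for a big grid you would probably want to precompute per-cell counts first.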
You now have a list of (possibly nested) square regions of various sizes which identify hotspots. You can eliminate regions which are completely contained by other regions in the list, or you could use the info to find "hot spots within warm spots" (i.e. "region 1,4;2,3 has had enough problems to deserve attention, and within that, 1,2;2,3 deserves special attention").
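The "eliminate contained regions" step could look something like the sketch below. It assumes each region is an [x0, x1, y0, y1] array ref; the helper names contains and outermost and the sample list are hypothetical. Note that two identical regions stored as separate refs would knock each other out, which the scan described above should never produce.

```perl
use strict;
use warnings;

# Hypothetical hotspot list; each region is [x0, x1, y0, y1].
my @hotspots = ([1, 2, 2, 3], [1, 4, 2, 5], [0, 1, 0, 1]);

# True if region $in lies entirely inside region $out.
sub contains {
    my ($out, $in) = @_;
    return $out->[0] <= $in->[0] && $in->[1] <= $out->[1]
        && $out->[2] <= $in->[2] && $in->[3] <= $out->[3];
}

# Keep a region only if no *other* region in the list contains it.
sub outermost {
    my @regions = @_;
    return grep {
        my $r = $_;
        !grep { $_ != $r && contains($_, $r) } @regions;
    } @regions;
}

my @outer = outermost(@hotspots);
print join(',', @$_), "\n" for @outer;
```

To get "hot spots within warm spots" instead, you would keep the contained regions and group them under whichever outer region contains them, using the same contains() test.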
