in reply to Sorting by geographical proximity / clumping groups of items based on X and Y

You could try looking at the density of complaints in each area. That could be computationally intensive, but it would give you some idea of where the 'clumps' are.

To find the rough edges of a clump, count the complaints within a small radius, then increase the radius and see how many more you get. At some point the program has to decide whether expanding the circle is worth the few extra points it gains. That is a diminishing-returns question, the kind of optimization problem classical calculus deals with.

You would also have to include extra rules to stop the circle from increasing forever or shrinking to nothing.
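
The radius-growing idea can be sketched roughly as follows. This is only an illustration, not tested against real complaint data: the names `count_within` and `grow_radius` and the parameters `start`, `step`, `max_radius` and `min_gain` are all made up for the example. `start` keeps the circle from shrinking to nothing, `max_radius` keeps it from growing forever, and `min_gain` is a crude stand-in for the "is it worth expanding" decision.

```python
import math

def count_within(points, center, radius):
    """Count the points lying inside (or on) the circle."""
    cx, cy = center
    return sum(1 for (x, y) in points
               if math.hypot(x - cx, y - cy) <= radius)

def grow_radius(points, center, start=1.0, step=1.0,
                max_radius=10.0, min_gain=1):
    """Grow the circle while each step still catches enough new points.

    All parameters are hypothetical tuning knobs:
      start      - smallest radius we accept (stops shrinking to nothing)
      max_radius - hard upper bound (stops growing forever)
      min_gain   - fewest extra points that justify one more step
    """
    radius = start
    count = count_within(points, center, radius)
    while radius + step <= max_radius:
        new_count = count_within(points, center, radius + step)
        if new_count - count < min_gain:
            break  # not worth expanding any further
        radius += step
        count = new_count
    return radius, count
```

With a suitable `step`, the returned radius gives a rough edge for the clump around `center`. A smarter stopping rule could compare the density of the new ring with the density of the circle so far.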

Other techniques might involve randomly (or less than randomly) drawing circles around the centre of the clump and seeing which ones catch the most complaints, then declaring the union of all the circles to be the clump.
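
A rough sketch of the random-circle idea, under several assumptions of mine: the circles all share one fixed `radius`, their centres are scattered uniformly within `spread` of the clump's centre, and only the `keep` best-scoring circles are unioned (set `keep` equal to `trials` to union all of them). The function name and every parameter are hypothetical.

```python
import math
import random

def random_circle_clump(points, center, radius=1.0, spread=1.0,
                        trials=50, keep=5, seed=0):
    """Return sorted indices of the points caught by the best random circles."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    cx, cy = center
    scored = []
    for _ in range(trials):
        # Jitter the circle's centre around the supposed clump centre.
        jx = cx + rng.uniform(-spread, spread)
        jy = cy + rng.uniform(-spread, spread)
        caught = frozenset(
            i for i, (x, y) in enumerate(points)
            if math.hypot(x - jx, y - jy) <= radius
        )
        scored.append((len(caught), caught))
    # Keep the circles that caught the most complaints ...
    scored.sort(key=lambda item: item[0], reverse=True)
    clump = set()
    for _, caught in scored[:keep]:
        clump |= caught  # ... and take the union of what they caught.
    return sorted(clump)
```

Returning indices rather than coordinates keeps duplicate complaints at the same location distinct.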

____________________
Jeremy
I didn't believe in evil until I dated it.

Re: Sorting by geographical proximity / clumping groups of items based on X and Y
by Ionizor (Pilgrim) on Jul 18, 2002 at 17:49 UTC

    This solution is similar to one that I was thinking of. I don't have code but the algorithm is like this:

    • Pick a point where there's a complaint.
    • Draw a circle of some arbitrary radius around it (a little experimentation will find the optimum radius, I'm sure).
    • If another point falls within the radius, increase the area of the circle by a fixed amount and recenter it so it lies at the midpoint between the two points.
    • If another point falls inside the circle, increase the area of the circle again and recenter it so the center lies as close to the center of the cluster as possible.
    • Keep increasing the area and recentering as long as more points fall inside the circle.

    Since you're increasing area instead of radius, each fixed increase in area adds less to the radius than the one before, so the circle will stretch less and less each time until it stops picking up new points. When the circle stops growing, record the center of the cluster and all the complaints inside it. Once you have a cluster, move on to a point that isn't part of a cluster yet and go again. If some point lies in two different clusters, make it part of the closer cluster.
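
Since no code was posted, here is one possible reading of the steps above as a Python sketch. It makes several assumptions that are not in the description: "the center of the cluster" is taken to be the centroid of the points inside the circle, "closer cluster" means the nearer cluster center, points are `(x, y)` tuples, and the names `grow_cluster` and `cluster_all` and the parameters `start_radius`, `area_step` and `max_iter` are invented for illustration.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def grow_cluster(points, seed, start_radius=1.0, area_step=3.0, max_iter=100):
    """Grow a circle from `seed` by fixed area increments, recentering each time."""
    center = seed
    area = math.pi * start_radius ** 2
    members = [seed]
    for _ in range(max_iter):
        radius = math.sqrt(area / math.pi)
        inside = [p for p in points if dist(p, center) <= radius]
        if len(inside) <= len(members):
            break  # no new points fell inside: the circle stops growing
        members = inside
        # Recenter on the centroid of everything caught so far (assumption).
        center = (sum(x for x, _ in inside) / len(inside),
                  sum(y for _, y in inside) / len(inside))
        area += area_step  # fixed area step, so the radius grows ever more slowly
    return center, members

def cluster_all(points, **kwargs):
    """Seed a cluster from every point that is not yet in one."""
    found, assigned = [], set()
    for p in points:
        if p in assigned:
            continue
        center, members = grow_cluster(points, p, **kwargs)
        if p not in members:  # guard: recentering may have dropped the seed
            members = members + [p]
        found.append((center, members))
        assigned.update(members)
    # A point caught by two clusters goes to the one with the nearer center.
    owner = {}
    for idx, (center, members) in enumerate(found):
        for p in members:
            if p not in owner or dist(p, center) < dist(p, found[owner[p]][0]):
                owner[p] = idx
    return [(center, [p for p in points if owner[p] == idx])
            for idx, (center, _) in enumerate(found)]
```

Calling `cluster_all(complaints, start_radius=1.5)` on a list of coordinate tuples returns a list of `(center, members)` pairs. The starting radius and area step would still need the experimentation mentioned above.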