in reply to Speeding up point-in-polygon

I'd consider adding an initial screen that uses a much faster algorithm. If a point passes the screen, then the point_in_polygon routine could be called (for those that don't already own it, the Wolf Book is Mastering Algorithms with Perl). Both of the approaches I describe are similar to your Maine/California example, and both would work well with a database to store the results and avoid repeat calculations.

The first approach would be to partition your space into grids (the size of which should be determined based on how tightly your polygons and points are clustered). Determine which partition the polygons and points are in, then for each partition only test that set of polygons and points.

The second approach is a little more complex, but it might be worthwhile if your polygons and points are tightly clustered and the grid partition method only minimally reduces the calculations. Circumscribe a circle around each polygon1 and store the center point and radius for the circle. The initial screen then becomes a very simple (and fast) comparison of the distance between the center of the circle and the point in question, vs the radius of the circle. This method won't work as well if the polygons are very irregularly-shaped, as you'll get a lot of points that fall into the crevices2.

Alternatively, you could look into coding the algorithm in C (using perlxs) or PDL.

HTH

1Update 1: Not all polygons can be circumscribed in according to the true definition of "circumscribe", which states that every vertex of the polygon must be on the circle. For the purposes of this problem, I intended to suggest that any circle large enough to encompass the polygon could be used. This could be determined by trial and error using unit increments to the radius, or it could be determined more precisely by finding the largest distance between two vertices (which becomes the diameter of the circle, therefore the center is the midpoint of that segment).

Please note that there is room for error in this approach unless every point of the polygon is tested to ensure they are all contained within the circle. Nonetheless, it may be good enough for a first pass, and a suitable margin of error be used (doubling the size of the circle, etc) to minimize the chance for error.

2Update 2: NiJo had a great suggestion - instead of using a circle as I initially proposed, use a rectangle. The bounds would be much easier to calculate and it would likely function just as well (if not better, since it would be guaranteed to contain the entire polygon) for a first pass screen.

Replies are listed 'Best First'.
Re^2: Speeding up point-in-polygon
by Anonymous Monk on Jul 27, 2006 at 07:29 UTC
    I would spend a lot of time pre-processing to cut down on the amount of looping you have to do.

    In describing time complexity I'll use M for the number of points you have and N for the number of rectangles.

    First, wrap every polygon in a rectangle as described in this tip. This is O(n).

    Then, create four lists of rectangles sorted by xMin, xMax, yMin, and yMax. This is O(n log n) if you use quicksort.

    Now, you cluster the rectangles into super-rectangles. How you do this is up to you, and should probably be informed by your data. If its evenly distributed you can do this in essentially O(n) time by simply using a grid system.

    If it is not evenly distributed, you do something like the following recursively: for any super-rectangle, roll randomly from the set of xMin, xMax, yMin, and yMax. Go to the median of the list you just selected (this should be constant time if your lists are backed by arrays). Draw a line there separating half of the points on one side of the line and half on the other. i.e. if you picked yMax you draw a horizontal line there. Then you use an O(n) loop to assign rectangles to either or both of the new slightly-less-super rectangles. If a rectangle contains more than some arbitrary threshhold of your total number of rectangles, you recursively subdivide it. Otherwise, you stop that level of the recursion and add that superrectangle to your superrectangle list. (Note, you'll want a sanity check on your recursion depth here, as it is possible but unlikely in practice that this will loop forever otherwise)

    So now you've got a list of super-rectangles, each of which has a list of sub-rectangles, each of which is associated with a polygon. Following so far? Now sort the list of super-rectangles by number of rectangles (smallest to largest), and sort each list of sub-rectangles by area (largest to smallest). The goal of both of these sorts is to maximize the probability that you will quickly find one qualifying polygon and be able to kill the search, skipping huge swathes of your M*N search space.

    Then, and only then, you get to (forgive non-Perl pseudocode),

    for(foo in points) for(bar in superRectangles) if (bar contains foo) for (foobar in subRectangles of bar) if (foobar contains foo) if (polygon of foobar contains foo) do whatever loop to next point