I'd consider adding an initial screen that uses a much faster algorithm. If a point passes the screen, then the point_in_polygon routine could be called (for those that don't already own it, the Wolf Book is Mastering Algorithms with Perl). Both of the approaches I describe are similar to your Maine/California example, and both would work well with a database to store the results and avoid repeat calculations.

The first approach would be to partition your space into grids (the size of which should be determined based on how tightly your polygons and points are clustered). Determine which partition the polygons and points are in, then for each partition only test that set of polygons and points.

The second approach is a little more complex, but it might be worthwhile if your polygons and points are tightly clustered and the grid partition method only minimally reduces the calculations. Circumscribe a circle around each polygon1 and store the center point and radius for the circle. The initial screen then becomes a very simple (and fast) comparison of the distance between the center of the circle and the point in question, vs the radius of the circle. This method won't work as well if the polygons are very irregularly-shaped, as you'll get a lot of points that fall into the crevices2.

Alternatively, you could look into coding the algorithm in C (using perlxs) or PDL.

HTH

1Update 1: Not all polygons can be circumscribed in according to the true definition of "circumscribe", which states that every vertex of the polygon must be on the circle. For the purposes of this problem, I intended to suggest that any circle large enough to encompass the polygon could be used. This could be determined by trial and error using unit increments to the radius, or it could be determined more precisely by finding the largest distance between two vertices (which becomes the diameter of the circle, therefore the center is the midpoint of that segment).

Please note that there is room for error in this approach unless every point of the polygon is tested to ensure they are all contained within the circle. Nonetheless, it may be good enough for a first pass, and a suitable margin of error be used (doubling the size of the circle, etc) to minimize the chance for error.

2Update 2: NiJo had a great suggestion - instead of using a circle as I initially proposed, use a rectangle. The bounds would be much easier to calculate and it would likely function just as well (if not better, since it would be guaranteed to contain the entire polygon) for a first pass screen.


In reply to Re: Speeding up point-in-polygon by bobf
in thread Speeding up point-in-polygon by punkish

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.