
Did you mean 'coarse'?

As a note: there is no need to call sqrt(...) on all N^2 distances. Instead, square your threshold and compare it against the squared distances; squaring the threshold is a much cheaper operation and only needs to be done once.
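A minimal sketch of that comparison, assuming the coordinates can be treated as planar [x, y] pairs. The sample data and the name close_pairs are made up for illustration and are not from the code earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data: [x, y] pairs in any planar unit.
my @points    = ([0, 0], [3, 4], [6, 8], [1, 1]);
my $threshold = 5;

# Square the threshold once instead of taking sqrt() for every pair.
my $threshold_sq = $threshold ** 2;

# Return the index pairs whose points lie within the threshold.
sub close_pairs {
    my ($pts, $limit_sq) = @_;
    my @pairs;
    for my $i (0 .. $#$pts - 1) {
        for my $j ($i + 1 .. $#$pts) {
            my $dx = $pts->[$i][0] - $pts->[$j][0];
            my $dy = $pts->[$i][1] - $pts->[$j][1];
            # Compare squared distance against squared threshold: no sqrt.
            push @pairs, [$i, $j] if $dx * $dx + $dy * $dy <= $limit_sq;
        }
    }
    return @pairs;
}

my @pairs = close_pairs(\@points, $threshold_sq);
print join(' ', map { "$_->[0]-$_->[1]" } @pairs), "\n";   # 0-1 0-3 1-2 1-3
```

The result is the same as comparing sqrt(dx^2 + dy^2) against the threshold, because both sides are non-negative and squaring preserves their order.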

If you have a large number of coordinates, you could also break the world up into a grid of buckets, each at least as wide as your threshold. Each entry then only needs to check distances to the entries in its own bucket and the eight buckets surrounding it. Use a 2D hash of buckets, since most buckets will be unused/empty out in the countryside.
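Here is a rough sketch of the bucket idea, not a drop-in solution. It assumes planar coordinates and a cell size equal to the threshold, in which case any match for a point must sit in the point's own cell or one of the eight cells around it. The names cell_of, build_buckets and neighbours are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Assumption: cell size equals the distance threshold.
my $cell = 5;

# Map a coordinate to its integer cell index (floor handles negatives).
sub cell_of { my ($v) = @_; return floor($v / $cell) }

# Build a sparse 2D hash: only occupied cells ever get an entry.
sub build_buckets {
    my ($pts) = @_;
    my %buckets;
    for my $i (0 .. $#$pts) {
        my ($x, $y) = @{ $pts->[$i] };
        push @{ $buckets{ cell_of($x) }{ cell_of($y) } }, $i;
    }
    return \%buckets;
}

# Indices of points within $cell of point $i, looking at 3x3 cells only.
sub neighbours {
    my ($pts, $buckets, $i) = @_;
    my ($x, $y) = @{ $pts->[$i] };
    my ($cx, $cy) = (cell_of($x), cell_of($y));
    my @found;
    for my $bx ($cx - 1 .. $cx + 1) {
        next unless exists $buckets->{$bx};    # do not autovivify empty columns
        for my $by ($cy - 1 .. $cy + 1) {
            for my $j (@{ $buckets->{$bx}{$by} || [] }) {
                next if $j == $i;
                my ($dx, $dy) = ($x - $pts->[$j][0], $y - $pts->[$j][1]);
                push @found, $j if $dx * $dx + $dy * $dy <= $cell * $cell;
            }
        }
    }
    my @sorted = sort { $a <=> $b } @found;
    return @sorted;
}

my @points  = ([0, 0], [3, 4], [6, 8], [40, 40], [-1, -1]);
my $buckets = build_buckets(\@points);
print join(',', neighbours(\@points, $buckets, 1)), "\n";   # 0,2
```

The point at (40, 40) is never examined when searching around (3, 4), which is where the savings come from once the data set is large.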

Re^3: CSV Cross Referencing
by RonW (Parson) on Dec 03, 2014 at 18:22 UTC

    Cheaper still (and what Tux did in his code): just compare the Lat and Lon separately. You only need to calculate the distance (or the square of the distance) when there are multiple matches.
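    Something along these lines, as a sketch only. It is not Tux's actual code; the record layout [name, lat, lon], the per-axis tolerance $tol and the sub names are assumptions made up for the example.

    ```perl
    use strict;
    use warnings;

    # Hypothetical records: [name, lat, lon]; $tol is a per-axis tolerance in degrees.
    my @places = (['A', 52.10, 4.30], ['B', 52.11, 4.31], ['C', 52.10, 9.00]);
    my $tol    = 0.05;

    # Cheap filter: two subtractions and two comparisons per record.
    sub candidates {
        my ($lat, $lon, $tol, $recs) = @_;
        return grep { abs($_->[1] - $lat) <= $tol && abs($_->[2] - $lon) <= $tol } @$recs;
    }

    # Squared distance is only computed when several records pass the filter.
    sub best_match {
        my ($lat, $lon, $tol, $recs) = @_;
        my @c = candidates($lat, $lon, $tol, $recs);
        return $c[0] if @c <= 1;    # zero or one match: no distance needed
        my ($best) = sort {
            ($a->[1] - $lat) ** 2 + ($a->[2] - $lon) ** 2
                <=> ($b->[1] - $lat) ** 2 + ($b->[2] - $lon) ** 2
        } @c;
        return $best;
    }

    my $hit  = best_match(52.105, 4.30, $tol, \@places);
    my $name = $hit ? $hit->[0] : 'none';
    print "$name\n";    # A
    ```

    Here A and B both pass the per-axis check, so the squared distance breaks the tie; C is rejected on longitude alone.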

      That's essentially what the 2d hash does.

      $buckets->{latitude}{longitude}. You're only checking things with a similar latitude, and among those, only the things with a similar longitude. The upside is that you don't need to loop. Instead of comparing the latitude against everything, you get the short list of things at the same latitude immediately, in O(1). Then, instead of comparing against the longitudes of everything remaining, you immediately have the short list of things that match both.