in reply to CSV Cross Referencing

I suspect hippo's suggestion that you use an RDBMS's native capabilities is a better idea than this regex alternative... but nonetheless:

  1. If the (supposedly matching) lat/lon values differ by no more than 9, simply compare the first six digits of each lat and the first five digits of each lon (in each case followed by one unspecified trailing digit and the end-of-string marker, $). That could produce false positives, but if my fingers-and-toes math is correct, the addresses would have to be phone-booth-size properties to produce a false match.
  2. Similarly, but using your variance of 10, use five digits of lat and four of lon with two unspecified trailing digits. Caution: this is going to be a lot more squiggly.
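
For illustration only, the two variants might be sketched as below. This is Python rather than Perl, and `prefix_match` and its `wildcard_digits` parameter are hypothetical names, not anything from the original question. It assumes each coordinate is held as a fixed-width string of digits (seven for lat in the example); the real data may well be laid out differently.

```python
import re

def prefix_match(a: str, b: str, wildcard_digits: int = 1) -> bool:
    """Return True if b shares a's leading digits, ignoring the last
    `wildcard_digits` digits (hypothetical helper, for illustration)."""
    if wildcard_digits < 1:
        raise ValueError("wildcard_digits must be at least 1")
    # Keep everything except the trailing digits we are willing to ignore.
    prefix = a[:-wildcard_digits]
    # Fixed prefix, then exactly `wildcard_digits` unspecified digits,
    # then the end-of-string marker.
    pattern = "^" + re.escape(prefix) + r"\d{" + str(wildcard_digits) + "}$"
    return re.match(pattern, b) is not None

# Variant 1: one unspecified trailing digit.
print(prefix_match("4012345", "4012349", 1))
# Same variant, but the values differ in the sixth digit.
print(prefix_match("4012345", "4012355", 1))
# Variant 2: two unspecified trailing digits.
print(prefix_match("4012345", "4012355", 2))
```

The same helper would be applied to the lon column with its own width. Note that it only ever compares leading digits, so it says nothing about how far apart two values are numerically.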

Re^2: CSV Cross Referencing
by hippo (Archbishop) on Dec 03, 2014 at 12:34 UTC

    It is a nice thought, but I can see a problem with false negatives too. Suppose the lat in one set is 5000000 and in the other it's 4999998 - those are not going to be found using pattern-matching. You would have to both add and subtract the maximum error and then compare both of those outer limits to the second set. It's do-able in Perl but the RDBMS seems like the better way to me.
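
A minimal sketch of that false negative, using the two lat values from the reply above. It is written in Python for brevity, and `same_prefix` and `within_tolerance` are made-up names; the maximum error of 9 is taken from the first suggestion in the parent post and is otherwise an assumption.

```python
def same_prefix(a: int, b: int, wildcard_digits: int = 1) -> bool:
    """Compare leading digits only, which is what the pattern match
    effectively does (hypothetical helper)."""
    sa, sb = str(a), str(b)
    return len(sa) == len(sb) and sa[:-wildcard_digits] == sb[:-wildcard_digits]

def within_tolerance(a: int, b: int, max_error: int) -> bool:
    """Add and subtract the maximum error from a, then check that b
    falls between those outer limits (hypothetical helper)."""
    return a - max_error <= b <= a + max_error

lat_one, lat_two = 5000000, 4999998

# The values differ by only 2, yet share no leading digits.
print(same_prefix(lat_one, lat_two))
# The numeric range check finds the pair.
print(within_tolerance(lat_one, lat_two, 9))
```

The range check is the same comparison an RDBMS would express with a BETWEEN condition on each column.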

      AMEN!

      At some point, whether as in your example or otherwise, there are going to be edge cases that require a lot of effort to resolve.

      Thanks for making an important point I missed.