Honestly I think that a huge part of this problem could be solved by using the Database as effectively as possible. I am hoping that you are using a standard DB such as Oracle, MySQL, etc. The info, then, that you want to search through can be sorted using an "ORDER BY" statement on the address number and street name. This will give a sorted list that will basically group all the dupes next to each other.
As for the rest, all you need to do is match address (number + street name) and make a hash or some other data structure to store the accepted spellings of common street names or other address conventions. (such ast st. blvd. etcetera). You can have the program run through this hash and transform them to a common output and you should get duplicate output, with different primary keys.
This is a general overview of your problem. There are obviously going to be some edge cases come up when you tackle this problem. This is quite a large problem, but I think that if you are able to have the DB work for you it will simplify things tremendously.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.