Hmmm ... well probably not faster but at least more accurate ...
If they're US addresses (and the zip codes make me think so), you could use the USPS web service for this. There are limits (5 requests per transaction) and it's going to be slow -- but at least the addresses will be correct (especially if your goal is to *use* the address data to send mail!).
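Something like this is roughly what a batch call looks like -- untested sketch, and the endpoint/XML layout are from the older USPS Web Tools "Verify" (Address Validate) API, so check the current USPS docs; the USERID is whatever they issue you when you register:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use URI::Escape qw(uri_escape);

    my $userid = 'YOUR_USPS_USERID';      # assumption: you've registered for Web Tools
    my $ua     = LWP::UserAgent->new( timeout => 30 );

    # Standardize a small batch of addresses in one request.
    # Each element: { address2 => street, city => ..., state => ..., zip5 => ... }
    sub standardize_batch {
        my @addrs = @_;
        die "only a handful of addresses allowed per transaction\n" if @addrs > 5;

        my $xml = qq{<AddressValidateRequest USERID="$userid">};
        my $id  = 0;
        for my $a (@addrs) {
            $xml .= qq{<Address ID="$id"><Address1></Address1>}
                  . qq{<Address2>$a->{address2}</Address2>}
                  . qq{<City>$a->{city}</City><State>$a->{state}</State>}
                  . qq{<Zip5>$a->{zip5}</Zip5><Zip4></Zip4></Address>};
            $id++;
        }
        $xml .= '</AddressValidateRequest>';

        my $url = 'https://production.shippingapis.com/ShippingAPI.dll?API=Verify&XML='
                . uri_escape($xml);
        my $res = $ua->get($url);
        die "USPS request failed: ", $res->status_line, "\n" unless $res->is_success;

        # In real code, parse this with XML::LibXML and pull out the
        # standardized Address2/City/State/Zip5 for each ID.
        return $res->decoded_content;
    }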
Once the addresses are standardized -- I would then create a new table where contact_name is not part of the unique constraint and see what happens when you load the data. If it appears the names are mis-spelled, truncated, or typo-ed, well, then your biggest problem is which one to choose. If there are multiple distinct names per address and you wish to keep those, then I would add them back in *after* the initial load (and after altering the table to put contact_name back into the unique constraint).
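A rough sketch of that load step with DBI -- the table and column names are made up since I don't know your schema, get_next_standardized_row() is a stand-in for however you read your cleaned data, and the duplicate-key check assumes MySQL's "Duplicate entry" error text:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:mysql:dedup', 'user', 'pass',
                            { RaiseError => 1, AutoCommit => 1 } );

    # Unique constraint on the standardized address only -- contact_name is
    # left out so the load itself surfaces the duplicates.
    $dbh->do(q{
        CREATE TABLE contacts_dedup (
            id           INT AUTO_INCREMENT PRIMARY KEY,
            contact_name VARCHAR(100),
            address      VARCHAR(255) NOT NULL,
            city         VARCHAR(100) NOT NULL,
            state        CHAR(2)      NOT NULL,
            zip5         CHAR(5)      NOT NULL,
            UNIQUE KEY uq_addr (address, city, state, zip5)
        )
    });

    my $ins = $dbh->prepare(q{
        INSERT INTO contacts_dedup (contact_name, address, city, state, zip5)
        VALUES (?, ?, ?, ?, ?)
    });

    while ( my $row = get_next_standardized_row() ) {   # placeholder for your own loader
        eval { $ins->execute( @{$row}{qw(contact_name address city state zip5)} ) };
        if ( my $err = $@ ) {
            # Same address already loaded -- possibly a different contact_name,
            # which is exactly the case you then have to decide how to handle.
            print "DUPE: $row->{contact_name} at $row->{address}\n"
                if $err =~ /Duplicate entry/i;
        }
    }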