in reply to Re: •Re: Re: •Re: Re: Merge/Purge address data
in thread Merge/Purge address data

Oops, you're confusing Portland and Seattle. There are tons of houses in Seattle where moving the "NE" will break things. In Portland, the number of leading-0 addresses is more like 500 or so, so that's a smaller problem, but still a problem.

But the real problem is that you cannot coalesce like this. You cannot know that "123D Main" is really the same as "123 Main, Apt D". They might not be. And if you join them, you might break things.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.


Re: Merge/Purge address data
by gwhite (Friar) on Nov 12, 2003 at 10:53 UTC

    I am not advocating moving anything. To determine duplicates in a mail list, break each address into its individual parts and score the matching parts. If the score is above your threshold, assume a duplicate and save one of your original inputs (not the remix of parts).
    You cannot know that "123D Main" is really the same as "123 Main, Apt D".
    You cannot know based on that info alone, but if I also have a matching zip, matching last name, and matching first name, I may _choose_ to say that is a match, especially if I am sending an expensive four-color catalog at 80 cents postage per catalog. If I am sending a presorted one-color postcard at the lowest rate (20-something cents, I think), maybe I choose to say it is not a duplicate.
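
    A rough sketch of that scoring idea, in Python for brevity. Everything in it is an assumption for illustration, not a tested merge/purge routine: the weights, the two thresholds, the record layout, and the deliberately naive `split_address` helper are all made up. A unit letter written two different ways ("123D" vs. "Apt D") earns less credit than one written the same way, since the two might not be the same place.

```python
import re

# Hypothetical weights per matching part; tune them against your own list.
WEIGHTS = {"zip": 3, "last": 3, "first": 2, "number": 2, "street": 2}
UNIT_SAME_FORM = 2    # e.g. "Apt D" vs "Apt D"
UNIT_MIXED_FORM = 1   # e.g. "123D" vs "Apt D": might not be the same place

# Hypothetical thresholds: merge eagerly when each piece is expensive to mail.
CATALOG_THRESHOLD = 12
POSTCARD_THRESHOLD = 14


def split_address(line):
    """Naively split a street line into number, street, unit, and unit form."""
    text = line.upper().replace(",", " ")
    m = re.match(r"\s*(\d+)([A-Z]?)\s+(.*)", text)
    if not m:
        return {"number": "", "street": " ".join(text.split()), "unit": "", "form": ""}
    number, unit, rest = m.groups()
    form = "suffix" if unit else ""
    apt = re.search(r"\b(?:APT|UNIT)\s+(\w+)\s*$", rest)
    if apt and not unit:
        unit, form, rest = apt.group(1), "apt", rest[:apt.start()]
    return {"number": number, "street": " ".join(rest.split()), "unit": unit, "form": form}


def parts(rec):
    """Break a record into comparable parts without modifying the record."""
    p = {k: rec[k].strip().upper() for k in ("first", "last", "zip")}
    p.update(split_address(rec["address"]))
    return p


def score(a, b):
    """Add up the weights of the parts that match between two records."""
    pa, pb = parts(a), parts(b)
    total = sum(w for f, w in WEIGHTS.items() if pa[f] and pa[f] == pb[f])
    if pa["unit"] and pa["unit"] == pb["unit"]:
        total += UNIT_SAME_FORM if pa["form"] == pb["form"] else UNIT_MIXED_FORM
    return total


def dedupe(records, threshold):
    """Keep the first of each group of likely duplicates, as originally entered."""
    kept = []
    for rec in records:
        if not any(score(rec, k) >= threshold for k in kept):
            kept.append(rec)  # the original input, not a remix of parts
    return kept


a = {"first": "Pat", "last": "Smith", "address": "123D Main", "zip": "97201"}
b = {"first": "Pat", "last": "Smith", "address": "123 Main, Apt D", "zip": "97201"}

catalog_list = dedupe([a, b], CATALOG_THRESHOLD)    # treated as one household
postcard_list = dedupe([a, b], POSTCARD_THRESHOLD)  # both records are kept
```

    The parts are only used for comparison. Whatever survives is one of the records exactly as it came in, and the threshold is a business decision you make per mailing.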

    g_White