in reply to •Re: Re: Merge/Purge address data
in thread Merge/Purge address data

Realistically, how many addresses does that happen to? 500-1000 in each city? Keep in mind that we are looking for duplicates, so all of those exceptions would have to occur together with a match on the same last name (at least, that would be a strong match point in my calculation). The worst thing that could happen is that you don't send your pretty newsletter or mail-order catalog to one of the two individuals.

In this instance, program for the obvious and let the exceptions fall where they may. It is not worth the programming time or effort for the 100,000 exception addresses among the millions of US addresses.

g_White

•Re: Re: •Re: Re: Merge/Purge address data
by merlyn (Sage) on Nov 11, 2003 at 21:20 UTC
    I'm pretty sure there are a lot more than 1000 houses in the Seattle area. You can't just move the NE designator around. It's not the same address any more.

    So, the position of every piece of it is important. I would be mad if you had "normalized" my 0333 SW Flower address to "333 SW Flower". And yes, it happens, and it's still wrong.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Yes Randal, there are more than 1000 houses in Seattle, but probably not many more than 1000 with an exact duplicate number and street where the only difference is a leading zero. AND I never recommended normalizing the address that is sent to the user. The normalization and extraction is for the merge/purge process only; you _should_ always keep the original input as the actual address you stick on the mailing label.
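
The separation described above (a comparison-only key, with the original input kept for the label) could be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the `Record` class, `match_key`, and `candidate_duplicates` are hypothetical names, and the normalization shown (case, punctuation, whitespace) is only one assumed choice of what to fold.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    last_name: str
    address: str  # original input; never modified, this is what goes on the label


def match_key(record):
    """Build a comparison-only key; the record itself is left untouched."""
    # Assumption: fold only case, punctuation, and runs of whitespace.
    # Token order and leading zeros are preserved.
    street = re.sub(r"[^a-z0-9 ]", "", record.address.lower())
    street = " ".join(street.split())
    return (record.last_name.strip().lower(), street)


def candidate_duplicates(records):
    """Group records that share a key; groups of two or more are candidates."""
    groups = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)
    return [grp for grp in groups.values() if len(grp) > 1]
```

Because the key lives outside the record, whichever record survives the purge still carries its address exactly as it was entered.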

      g_White
        Oops, you're confusing Portland and Seattle. There are tons of houses in Seattle where moving the NE will break things. In Portland, the number of leading-0 addresses is more like 500 or so, so that's a smaller problem, but still a problem.

        But the real problem is that you cannot coalesce like this. You cannot know that "123D Main" is really the same as "123 Main, Apt D". They might not be. And if you join them, you might break things.
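
To make the coalescing risk concrete, here is a small sketch comparing two hypothetical key functions on the addresses mentioned in this thread. Both `aggressive_key` and `conservative_key` are made-up names for illustration; the aggressive one assumes a simple "Apt X" suffix and is not taken from any real merge/purge tool.

```python
import re


def aggressive_key(address):
    """Hypothetical over-eager normalizer of the kind being warned against."""
    s = address.lower()
    # Fold "123 main, apt d" into "123d main" (assumes the unit follows "apt").
    m = re.match(r"^(\d+)\s+(.*?),?\s+apt\s+(\w+)$", s)
    if m:
        s = f"{m.group(1)}{m.group(3)} {m.group(2)}"
    # Drop leading zeros from the house number.
    s = re.sub(r"^0+(?=\d)", "", s)
    return " ".join(s.split())


def conservative_key(address):
    """Fold only case and whitespace; every token stays where it was."""
    return " ".join(address.lower().split())


pairs = [
    ("0333 SW Flower", "333 SW Flower"),
    ("123D Main", "123 Main, Apt D"),
]
for a, b in pairs:
    # The aggressive key merges both pairs; the conservative key keeps them apart.
    print(a, "|", b, "|",
          aggressive_key(a) == aggressive_key(b),
          conservative_key(a) == conservative_key(b))
```

The aggressive key reports both pairs as the same address, which is exactly the join that may be wrong; the conservative key treats them as distinct and leaves the decision to a later, more careful step.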

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.