in reply to Re: Merge/Purge address data
in thread Merge/Purge address data

the street line would get broken down (box, street_name, direction, type, unit_number, unit_type)
Be careful here. In Seattle (and I think DC and a few other areas), the street of "123rd Street NE" is perpendicular to "NE 123rd Street". That is, the "NE" modifier moves after or before the street name to show whether it's a north-south street or an east-west street. Yes, you can live at the intersection of those two streets!
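One way to keep that distinction is to store the leading and trailing directionals in separate fields rather than in a single "direction" field. A minimal sketch in Python, assuming a hypothetical helper `split_directional` and a simple whitespace-delimited street string (real street lines need far more care than this):

```python
# Hypothetical helper, for illustration only: keeps a leading and a trailing
# directional in separate slots so that position survives the parse.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}

def split_directional(street):
    """Return (pre_dir, name, post_dir) for a whitespace-delimited street string."""
    tokens = street.split()
    pre_dir = post_dir = ""
    # A directional in front goes into pre_dir.
    if tokens and tokens[0].upper() in DIRECTIONS:
        pre_dir = tokens.pop(0).upper()
    # A directional at the end goes into post_dir.
    if tokens and tokens[-1].upper() in DIRECTIONS:
        post_dir = tokens.pop().upper()
    return pre_dir, " ".join(tokens), post_dir
```

With this split, "NE 123rd Street" and "123rd Street NE" produce different tuples, so a comparison on all three fields will not treat them as the same street.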

And my former business address was 0333 SW Flower St, as distinct from 333 SW Flower St. The "0" in front essentially means "negative", so the "03" block is east of the "02" block, then comes the "01" block, then the "1" block, the "2" block, and the "3" block where the other address was, about six blocks west. So you can't just rip leading 0's either.
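The usual way this goes wrong is treating the house number as an integer. A small sketch in Python (the function name `same_house_number` is made up for this example) that compares house numbers as text instead:

```python
def same_house_number(a, b):
    """Compare house numbers as strings so a leading zero stays significant."""
    # Only surrounding whitespace is ignored; the digits are left untouched.
    return a.strip() == b.strip()

# Integer conversion erases the distinction between the two addresses:
collapsed = int("0333") == int("333")        # True
# Text comparison keeps them apart:
kept_apart = not same_house_number("0333", "333")  # True
```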

So, be careful not to oversimplify.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Re: •Re: Re: Merge/Purge address data
by gwhite (Friar) on Nov 11, 2003 at 21:16 UTC

    Realistically, how many addresses does that happen to? 500-1000 in each city? Keep in mind that we are looking for duplicates, so all of those exceptions would have to occur together with a match on the same last name (at least that would be a strong match point in my calculation). The worst thing that could happen is that you don't send your pretty newsletter or mail order catalog to one of the two individuals.

    In this instance, program for the obvious and let the exceptions fall where they may. It is not worth the programming time or effort for the 100,000 exception addresses among the millions of US addresses.

    g_White
      I'm pretty sure there are a lot more than 1000 houses in the Seattle area. You can't just move the NE designator around. It's not the same address any more.

      So, the position of every piece of it is important. I would be mad if you had "normalized" my 0333 SW Flower address to "333 SW Flower". And yes, it happens, and it's still wrong.

      -- Randal L. Schwartz, Perl hacker

        Yes Randal, there are more than 1000 houses in Seattle, but probably not a lot more than 1000 that have an exact duplicate address and street where the only difference is a leading zero. AND I never recommended normalizing the address that is sent to the user. The normalization and extraction are for the merge/purge process only; you _should_ always keep the original input as the actual address you stick on the mailing label.
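The "normalize for matching only, keep the original for the label" idea can be sketched roughly as follows in Python. The names `match_key` and `find_duplicates` are hypothetical, and the normalization here is deliberately conservative (case, punctuation and spacing only); how aggressive the key should be is exactly what this thread is arguing about.

```python
from collections import defaultdict

def match_key(last_name, address):
    """Build a comparison-only key; the original strings are never modified."""
    # Upper-case, drop punctuation, collapse runs of whitespace.
    cleaned = "".join(ch for ch in address.upper() if ch.isalnum() or ch.isspace())
    return (last_name.strip().upper(), " ".join(cleaned.split()))

def find_duplicates(records):
    """Group (last_name, address) records by match key; return groups of 2 or more."""
    groups = defaultdict(list)
    for last_name, address in records:
        # The original record is stored as-is, so the label text is untouched.
        groups[match_key(last_name, address)].append((last_name, address))
    return [group for group in groups.values() if len(group) > 1]
```

Each group returned still holds the addresses exactly as they were entered, so whichever record survives the purge goes on the label unmodified.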

        g_White