Re: Efficient Fuzzy Matching Of An Address

I think the mechanism I describe in the subthread starting at Re^3: Comparing text documents could be adapted to this purpose, depending upon what your actual goal is.

When you say "Given an address as input, find any "similar" addresses in the DB", you're not trying to (for example) locate next door neighbours, but rather to locate duplicates with minor typos or transcription errors?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Replies are listed 'Best First'.

Re^2: Efficient Fuzzy Matching Of An Address
by Limbic~Region (Chancellor) on Aug 19, 2008 at 18:22 UTC

BrowserUk

When you say "Given an address as input, find any "similar" addresses in the DB", you're not trying to (for example) locate next door neighbours, but rather to locate duplicates with minor typos or transcription errors?

The latter (typos and transcription errors) with a caveat. If the input is '123 Main St' and the DB has a record of '356 Main Street', I would hope that the search wouldn't return "no results found". On the other hand, if the input was '12 Elm Ave' (two houses over), then the search should definitely not find a match. Ultimately, I would like to return the best N matches ordered by degree of similarity.
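
To make that requirement concrete, here is a rough sketch in Python of "best N matches ordered by degree of similarity". It is not the mechanism from Re^3: Comparing text documents; it just uses the standard library's difflib.SequenceMatcher as a stand-in similarity measure. The abbreviation table, the function names and the 0.6 cutoff are all assumptions made up for illustration, and the linear scan over every candidate would not be efficient for a large DB.

```python
import difflib
import heapq

# Hypothetical abbreviation table (an assumption); a real one would be far larger.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}


def normalize(address):
    """Lowercase, drop '.' and ',', and collapse common suffix spellings."""
    words = address.lower().replace(".", " ").replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def similarity(a, b):
    """Score in [0, 1]; 1.0 means the normalized strings are identical."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_matches(query, candidates, n=3, cutoff=0.6):
    """Return up to n (score, address) pairs, best first.

    The cutoff is an arbitrary illustrative threshold: candidates scoring
    below it are dropped, so an unrelated address yields no match at all.
    """
    scored = ((similarity(query, candidate), candidate) for candidate in candidates)
    kept = [(score, candidate) for score, candidate in scored if score >= cutoff]
    return heapq.nlargest(n, kept)


if __name__ == "__main__":
    db = ["356 Main Street", "12 Elm Ave", "123 Main Street"]
    for score, address in best_matches("123 Main St", db, n=2):
        print(f"{score:.2f}  {address}")
```

With these choices, '123 Main St' still finds '356 Main Street' (ranked below an exact normalized match), while a query of '12 Elm Ave' against a DB holding only '356 Main Street' falls under the cutoff and returns nothing. Whether a character-level ratio is the right notion of "similar" for addresses is a separate question; the cutoff in particular would need tuning against real data.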

Thanks for the pointer to Re^3: Comparing text documents. If I get real motivated, I will try it out.

Cheers - L~R