in reply to Duplicate (similarity) detection (SQL)

Some people have suggested normalizing the DB, but I believe the question is not about finding exact duplicates in a database but about finding similarities. For example, in sales: a customer calls and gives Bob as his first name instead of Robert, or gives a different office number at the same address. The salesperson should be able to key in the customer's current information and see a few possible matches, so they can update the old record rather than entering a new one and ending up with Bob and Robert as separate entries in the DB. After all that, I don't have a solution, but I'd be interested in what everyone else comes up with.
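
To make the scenario concrete, here is a rough sketch (in Python, using only the standard library's difflib) of what "key in the current information and see possible matches" could look like. Everything here is an assumption for illustration: the field names `first` and `address`, the tiny `NICKNAMES` table, the equal weighting of name and address, and the 0.8 threshold are all made up, not anything from the original system.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def canonical_name(name):
    """Lower-case a first name and map known nicknames to one form."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(entry, record):
    """Average of name and address similarity (equal weights are a guess)."""
    if canonical_name(entry["first"]) == canonical_name(record["first"]):
        name_score = 1.0
    else:
        name_score = similarity(entry["first"], record["first"])
    addr_score = similarity(entry["address"], record["address"])
    return (name_score + addr_score) / 2


def possible_matches(entry, records, threshold=0.8):
    """Return (score, record) pairs at or above threshold, best first."""
    scored = [(score(entry, r), r) for r in records]
    hits = [(s, r) for s, r in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

With a stored record for Robert at "12 Main St Suite 400", keying in Bob at "12 Main St Suite 410" would surface that record as a likely match, while an unrelated customer at another address would fall below the threshold.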

~Erich

Re: Re: Duplicate detection (SQL)
by zachlipton (Beadle) on Oct 22, 2003 at 18:01 UTC
    I agree. Calling this node "duplicate detection" was actually a bit of a misnomer ("duplicate" is the term we use in our system for records that are similar enough to be marked as duplicates), so I renamed it to "similarities detection."

    I'm not sure this will be possible at all, because running a Search::Similarities search against the target query for every entry in the db (there could be thousands in a large database) would likely take far too long. I may try a quick spike solution to see if it's reasonable, but does anyone have any better ideas?
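
One possible way around the "compare against every entry" cost, sketched in Python purely as an illustration: pick a cheap key first (often called blocking) and run the expensive similarity comparison only on records that share that key. The `zip` and `name` fields, the choice of postal code as the key, and the 0.8 threshold are assumptions for the example; in SQL terms the key lookup would roughly correspond to an indexed WHERE clause that narrows the rows before any fuzzy comparison happens.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(record):
    """Cheap grouping key; postal code is a hypothetical choice."""
    return record["zip"].strip()


def build_index(records):
    """Group records by block key so lookups avoid a full scan."""
    index = defaultdict(list)
    for record in records:
        index[block_key(record)].append(record)
    return index


def candidates(entry, index):
    """Return only the records that share the entry's block key."""
    return index.get(block_key(entry), [])


def similar_in_block(entry, index, threshold=0.8):
    """Run the slow string comparison on the candidate block only."""
    hits = []
    for record in candidates(entry, index):
        ratio = SequenceMatcher(
            None, entry["name"].lower(), record["name"].lower()
        ).ratio()
        if ratio >= threshold:
            hits.append((ratio, record))
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

The trade-off is that a record whose key itself is wrong (say, a mistyped postal code) will never be compared, so the key should be something that is rarely entered inconsistently.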