in reply to dealing with colloquial forms of people's first names

I wonder if this is an XY problem. Do you have a list of people, and you think some of them are duplicates (e.g., "James Brown" and "Jim Brown")? In that case, I don't think you can be sure that they're really duplicates and not just two people with similar (or even identical) names. If you don't have some other way to uniquely identify them, you're just guessing.

Update: For a practical example of this problem in action, see here.

Re^2: dealing with colloquial forms of people's first names
by Anonymous Monk on Feb 01, 2008 at 02:01 UTC
    First of all, thank you for your time! I have a list of names of managers of U.S. mutual funds by fund and year. The problem is that when fund secretaries submitted entries to the database (I am imagining), one year they wrote "Jim" Last and the next year "James" Last. Also, sometimes they simply made a typo: "Jin" Last. (Often the typos are e->c; I would guess that an OCR program read the printouts or scans?) It is very unlikely that there would be two fund managers among about 10,000 with exactly the same name, especially when their last name is not a common one. Sometimes there are such cases, for example "Jr." and "Sr." in family-managed funds, but I am either aware of these cases or can tolerate some errors. Typos and nicknames are much more common and constitute a much bigger problem. Thanks again.
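A minimal sketch of one way to approach the matching described above, assuming exact last-name matches and a hand-built nickname table. Nothing here comes from the thread: the `CANONICAL` table, the `canonical_first` and `same_manager` helpers, and both similarity cutoffs are hypothetical and would need tuning against the real data. It uses Python's standard `difflib` module; short names are easy to confuse at these cutoffs (with this table, "Jill" would snap to "Bill"), so the result is a candidate list to review, not a verdict.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical, deliberately tiny table: every known form -> canonical form.
# A real table would be much larger and checked by hand.
CANONICAL = {
    "james": "james", "jim": "james", "jimmy": "james",
    "william": "william", "bill": "william", "will": "william",
    "robert": "robert", "bob": "robert", "rob": "robert",
}

def canonical_first(first, cutoff=0.6):
    """Map a first name to its canonical form, tolerating small typos."""
    key = first.strip().lower()
    if key in CANONICAL:
        return CANONICAL[key]
    # Not a known form: snap to the closest known form, if any is close enough.
    close = get_close_matches(key, list(CANONICAL), n=1, cutoff=cutoff)
    return CANONICAL[close[0]] if close else key

def same_manager(a, b, threshold=0.8):
    """Guess whether two (first, last) pairs refer to the same person.

    Assumption: last names must match exactly (ignoring case and spaces).
    """
    if a[1].strip().lower() != b[1].strip().lower():
        return False
    first_a, first_b = canonical_first(a[0]), canonical_first(b[0])
    if first_a == first_b:
        return True
    # Fall back to plain string similarity for names outside the table,
    # which catches single-character OCR slips such as e -> c.
    return SequenceMatcher(None, first_a, first_b).ratio() >= threshold

if __name__ == "__main__":
    pairs = [
        (("Jim", "Brown"), ("James", "Brown")),
        (("Jin", "Brown"), ("James", "Brown")),
        (("Pcter", "Lynch"), ("Peter", "Lynch")),
        (("Robert", "Brown"), ("James", "Brown")),
    ]
    for a, b in pairs:
        print(a, b, same_manager(a, b))
```

Since you say you can tolerate some errors, a reasonable use would be to run this per last name across years and eyeball only the pairs it flags, rather than merging records automatically.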