Well, first off, I'd take a look at Lingua::EN::MatchNames.
From its description:
You have two databases of person records that need to be synchronized or matched up,
but they use different keys--maybe one uses SSN and the other uses employee id.
The only fields you have to match on are first and last name.
That's what this module is for.
Just feed the first and last names to the C<name_eq()> function, and it returns
C<undef> for no possible match, and a percentage of certainty (rank) otherwise.
The ranking system isn't very scientific, and gender isn't considered, though
it probably should be.
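To make that workflow concrete, here's a minimal sketch of matching two record sets that share no common key, only first and last names. The record layouts (`ssn`, `emp_id`) and the `naive_name_eq` scorer are hypothetical. `naive_name_eq` only mimics the return contract described above (`undef` for no match, a percentage otherwise) with a crude exact/initial comparison; it is not the module's algorithm. With the module installed you'd swap in its `name_eq()`, which, if I read its synopsis right, takes the two first/last pairs as four arguments.

```perl
use strict;
use warnings;

# Hypothetical stand-in with the contract the module describes:
# undef for "no possible match", otherwise a percentage (0-100).
# NOT the module's algorithm; with Lingua::EN::MatchNames installed
# you could call its name_eq($first0, $last0, $first1, $last1) here.
sub naive_name_eq {
    my ($f0, $l0, $f1, $l1) = @_;
    return undef unless lc $l0 eq lc $l1;                        # surnames must agree
    return 100 if lc $f0 eq lc $f1;                              # same first name
    return 50 if lc substr($f0, 0, 1) eq lc substr($f1, 0, 1);   # same initial
    return undef;
}

# Hypothetical records: one source keyed by SSN, the other by employee id.
my @by_ssn = (
    { ssn => '123-45-6789', first => 'George', last => 'Bush' },
    { ssn => '987-65-4321', first => 'Ann',    last => 'Smith' },
);
my @by_emp = (
    { emp_id => 'E042', first => 'george', last => 'BUSH' },
    { emp_id => 'E077', first => 'Anne',   last => 'Smith' },
    { emp_id => 'E099', first => 'Bob',    last => 'Jones' },
);

# For each SSN record, keep the best-scoring employee record.
my %best;    # ssn => [emp_id, score]
for my $p (@by_ssn) {
    for my $q (@by_emp) {
        my $score = naive_name_eq($p->{first}, $p->{last}, $q->{first}, $q->{last});
        next unless defined $score;
        $best{ $p->{ssn} } = [ $q->{emp_id}, $score ]
            if !$best{ $p->{ssn} } || $score > $best{ $p->{ssn} }[1];
    }
}

printf "%s -> %s (%d%%)\n", $_, @{ $best{$_} } for sort keys %best;
```

Keeping the scorer as a separate sub means you can drop in the module's function (or anything else) without touching the matching loop.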
It has some good examples, and you can set up "fuzzy searching", so you can match, e.g., "G. W. Bush, Jr" to "George Bush Jr" with some degree of confidence.
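I don't know exactly how the module treats initials and suffixes internally, so purely as a hypothetical illustration of the idea, here's a sketch that normalizes a free-form name string (lower-cases it, drops punctuation and generational suffixes, takes the first and last tokens) and then treats a bare initial as compatible with a full first name. The `split_name` and `fuzzy_name_score` helpers, the suffix list, and the scores are all my own assumptions, and it assumes first-name-first order ("Bush, George" would not parse correctly).

```perl
use strict;
use warnings;

# Hypothetical list of generational suffixes to ignore.
my %SUFFIX = map { $_ => 1 } qw(jr sr ii iii iv);

# "G. W. Bush, Jr" -> ("g", "bush"); returns an empty list if fewer
# than two name tokens remain. Apostrophes and hyphens are kept.
sub split_name {
    my ($name) = @_;
    (my $clean = lc $name) =~ s/[^a-z'\s-]+/ /g;   # punctuation -> space
    my @tokens = grep { !$SUFFIX{$_} } split ' ', $clean;
    return unless @tokens >= 2;
    return ($tokens[0], $tokens[-1]);
}

# undef if surnames differ; 100 for identical first names; 75 if one
# first name is a single letter matching the other's initial.
# The numbers are arbitrary, for illustration only.
sub fuzzy_name_score {
    my ($x, $y) = @_;
    my ($f0, $l0) = split_name($x) or return undef;
    my ($f1, $l1) = split_name($y) or return undef;
    return undef unless $l0 eq $l1;
    return 100 if $f0 eq $f1;
    return 75 if (length($f0) == 1 && index($f1, $f0) == 0)
              || (length($f1) == 1 && index($f0, $f1) == 0);
    return undef;
}

for my $pair (['G. W. Bush, Jr', 'George Bush Jr'], ['George Bush', 'Laura Bush']) {
    my $score = fuzzy_name_score(@$pair);
    printf "%-16s vs %-16s: %s\n", @$pair, defined $score ? "$score%" : 'no match';
}
```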
And if that doesn't work, I'd try Text::Soundex, or just grep through the strings.
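In case Text::Soundex isn't installed where you need it, here's a minimal pure-Perl sketch of the classic American Soundex code, next to a plain case-insensitive grep for comparison. The `soundex` sub below is my own illustration of the standard algorithm, not Text::Soundex's code, and the surname list is made up.

```perl
use strict;
use warnings;

# Letter -> digit table for American Soundex. Vowels (and Y) are absent
# and act as separators; H and W are absent and are skipped entirely.
my %SOUNDEX_CODE;
@SOUNDEX_CODE{qw(B F P V)}         = (1) x 4;
@SOUNDEX_CODE{qw(C G J K Q S X Z)} = (2) x 8;
@SOUNDEX_CODE{qw(D T)}             = (3) x 2;
$SOUNDEX_CODE{L}                   = 4;
@SOUNDEX_CODE{qw(M N)}             = (5) x 2;
$SOUNDEX_CODE{R}                   = 6;

sub soundex {
    my ($name) = @_;
    (my $s = uc $name) =~ s/[^A-Z]//g;        # keep letters only
    return undef unless length $s;
    my ($first, @rest) = split //, $s;
    my $code = $first;
    my $prev = $SOUNDEX_CODE{$first} // 0;    # first letter's digit still collapses repeats
    for my $ch (@rest) {
        next if $ch eq 'H' || $ch eq 'W';      # H/W don't break a run of equal digits
        my $d = $SOUNDEX_CODE{$ch} // 0;      # vowels give 0 and reset the run
        $code .= $d if $d && $d != $prev;
        $prev = $d;
        last if length($code) == 4;
    }
    return substr($code . '000', 0, 4);       # pad with zeros to 4 chars
}

my @surnames = qw(Bush Busch Smith Smyth Jones);
my $target   = soundex('Bush');                              # "B200"
my @sounds_like = grep { soundex($_) eq $target } @surnames;
my @plain_grep  = grep { /bush/i } @surnames;
print "soundex: @sounds_like\n";
print "grep:    @plain_grep\n";
```

The Soundex comparison picks up "Busch" as well as "Bush", while the plain grep only finds exact substring matches; Soundex is coarse, though, so expect some false positives on a large list.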