I am trying to compare two lists of names and find the matches between them. My actual data contains variations of names, which is why I am using MatchNames. Every time I run through my loop, memory usage grows by an additional 3 MB, and I don't know why. Any help is appreciated.
use Lingua::EN::MatchNames;
open (TERMFILE, $ARGV[0]);
my(@termusers) = <TERMFILE>;
chomp @termusers;
open (USERFILE, $ARGV[1]);
my(@curusers) = <USERFILE>;
chomp @curusers;
open (DUPFILE, ">dup.$ARGV[1]");
####Lets Create the Hash################
foreach $curuser (@curusers)
{
    chomp $curuser;
    $curusercounter++;
    print "Adding current user $curusercounter $curuser to Array\n";
    $curlookup{$curusercounter} = $curuser;
}
foreach $termuser (@termusers)
{
    chomp $termuser;
    $termusercounter++;
    print "Adding Term user $termusercounter $termuser to Array\n";
    $termlookup{$termusercounter} = $termuser;
}
@termuserlist = keys %termlookup;
@curuserlist = keys %curlookup;
foreach $termusername (@termuserlist)
{
    &NameComp($termlookup{$termusername})
}
sub NameComp () {
    foreach $curusername (@curuserlist)
    {
        print "comparing $_[0] to $curlookup{$curusername}\n";
        my $name_score = (name_eq($_[0], $curlookup{$curusername}));
        print "$name_score\n";
        if ($name_score >= 80){
            print "Found Match $curlookup{$curusername}\n";
        }
    }
}
close (TERMFILE);
close (USERFILE);
close (DUPFILE);
I am executing it with: perl -w matcher.pl test.txt test2.txt
test.txt contains the following entries:
Robert Forbes
Thomas Forbes
Jane Doe
John Doe
Bad User
test2.txt contains the following entries:
Tom Forbes
Bob Forbes
Janie Doe
Johnny Doe
Wrong User
I am also getting the following error message:
use of uninitialized value in numeric ge (>=) at matcher.pl line 44, <USERFILE> line 5.
Updated Steve_p - changed module mentioned in title from MatchNames.pm to Lingua::EN::MatchNames