Re: Searching for best match

How would you proceed to find the best match?

That depends on your definition of "best", and more explanation and examples would help, but I'm going to guess that it means the longest match?

for my $s (@search) {
  my @names = split ' ', $s;
  # this requires at least the first two names to match
  my @matches = grep { /^\Q$names[0]\E\s+\Q$names[1]\E\b/ } @source;
  @matches = sort {length($b)<=>length($a)} @matches;
  print "search='$s'\n";
  print "\tfound='$_'\n" for @matches;
}
__END__
search='John Ronald Reuel T'
    found='John Ronald Reuel Tolkien'
    found='John Ronald S Tolkien'
search='Trent Reznor'
    found='Trent Reznor'
search='Barack Hussein II'
    found='Barack Hussein Obama II'
    found='Barack Hussein II'
search='Barack Hussein Obama II'
    found='Barack Hussein Obama II'
    found='Barack Hussein II'
search='No match here'
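
If only the single best match is wanted rather than the whole sorted list, the same idea can be wrapped in a small function. This is only a sketch under my "longest wins" guess: best_match is a name I made up, and @source here is sample data reconstructed from the output above, not your real data.

```perl
use strict;
use warnings;

# Hypothetical sample data, reconstructed for illustration only.
my @source = (
    'John Ronald Reuel Tolkien',
    'John Ronald S Tolkien',
    'Barack Hussein Obama II',
    'Barack Hussein II',
    'Trent Reznor',
);

# Return the longest entry whose first two words match those of $search,
# or undef if nothing matches (or the search has fewer than two words).
sub best_match {
    my ($search, @candidates) = @_;
    my @names = split ' ', $search;
    return if @names < 2;
    my $re = qr/^\Q$names[0]\E\s+\Q$names[1]\E\b/;
    my $best;
    for my $c (grep { /$re/ } @candidates) {
        # keep the first candidate seen among equally long ones
        $best = $c if !defined($best) || length($c) > length($best);
    }
    return $best;
}

for my $s ('John Ronald Reuel T', 'Trent Reznor', 'No match here') {
    my $best = best_match($s, @source);
    print "search='$s' best=", (defined($best) ? "'$best'" : 'none'), "\n";
}
```

Ties in length are resolved in favor of whichever candidate comes first in the list; whether that is acceptable depends on your data.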

Just a note: using \w+ to match a name may not be enough, since it might not include all the characters you would consider part of a name (for example, it does not match the dot in "Jr." or "Sr."). That's why the code above takes the alternative approach of splitting on whitespace. However, even that might not be enough, and you should probably look into the Lingua:: namespace on CPAN. For example, a quick search brings up Lingua::EN::MatchNames and Lingua::EN::NameParse.
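
To illustrate the difference between the two tokenizing approaches, here is a tiny comparison. The input string is a made-up example, not taken from your data.

```perl
use strict;
use warnings;

my $name = 'Martin Luther King Jr.';   # made-up example string

# \w+ stops at the dot, so the last token loses its trailing "."
my @by_word  = $name =~ /(\w+)/g;

# split ' ' splits on whitespace only, so punctuation stays attached
my @by_space = split ' ', $name;

print "\\w+   : ", join('|', @by_word),  "\n";   # Martin|Luther|King|Jr
print "split : ", join('|', @by_space), "\n";   # Martin|Luther|King|Jr.
```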

Re^2: Searching for best match by Sosi (Sexton) on Oct 06, 2014 at 12:57 UTC
One more thing: although I used people's names in my example, my real data contains no names of real people (I'm working with species of organisms, in case you're interested), so I can't use Lingua::. It was my mistake to choose a misleading example; I'm sorry about that.
Re^2: Searching for best match by Sosi (Sexton) on Oct 06, 2014 at 12:47 UTC
Thank you! Yes, the best match is the longest one. In the Stack Overflow post someone suggested that I look into fuzzy-matching modules. I'm also looking into this.
Re^3: Searching for best match by Anonymous Monk on Oct 06, 2014 at 15:39 UTC
It's a little unclear to me whether Text::Fuzzy does what you want, but investigating CPAN modules is of course a good idea. Also, just a note that the code above is only an interpretation of what your original code appears to be trying to do, i.e. looking at only the first two names for matches. A more complete selection of sample input, a description of what you want the match to be, and sample output would really help, I think.