in reply to Searching for best match

How would you proceed to find the best match?

That depends somewhat on your definition of "best", and some more explanation and examples would help, but I'm going to guess it's the longest one?

    for my $s (@search) {
        my @names = split ' ', $s;
        # this requires at least the first two names to match
        my @matches = grep { /^\Q$names[0]\E\s+\Q$names[1]\E\b/ } @source;
        @matches = sort { length($b) <=> length($a) } @matches;
        print "search='$s'\n";
        print "\tfound='$_'\n" for @matches;
    }
    __END__
    search='John Ronald Reuel T'
            found='John Ronald Reuel Tolkien'
            found='John Ronald S Tolkien'
    search='Trent Reznor'
            found='Trent Reznor'
    search='Barack Hussein II'
            found='Barack Hussein Obama II'
            found='Barack Hussein II'
    search='Barack Hussein Obama II'
            found='Barack Hussein Obama II'
            found='Barack Hussein II'
    search='No match here'
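If only the single longest candidate is wanted rather than the whole sorted list, something like the following might do. This is just a sketch: best_match() is a hypothetical helper name, the @source list is my guess at the input (reconstructed from the output above), and it keeps the same "first two names must match" assumption.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Assumed sample data, reconstructed from the output shown above
my @source = (
    'John Ronald Reuel Tolkien',
    'John Ronald S Tolkien',
    'Trent Reznor',
    'Barack Hussein Obama II',
    'Barack Hussein II',
);

# best_match() is a hypothetical helper: it returns the longest candidate
# whose first two names match the search string, or undef if none does
sub best_match {
    my ($search, @candidates) = @_;
    my @names = split ' ', $search;
    return unless @names >= 2;    # need at least two names to compare
    my @matches = grep { /^\Q$names[0]\E\s+\Q$names[1]\E\b/ } @candidates;
    # reduce returns undef for an empty list, i.e. when nothing matched
    return reduce { length($b) > length($a) ? $b : $a } @matches;
}

for my $s ('Barack Hussein II', 'No match here') {
    my $best = best_match($s, @source);
    print "search='$s' best=", (defined $best ? "'$best'" : 'none'), "\n";
}
```

Using reduce avoids sorting the whole list when only the top entry matters, and the guard on @names avoids "uninitialized value" warnings for one-word searches.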

Just a note, using \w+ to match a name may not be enough, since it might not include all the characters you would consider part of a name (for example, in ASCII it doesn't include the dot, as in "Jr." or "Sr."). That's why the code above takes the alternative approach of splitting on whitespace. However, even that might not be enough, and you should probably look into the Lingua:: namespace on CPAN. For example, a quick search brings up Lingua::EN::MatchNames and Lingua::EN::NameParse.
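To illustrate the difference between the two approaches, here is a small sketch. The name 'John Smith Jr.' is made up purely for the demonstration.

```perl
use strict;
use warnings;

# Made-up example name containing a dot
my $name = 'John Smith Jr.';

# \w+ only picks up word characters, so the trailing dot is lost
my @by_regex = $name =~ /(\w+)/g;

# splitting on whitespace keeps every non-space character
my @by_split = split ' ', $name;

print "regex: '$by_regex[-1]'\n";
print "split: '$by_split[-1]'\n";
```

The last element is 'Jr' with the regex and 'Jr.' with split, which is why a pattern built from the \w+ pieces could fail to match the original string.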

Re^2: Searching for best match
by Sosi (Sexton) on Oct 06, 2014 at 12:57 UTC
    One more thing: even though I used people's names in my example, my real case doesn't involve names of real people (I'm working with the species names of organisms, in case you're interested), so I can't use Lingua::. It's my fault for choosing the wrong example, I'm sorry for that.
Re^2: Searching for best match
by Sosi (Sexton) on Oct 06, 2014 at 12:47 UTC

    Thank you! Yes, the best match is the longest one. In the Stack Overflow post someone suggested that I look into fuzzy-matching modules. I'm also looking into this.

      It's a little unclear to me if Text::Fuzzy does what you want, but of course investigating CPAN modules is a good idea.

      Also, just a note that the code above is only an interpretation of what your original code appears to want to do, i.e. looking at only the first two names for matches.

      A more complete selection of sample input, description of what you want the match to be, and sample output would really help, I think.