in reply to Guessing/Ordering Partial Data

Here's a pretty simple way to score the matching. Convert the array of search terms to a regex alternation and count the number of matches for each datum:

my @guesses = ('Place De La Gare', 'Rennes');
my $grei = do {    # this case insensitive re added
    local $" = '|';
    qr/@{[map {quotemeta} @guesses]}/i;
};
my $gre = do {
    local $" = '|';
    qr/@{[map {quotemeta} @guesses]}/;
};
my %score;
for (@data) {
    $score{$_} = () = m/($grei)/g;    # edited.
    # $score{$_} += () = m/($gre)/g;  # uncomment to give extra
                                      # credit for exact match
}
# sort keys by value or grep for threshold to pick best matches
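The closing comment leaves the selection step open. Here is a minimal sketch of both options, assuming a %score hash like the one built above. The sample scores and the threshold of 1 are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical scores, standing in for the %score hash built above.
my %score = (
    'Place De La Gare - Rennes' => 2,
    'Place De La Gare - Angers' => 1,
    'Place Thiers - Nancy'      => 0,
);

# Rank keys by score, highest first; break ties alphabetically.
my @ranked = sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;

# Or keep only the keys that reach a threshold (1 is an arbitrary choice).
my $threshold = 1;
my @good = grep { $score{$_} >= $threshold } @ranked;

print "$_ => $score{$_}\n" for @good;
```

Sorting gives a full ranking, while the grep is handier when you only want the candidates that matched at least one term.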

Update: Case insensitivity was being overruled by the compiled $gre. Repaired. Got rid of non-capture grouping and added quotemeta to defang special characters in the search terms. More: Added the '() =' trick to force list context - that fixes the counts.
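Both fixes can be seen in isolation. The following is a small sketch, not the code from the thread: the variable names and the sample datum are chosen here for illustration. It shows that a compiled qr// keeps its own flags even when interpolated into a match with /i, and that assigning through an empty list yields a count where a plain scalar assignment yields only true or false:

```perl
use strict;
use warnings;

my $datum = 'Place De La Gare - Rennes';

# A qr// object keeps the flags it was compiled with, so an /i on the
# enclosing match does not reach inside it.
my $sensitive   = qr/place de la gare/;
my $insensitive = qr/place de la gare/i;

my $outer_i = $datum =~ /$sensitive/i  ? 1 : 0;   # /i on the outer match only
my $inner_i = $datum =~ /$insensitive/ ? 1 : 0;   # /i compiled into the qr//

# Counting matches: assigning through an empty list puts the match in
# list context, and the scalar on the left receives the element count.
my $re    = qr/Place De La Gare|Rennes/i;
my $count = () = $datum =~ /($re)/g;

# Plain scalar assignment only reports whether there was a match.
my $bool = $datum =~ /($re)/g;

print "outer /i: $outer_i, inner /i: $inner_i, bool: $bool, count: $count\n";
# prints: outer /i: 0, inner /i: 1, bool: 1, count: 2
```

The count is taken before the scalar match on purpose: a scalar-context m//g leaves pos() set on the string, and a following list-context m//g would start from there and miss the first hit.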

After Compline,
Zaxo

Re^2: Guessing/Ordering Partial Data
by ropey (Hermit) on Apr 13, 2005 at 04:11 UTC

    I Like Zaxo

    Output after running and using Data::Dumper is

    $VAR1 = {
        'Place de la Gare - Bergerac' => 1,
        'Place De La Gare - Angers' => 0,
        'Place Thiers - Nancy' => 0,
        'Place De La Gare - Rennes' => 1,
        'Place de la Gare' => 1,
        'Place De La Gare - Nevers' => 0,
        'Place De La Gare - Grenoble' => 0,
        'Place De La Gare 1 - Grenoble' => 0,
        'Place De La Gare - Angers' => 0,
        'Place de la Gare - Libourne' => 1,
        'Place de la Gare - Moutiers' => 1,
        'Place Mohammed V - Oujda' => 0,
        'Place De La Gare' => 0,
        'Place Du Chateau - Galerie Marchande Du Rer' => 0,
        'Place de la Gare - Quimper' => 1,
        'Place De La Gare - Nevers' => 0,
        'Place De La Gare - Rennes' => 1
    };

    I would have thought that the Rennes match would have a score of 2. It looks to me like it's case sensitive (re-running with the input as 'Place De La Gare' instead of 'Place de la Gare' scores as I would expect). However, I don't get why that is, seeing as the regex is using /i?