hsmyers has asked for the wisdom of the Perl Monks concerning the following question:
For reasons that are beyond me, Amazon's web API sees nothing wrong with returning 'Vic Broquard', 'Broquard Vic' and 'Victor E. Broquard' all as authors for the same book! Needless to say, this is at best a pain, but for the moment a fact of life. So I was thinking about how to solve this small quandary, and I've come up with the following as a kind of heuristic approach.
The idea is to treat specially those cases where the check value is, say, greater than .5: assume that the names in the array are variations on each other and pick the longest one as the best guess. Or select an alternate lookup, or some other such approach. Anyone with suggestions, improvements, etc., chime in here please... thanks!

    #!/perl/bin/perl
    #
    # test.pl --
    use strict;
    use warnings;
    use diagnostics;
    use String::Similarity;

    my @authors1 = (
        'Vic Broquard',
        'Broquard Vic',
        'Victor E. Broquard',
    );
    my @authors2 = (
        'Peter Prinz',
        'Ulla Kirch-Prinz',
    );
    my @authors3 = (
        'Larry Wall',
        'Tom Christiansen',
        'Jon Orwant',
    );

    print "Average Similarity for \@authors1 = ", check_similarity(@authors1), "\n";
    print "Average Similarity for \@authors2 = ", check_similarity(@authors2), "\n";
    print "Average Similarity for \@authors3 = ", check_similarity(@authors3), "\n";

    # Average the similarity score over every pair of names.
    sub check_similarity {
        my ($count, $similarity_total) = (0, 0);
        for my $ref (combinations(@_)) {
            # combinations() yields every subset; only the pairs matter here.
            if (scalar(@$ref) == 2) {
                $count++;
                $similarity_total += similarity($ref->[0], $ref->[1]);
            }
        }
        # Fewer than two names means no pairs; avoid dividing by zero.
        return $count ? $similarity_total / $count : 0;
    }

    # Return every subset of the arguments as an array reference.
    sub combinations {
        return [] unless @_;
        my $first = shift;
        my @rest = combinations(@_);
        return @rest, map { [$first, @$_] } @rest;
    }

    C:>test
    Average Similarity for @authors1 = 0.666666666666667
    Average Similarity for @authors2 = 0.444444444444444
    Average Similarity for @authors3 = 0.246153846153846
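For what it's worth, the "pick the longest one" step described above could be sketched roughly as follows. This is only an illustration under assumptions, not a tested solution: `best_guess_authors` is a made-up name, the scoring function is passed in as a code reference so the sketch does not depend on any particular module, and the threshold is whatever cutoff you settle on (.5 in the post).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of String::Similarity or the original script).
# $score     - code ref taking two strings and returning a number in 0..1
# $threshold - average score above which the names are treated as variants
# @names     - author names returned for one book
# Returns the single longest name when the names look like variants of each
# other, otherwise returns the list unchanged.
sub best_guess_authors {
    my ($score, $threshold, @names) = @_;
    return @names if @names < 2;    # nothing to compare

    # Average the score over every distinct pair of names.
    my ($count, $total) = (0, 0);
    for my $i (0 .. $#names - 1) {
        for my $j ($i + 1 .. $#names) {
            $total += $score->($names[$i], $names[$j]);
            $count++;
        }
    }
    return @names if $total / $count <= $threshold;

    # Assume variants: keep the longest name (the first one wins a tie).
    my $longest = $names[0];
    for my $name (@names) {
        $longest = $name if length $name > length $longest;
    }
    return ($longest);
}
```

With String::Similarity loaded, the call would look like `best_guess_authors(\&similarity, 0.5, @authors1)`. Going by the averages printed above, only @authors1 (0.67) clears a .5 cutoff and would collapse to 'Victor E. Broquard'; @authors2 and @authors3 would come back unchanged. The nested loop visits each pair directly instead of generating every subset and filtering, which matters little for three names but grows much more slowly for long author lists.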
--hsm
"Never try to teach a pig to sing...it wastes your time and it annoys the pig."
Re: Perls before Amazon
by benn (Vicar) on Jun 08, 2003 at 19:15 UTC
by hsmyers (Canon) on Jun 08, 2003 at 22:33 UTC

Re: Perls before Amazon
by LAI (Hermit) on Jun 09, 2003 at 19:09 UTC