in reply to Re: Group Similar Items
in thread Group Similar Items
Other canonical forms might be obtained from Text::Soundex or Text::Metaphone where "similar" means "sound alike".
How to then transform this hash of word affinities into a single list with no repeats is left as an exercise for the reader. :-)
Update:
To flesh out the soundex idea a little, here's a short example:
    use strict;
    use Text::Soundex;

    my @inwords = qw(holly perl monks yahoo monk holey google eperl holy goxgle kugel april);
    my (%hash, @outwords);

    # Group the words by their soundex code.
    push @{$hash{soundex($_)}}, $_ foreach @inwords;

    # Flatten the groups back into one list (hash key order is arbitrary).
    push @outwords, @{$hash{$_}} foreach keys %hash;

    print join(' ', @outwords);

It prints:

    google goxgle kugel perl holly holey holy yahoo april monks monk eperl

Since each word in this example has but one canonical form, it appears in the hash exactly once. So there are no repeats to untangle as with the cat/hat illustration.
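For the case where a word does have several canonical forms, the flattening step left as an exercise above might look something like the sketch below. The %affinity hash and its keys are made up for illustration (they are not the data from the cat/hat post); the only technique used is the usual %seen idiom to skip words that have already been emitted.

```perl
use strict;
use warnings;

# Hypothetical affinity hash: a word may be listed under more than one key.
my %affinity = (
    at => [qw(cat hat)],
    ca => [qw(cat cab)],
    ha => [qw(hat ham)],
);

my (%seen, @flat);
for my $key (sort keys %affinity) {        # sorted keys give a reproducible order
    for my $word (@{ $affinity{$key} }) {
        # Keep a word only the first time it turns up.
        push @flat, $word unless $seen{$word}++;
    }
}

print join(' ', @flat), "\n";              # prints: cat hat cab ham
```

Similar words stay next to each other only as far as the first group that claims them, so which key "wins" a shared word depends on the order in which the keys are visited.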