in reply to Re^2: Hash versus substitutation efficiency?
in thread Hash versus substitutation efficiency?
I think what you'd want to do is, given a synonym hash, make a reverse mapping that keys each unique value to a regex that alternates among all the keys that map to it.
So you run your indexing scheme using the 10,000 items in %revsynhash: the keys are the indexed terms, and the values are the regexes that count as hits for those terms. When someone wants to look up a term, you use %synhash to translate it, and hit your index for what it translated to.

```perl
my %synhash = (meat => 'meat', ham => 'meat', beef => 'meat');
my %revsynhash;

# Collect the synonyms as an array
while (my ($k, $v) = each %synhash) {
    push @{ $revsynhash{$v} }, $k;
}

# Turn the arrays into alternative-lists, longest first
# (iterate over keys rather than a scalar each, which would stop early on a false key such as '0')
for my $k (keys %revsynhash) {
    $revsynhash{$k} = join '|', sort { length $b <=> length $a } @{ $revsynhash{$k} };
}

# Now you have (the order among equal-length synonyms may vary)
# %revsynhash = ( meat => 'beef|meat|ham' )
```
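The indexing and lookup steps described above might look roughly like the following. This is only a sketch under some assumptions: `index_text` and `lookup` are hypothetical names, the index is a plain hash of hit counts for a single text, and the `quotemeta` escaping, `\b` word boundaries, and case-insensitive matching are additions that the original idea does not spell out.

```perl
use strict;
use warnings;

# Synonym table: each word maps to its canonical (indexed) term.
my %synhash = (meat => 'meat', ham => 'meat', beef => 'meat');

# Canonical term => alternation of its synonyms, longest first.
my %revsynhash;
push @{ $revsynhash{ $synhash{$_} } }, $_ for keys %synhash;
for my $term (keys %revsynhash) {
    $revsynhash{$term} = join '|',
        map  { quotemeta }                      # assumption: synonyms are literal strings
        sort { length $b <=> length $a or $a cmp $b }
        @{ $revsynhash{$term} };
}

# Hypothetical indexer: count whole-word hits per canonical term.
sub index_text {
    my ($text) = @_;
    my %count;
    for my $term (keys %revsynhash) {
        my $re   = $revsynhash{$term};
        my $hits = () = $text =~ /\b(?:$re)\b/gi;   # list assignment in scalar context counts matches
        $count{$term} = $hits if $hits;
    }
    return \%count;
}

# Hypothetical lookup: translate the query via %synhash, then hit the index.
sub lookup {
    my ($index, $query) = @_;
    my $term = $synhash{ lc $query };
    return 0 unless defined $term;
    return $index->{$term} // 0;
}

my $index = index_text('Ham and beef are both meat; the ham was smoked.');
print lookup($index, 'beef'), "\n";
```

Because the query goes through %synhash first, looking up 'beef', 'ham', or 'meat' returns the same count, while the index itself only stores one entry per canonical term.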
Re^4: Hash versus substitutation efficiency?
by bwelch (Curate) on Oct 12, 2004 at 16:05 UTC