Re: finding a set of relevant keys

ericleasemorgan,
Your data structure doesn't allow short-circuiting. You could make a one-time pass over %words and build a parallel data structure that turns each subsequent search into a hash lookup. Something like this:

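# One-time pass: total the counts of all words that share a stem.
my %stems;
for my $word (keys %words) {
    my $stem = $stemmer->stem($word);
    $stems{$stem} += $words{$word};
}
# Each search is now a single hash lookup on the stemmed idea.
for my $idea (@ideas) {
    my $stem = $stemmer->stem($idea);
    my $val = $stems{$stem} || 0;   # 0 when no word shares the stem
    print "$idea\t$val\n";
}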

Of course, if %words is constantly changing, then %stems must change with it. There are ways to keep the two in sync (tied hashes, for instance), or you could switch to a better data structure (a tree rather than a hash). As for a better stemming library: the one you are using does support exceptions. When you say you are going to run the program 10K times, I wonder whether each run will only look up the value of a single word. That would obviously be inefficient. It would be better to serialize (freeze/thaw) your data structure so that you don't pay the cost of converting the hash on every run.
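
A rough sketch of the serialization idea, assuming the core Storable module is acceptable. It uses store/retrieve, the on-disk counterparts of freeze/thaw. The sample data and the temporary cache file are made up for illustration; a real program would build %stems as above and pick a stable path.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Hypothetical stem totals, standing in for the %stems hash built above.
my %stems = ( run => 7, walk => 3 );

# Hypothetical cache location (a temp file here, just for the demo).
my ( $fh, $cache_file ) = tempfile( SUFFIX => '.sto', UNLINK => 1 );
close $fh;

# Pay the stemming cost once, then write the result to disk.
store( \%stems, $cache_file );

# Later runs load the ready-made hash instead of rebuilding it.
my $cached = retrieve($cache_file);

for my $stem (qw(run walk jump)) {
    my $val = $cached->{$stem} || 0;
    print "$stem\t$val\n";
}
```

Note that retrieve returns a hash reference, so lookups go through $cached->{...} rather than $stems{...}.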

Cheers - L~R

Re^2: finding a set of relevant keys by ericleasemorgan (Initiate) on Oct 13, 2009 at 16:04 UTC
Creating a "parallel" data structure in the manner you describe is what others have suggested as well. Thank you!
Re^3: finding a set of relevant keys by Limbic~Region (Chancellor) on Oct 13, 2009 at 16:10 UTC
ericleasemorgan, To be honest, changing %words to a more appropriate data structure would probably be best. That way you could walk the tree. Unfortunately, without knowing how %words is used by the rest of the program or how it is maintained, that is a blind recommendation. Cheers - L~R
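
For what it's worth, here is one way "walking the tree" could look: a minimal trie built from nested hashes. Everything in it is an assumption for illustration (the sample %words, the prefix_total helper, and the "\0" end-of-word marker), and it matches on literal prefixes, which is only an approximation of what a stemmer does.

```perl
use strict;
use warnings;

# Hypothetical word counts standing in for %words.
my %words = ( run => 2, running => 4, runs => 1, walk => 3 );

# Build the trie: each node is a hash keyed by character. The count of a
# complete word is stored under "\0" (assumed never to occur in a word).
my %trie;
for my $word ( keys %words ) {
    my $node = \%trie;
    $node = $node->{$_} ||= {} for split //, $word;
    $node->{"\0"} = $words{$word};
}

# Sum the counts of every word that starts with $prefix.
sub prefix_total {
    my ( $trie, $prefix ) = @_;
    my $node = $trie;
    for my $char ( split //, $prefix ) {
        # Short circuit: stop as soon as the prefix leaves the tree.
        $node = $node->{$char} or return 0;
    }
    my $total = 0;
    my @stack = ($node);
    while ( my $n = pop @stack ) {
        for my $key ( keys %$n ) {
            if   ( $key eq "\0" ) { $total += $n->{$key} }
            else                  { push @stack, $n->{$key} }
        }
    }
    return $total;
}

for my $idea (qw(run walk jump)) {
    print "$idea\t", prefix_total( \%trie, $idea ), "\n";
}
```

Unlike the flat hash, a lookup here can give up at the first character that has no branch, and new words can be added without rebuilding anything.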