in reply to Search a database for every permutation of a list of words
Note that there are various trade-offs of memory, disk space, and search time that you can make. You'll need to analyze carefully what a "typical" query is like, and what a "bad" query is like, and take appropriate measures.
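Here's A Way To Do It. I assume you have (or will manufacture) a list of all search terms (@term below), plus an index (%documents_mentioning) that maps each term to a reference to the list of documents mentioning it.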
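    my %hit;
    foreach my $t (@term) {
        # Each index entry is an array ref, so dereference it;
        # a term with no entry contributes no documents.
        my @doc = @{ $documents_mentioning{$t} || [] };
        $hit{$_}++ for @doc;
    }
    # Filter (keys %hit) by # of hits (at least 2), then sort by hit count, highest first.
    my @res = sort { $hit{$b} <=> $hit{$a} } grep { $hit{$_} >= 2 } keys %hit;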
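The hit-counting code takes %documents_mentioning as given. As a rough sketch of where it might come from, here is one way to build it from a small in-memory collection and then run the same counting step. The %text sample data, the search() wrapper, and the tie-break on document id are my own assumptions for illustration; a real collection would more likely live on disk or in a database.

```perl
use strict;
use warnings;

# Hypothetical sample collection: document id => text.
my %text = (
    doc1 => 'the quick brown fox',
    doc2 => 'the lazy brown dog',
    doc3 => 'a quick red fox jumps',
);

# Build the inverted index: term => array ref of ids of documents mentioning it.
my %documents_mentioning;
for my $id (sort keys %text) {
    my %seen;    # record each term only once per document
    for my $word (grep { length($_) } split /\W+/, lc $text{$id}) {
        push @{ $documents_mentioning{$word} }, $id unless $seen{$word}++;
    }
}

# Count hits per document, keep those matching at least 2 terms,
# and sort by hit count (ties broken by document id, an added assumption).
sub search {
    my @term = @_;
    my %hit;
    for my $t (@term) {
        $hit{$_}++ for @{ $documents_mentioning{$t} || [] };
    }
    return sort { $hit{$b} <=> $hit{$a} || $a cmp $b }
           grep { $hit{$_} >= 2 } keys %hit;
}

my @res = search(qw(quick brown fox));
print "@res\n";    # prints "doc1 doc3"
```

Because the count does not depend on the order of the query terms, every permutation of the word list gives the same result, so there is no need to enumerate permutations.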
"Interesting" parts include dealing with very large indexes (if your collection of documents is large), stemming words, and selecting the relevant words.
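To give a feel for the last two items, here is a deliberately crude sketch of term normalization: lowercasing, dropping a few stop words, and stripping some common English suffixes. The stop list, crude_stem() and normalize_terms() are hypothetical names of mine, and the suffix stripping is only a stand-in for a real stemming algorithm (Porter's, say), which handles many cases this one gets wrong.

```perl
use strict;
use warnings;

# Hypothetical stop list; a real one would be much longer.
my %stop = map { $_ => 1 } qw(a an and of or the);

# Crude stand-in for stemming: strip one common suffix from longer words.
# This is NOT a real stemmer, just enough to show where one would plug in.
sub crude_stem {
    my ($w) = @_;
    $w =~ s/(?:ing|ed|es|s)$// if length($w) > 4;
    return $w;
}

# Lowercase, split on non-word characters, drop stop words, then stem.
sub normalize_terms {
    my ($text) = @_;
    return map  { crude_stem($_) }
           grep { length($_) && !$stop{$_} }
           split /\W+/, lc $text;
}

my @term = normalize_terms('Searching the indexed documents');
print "@term\n";    # prints "search index document"
```

Whatever normalization you pick has to be applied both when building the index and when processing a query, otherwise the terms won't match.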
A good alternative might be to get a 3rd-party search engine instead of writing your own.