The reasons you gave for an inverted index not being practical are that:
a) you need to support searching for phrases
Matching phrases against an index is a case of splitting the phrase into its constituent words, and then intersecting the sets of record numbers that are returned from the index. The intersection gives you the candidate records, which you can then check for the exact phrase. (See Re: Idea for XPath implementation for a slightly better explanation of this.)
AND, OR and NOT are just extensions of the same set manipulations (intersection, union and difference).
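A minimal sketch of the set manipulations described above, assuming an index that maps each word to a hash of record numbers. The sample records and the names recs, set_and, set_or, set_not and phrase are all made up for illustration; they are not from any module.

```perl
use strict;
use warnings;

# Hypothetical sample records, keyed by record number.
my %records = (
    1 => 'the quick brown fox',
    2 => 'brown bears like the quick river',
    3 => 'a slow brown snail',
);

# Build the inverted index: word => { record number => 1 }.
my %index;
for my $recno ( keys %records ) {
    $index{ lc $_ }{$recno} = 1 for split ' ', $records{$recno};
}

# Copy of the set of record numbers for a word (empty set if unknown).
sub recs {
    my ($word) = @_;
    my %set = %{ $index{ lc $word } || {} };
    return \%set;
}

# AND is intersection, OR is union, NOT is difference.
sub set_and { my ( $x, $y ) = @_; my %r = map { $_ => 1 } grep { $y->{$_} } keys %$x;  return \%r }
sub set_or  { my ( $x, $y ) = @_; my %r = ( %$x, %$y );                                return \%r }
sub set_not { my ( $x, $y ) = @_; my %r = map { $_ => 1 } grep { !$y->{$_} } keys %$x; return \%r }

# Phrase search: intersect the per-word sets to get candidate records,
# then keep only the candidates where the words really are adjacent.
# Assumes the words in the record are separated by single spaces.
sub phrase {
    my ($phrase) = @_;
    my @words = split ' ', $phrase;
    return () unless @words;
    my $set = recs( shift @words );
    $set = set_and( $set, recs($_) ) for @words;
    return sort { $a <=> $b } grep { $records{$_} =~ /\b\Q$phrase\E\b/i } keys %$set;
}

my @hits = phrase('quick brown');
my @and  = sort { $a <=> $b } keys %{ set_and( recs('quick'), recs('brown') ) };
my @or   = sort { $a <=> $b } keys %{ set_or( recs('fox'), recs('snail') ) };
my @not  = sort { $a <=> $b } keys %{ set_not( recs('brown'), recs('quick') ) };
```

With these records, the intersection for 'quick' and 'brown' yields records 1 and 2, and the final check narrows the phrase match to record 1. Only the small candidate set ever gets scanned, which is the point of going through the index first.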
b) you need to support partial matches.
Partial matches are a bit more complex, but you could use davorg's Tie::Hash::Regex as the basis for your inverted index,
or use grep /partial.*/, keys %index; (which is what it uses under the covers).
This would probably involve some manipulation of the input query to convert partial matches to regex notation (e.g. bio* => bio[^\s]*), unless your users are comfortable using regex notation.
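Roughly, the query conversion and the grep over the index keys could look like the sketch below. The %index contents and the names wildcard_to_regex and partial_match are hypothetical. Anchoring the pattern to the whole key is my assumption about what users would expect from bio*; drop the anchors if you want substring matches.

```perl
use strict;
use warnings;

# Hypothetical index: word => { record number => 1 }.
my %index = (
    biology   => { 1 => 1, 4 => 1 },
    biosphere => { 2 => 1 },
    symbiosis => { 3 => 1 },
    geology   => { 4 => 1 },
);

# Convert a query term with * wildcards into a regex (bio* => bio[^\s]*).
# Everything except * is quoted, so other regex metacharacters in the
# user's input are treated literally.
sub wildcard_to_regex {
    my ($term) = @_;
    my $pattern = join '[^\s]*', map { quotemeta($_) } split /\*/, $term, -1;
    return qr/^$pattern\z/i;    # assumption: match the whole index key
}

# Sorted record numbers for every index key that matches the term.
sub partial_match {
    my ($term) = @_;
    my $re = wildcard_to_regex($term);
    my %hits;
    for my $word ( grep { $_ =~ $re } keys %index ) {
        $hits{$_} = 1 for keys %{ $index{$word} };
    }
    return sort { $a <=> $b } keys %hits;
}

my @prefix = partial_match('bio*');     # matches biology, biosphere
my @infix  = partial_match('*bio*');    # also matches symbiosis
```

Note that the grep walks every key in the index, so this is linear in the number of distinct words rather than the number of records, which is usually a much smaller number.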
Just a thought in case you haven't already considered this.
In reply to Re: Re: Re: Re: Re: speeding up a file-based text search
by BrowserUk
in thread speeding up a file-based text search
by perrin