in reply to Fulltext DB search: The Need for Speed
But I can speculate. My guess is that Google has a huge index. An index on words. When it fetches a (new) web page, it makes a list of all the words occurring in the page. For each word, it stores a pointer to that page in its index of words (along with the position(s) in the document where the word is found). So, if you search for "the brown cat with a glass eye", it will toss out the common words 'the' and 'a' (and perhaps 'with' as well). For 'brown', 'cat', 'glass', and 'eye' (and perhaps 'with'), it gets the pointers to the pages each word is found in. To get the pages containing all the words, you take the intersection of the different (sub)results.
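To make that concrete, here is a minimal sketch of such a word index in Python. It is only an illustration of the idea above, not a claim about how Google works. The stop list, the tokenizer, the sample documents, and the names `build_index` and `search_all_words` are all made up for the example.

```python
import re
from collections import defaultdict

# Assumed stop list, for illustration only; a real engine chooses its own.
STOP_WORDS = {"the", "a", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each non-stop word to {doc_id: [positions of the word in that doc]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            if word in STOP_WORDS:
                continue
            index[word].setdefault(doc_id, []).append(pos)
    return index

def search_all_words(index, query):
    """Return the ids of documents containing every non-stop word of the query."""
    words = [w for w in tokenize(query) if w not in STOP_WORDS]
    if not words:
        return set()
    # Start with the documents for the first word, then intersect with the rest.
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        result &= set(index.get(word, {}))
    return result

# Hypothetical sample documents.
docs = {
    1: "The brown cat with a glass eye sat on the mat",
    2: "A brown dog chased the cat",
    3: "Her glass eye was brown; the cat did not care",
}
index = build_index(docs)
hits = search_all_words(index, "the brown cat with a glass eye")
print(sorted(hits))
```

Documents 1 and 3 contain all four remaining words, so both come back, even though only document 1 contains the exact phrase. Each lookup is a dictionary access, so the cost depends on the lengths of the per-word lists rather than on the total amount of text.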
Of course, in reality Google will do it much more cleverly, perhaps not just indexing on single words, but on word pairs or triples, or by using a multilevel index.
But the important point is, if you want to search on words, and you want to search fast, you've got to index on words, and not use full text searches. And even if you only want to return documents that contain "the brown cat with a glass eye" as an exact phrase, it's a huge win if you limit your full text search to those documents that contain the words 'brown', 'cat', 'glass' and 'eye'.
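A rough sketch of that two-step approach, again in Python and again with made-up names (`build_word_index`, `phrase_search`), an assumed stop list, and invented sample documents: the word index narrows the set of candidates, and only those candidates get the expensive full text scan.

```python
import re
from collections import defaultdict

# Assumed stop list, for illustration only.
STOP_WORDS = {"the", "a", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_word_index(docs):
    """Map each non-stop word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return index

def phrase_search(docs, index, phrase):
    """Return (matching doc ids, number of documents actually scanned)."""
    phrase_tokens = tokenize(phrase)
    if not phrase_tokens:
        return [], 0
    keywords = [w for w in phrase_tokens if w not in STOP_WORDS]
    if keywords:
        # Step 1: cheap index lookups give the candidate documents.
        candidates = set.intersection(*(index.get(w, set()) for w in keywords))
    else:
        # Only stop words in the phrase: nothing to filter on, scan everything.
        candidates = set(docs)
    # Step 2: full text scan, but only over the candidates.
    n = len(phrase_tokens)
    matches = []
    for doc_id in sorted(candidates):
        tokens = tokenize(docs[doc_id])
        if any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1)):
            matches.append(doc_id)
    return matches, len(candidates)

# Hypothetical sample documents.
docs = {
    1: "The brown cat with a glass eye sat on the mat",
    2: "A brown dog chased the cat",
    3: "Her glass eye was brown; the cat did not care",
    4: "Nothing relevant in this one",
}
index = build_word_index(docs)
matches, scanned = phrase_search(docs, index, "the brown cat with a glass eye")
print(matches, scanned)
```

Here only two of the four documents are scanned, and one of them matches the phrase. With millions of documents, the gap between "scanned" and "total" is where the win comes from.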
Abigail
Re: Re: Fulltext DB search: The Need for Speed
by jest (Pilgrim) on Oct 27, 2003 at 16:08 UTC