Then you can search the index file for each keyword and include/exclude documents based on boolean constructs. It's very fast, but you are limited to boolean-style constructs, like foo AND bar AND baz. There's no way to tell where foo, bar, and baz are in the document. A typical index file looks like this:

    foo: index.html,page1.html
    bar: page1.html
    baz: page1.html,page2.html
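A minimal sketch of that kind of index in Perl might look like the following; the %index layout, the function names, and the file list are illustrative assumptions, not taken from any existing search script:

    #!/usr/bin/perl -w
    use strict;

    my %index;    # word => { filename => 1 }, the index file above as a hash

    # Record every word in a file (assumes plain text / simple HTML).
    sub index_file {
        my $file = shift;
        open my $fh, '<', $file or return;
        while (my $line = <$fh>) {
            $index{lc $1}{$file} = 1 while $line =~ /(\w+)/g;
        }
        close $fh;
    }

    # Boolean AND: keep only files present in every word's posting list.
    sub and_query {
        my %hits = %{ $index{lc shift()} || {} };
        for my $word (map lc, @_) {
            my $postings = $index{$word} || {};
            %hits = map { $_ => 1 } grep { $postings->{$_} } keys %hits;
        }
        return sort keys %hits;
    }

    index_file($_) for qw(index.html page1.html page2.html);
    print join(',', and_query('foo', 'baz')), "\n";   # page1.html, per the example index

Note that and_query() only intersects sets of filenames; the index records *which* documents contain a word, not *where* in the document it appears, which is exactly why phrases are out of reach.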
Or, you can loop through all the files each time, using Perl and regexes to find your search terms, what I call "recurse-and-grep". It's slow and eats up CPU and disk time, but you can search for phrases, like foo bar baz.
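A recurse-and-grep pass is essentially a File::Find walk; this sketch assumes a hypothetical docroot directory of HTML files and does a case-insensitive literal-phrase match:

    #!/usr/bin/perl -w
    use strict;
    use File::Find;

    my $phrase = 'foo bar baz';
    my $re     = qr/\Q$phrase\E/i;    # literal phrase, case-insensitive

    find(sub {
        return unless -f && /\.html?$/;    # only plain HTML files
        open my $fh, '<', $_ or return;
        local $/;                          # slurp the whole file at once
        print "$File::Find::name\n" if <$fh> =~ $re;
        close $fh;
    }, 'docroot');                         # hypothetical document root

Every search re-reads every file from disk, which is where the CPU and disk time go.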
My problem is: I want the speed of an indexed search, but I also want to be able to search for phrases, not just keywords. The big-name search engines can do this, but none of the Perl/CGI search scripts I have found to date can do both.
I considered using an indexed search to narrow the query down to just the documents that contain all the words of the phrase, in any order, then grepping those documents for the phrase itself. But the speed of this will vary widely with how many documents the index pass returns: in the worst case, where every document matches the individual search terms, it would actually be slower than recurse-and-grep alone.
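In code, that hybrid is just the two sketches above chained together; and_query() here is the assumed AND lookup from the index sketch:

    # Hybrid: index pass to narrow the candidates, grep pass for the phrase.
    sub phrase_search {
        my $phrase     = shift;
        my @candidates = and_query(split ' ', $phrase);   # fast index pass
        my $re         = qr/\Q$phrase\E/i;
        my @hits;
        for my $file (@candidates) {                      # slow grep pass
            open my $fh, '<', $file or next;
            local $/;                                     # slurp
            push @hits, $file if <$fh> =~ $re;
            close $fh;
        }
        return @hits;
    }

The worst case is easy to see in this form: if and_query() returns every document, the grep pass reads everything anyway and the index pass was pure overhead.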
So, what's the secret?