httptech has asked for the wisdom of the Perl Monks concerning the following question:
Then you can search the index file for each keyword and include/exclude documents based on boolean constructs. It's very fast, but you are limited to boolean-style constructs, like foo AND bar AND baz. There's no way to tell where foo, bar and baz are in the document. A sample index file:

    foo: index.html,page1.html
    bar: page1.html
    baz: page1.html,page2.html
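The boolean lookup described above can be sketched roughly as follows. This is only an illustration, written in Python for brevity. It assumes the index file has already been parsed into a dictionary mapping each keyword to a set of document names, and the function names `search_and` and `search_and_not` are made up for this example.

```python
# Hypothetical in-memory copy of the index file: keyword -> set of documents.
index = {
    "foo": {"index.html", "page1.html"},
    "bar": {"page1.html"},
    "baz": {"page1.html", "page2.html"},
}


def search_and(index, keywords):
    """Return the documents that contain every keyword (foo AND bar AND baz)."""
    if not keywords:
        return set()
    # A keyword missing from the index matches no documents at all.
    doc_sets = [index.get(word, set()) for word in keywords]
    # set.intersection returns a new set, so the index itself is never modified.
    return set.intersection(*doc_sets)


def search_and_not(index, required, excluded):
    """Return documents containing all required keywords and no excluded ones."""
    hits = search_and(index, required)
    for word in excluded:
        hits -= index.get(word, set())
    return hits


if __name__ == "__main__":
    print(sorted(search_and(index, ["foo", "bar", "baz"])))
    print(sorted(search_and_not(index, ["baz"], ["bar"])))
```

Each query costs a few set operations no matter how large the documents are, which is where the speed comes from. The sets record only that a word occurs in a document, not where, so word order and adjacency are lost.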
Or you can loop through all the files each time, using Perl and regexes to find your search terms, which I call "recurse-and-grep". It's slow and eats up CPU and disk time, but you can search for phrases, like foo bar baz.
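A minimal sketch of the recurse-and-grep approach, again in Python purely for illustration. The function name `recurse_and_grep` is hypothetical, and the sketch assumes the documents are text files under a single root directory and that a case-insensitive match with flexible whitespace between words is wanted.

```python
import os
import re


def recurse_and_grep(root, phrase):
    """Scan every file under root and return the paths whose text contains phrase."""
    words = phrase.split()
    if not words:
        return []
    # Allow any run of whitespace (including newlines) between the words.
    pattern = re.compile(r"\s+".join(re.escape(w) for w in words), re.IGNORECASE)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Every file is opened and read in full on every single query.
            with open(path, encoding="utf-8", errors="replace") as handle:
                if pattern.search(handle.read()):
                    matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    for hit in recurse_and_grep(".", "foo bar baz"):
        print(hit)
```

The cost is proportional to the total size of the document tree for every query, which is the slowness described above. In exchange, the regex sees the full text, so phrases work.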
My problem is this: I want the speed of an indexed search, but I also want to be able to search for phrases, not just keywords. The big-name search engines can do this, but none of the Perl/CGI search scripts I have found to date can do both.
I considered using an indexed search to narrow my query down to just the documents that contain all the words of the phrase in any order, then grepping those documents for the phrase itself. But the speed of that approach will vary widely with how many documents the index returns. If every document matched the individual search terms, it would actually be slower than using the recurse-and-grep method alone.
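The two-stage idea can be sketched like this, in Python for illustration only. The name `phrase_search` is hypothetical. To stay self-contained, the sketch keeps document text in an in-memory dictionary instead of reading files, and it assumes the index keys are lowercase.

```python
import re


def phrase_search(index, documents, phrase):
    """Narrow with the keyword index, then grep only the surviving documents."""
    words = phrase.lower().split()
    if not words:
        return []
    # Stage 1: documents that contain all the words, in any order.
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    # Stage 2: check the actual phrase, but only in the candidate documents.
    pattern = re.compile(r"\s+".join(re.escape(w) for w in words), re.IGNORECASE)
    return sorted(name for name in candidates if pattern.search(documents[name]))


if __name__ == "__main__":
    # Hypothetical sample data: keyword index plus the text it was built from.
    index = {
        "foo": {"index.html", "page1.html"},
        "bar": {"page1.html"},
        "baz": {"page1.html", "page2.html"},
    }
    documents = {
        "index.html": "welcome to foo",
        "page1.html": "all about foo bar baz",
        "page2.html": "baz alone",
    }
    print(phrase_search(index, documents, "foo bar baz"))
    print(phrase_search(index, documents, "bar foo"))
```

The work in stage 2 scales with the number of candidates. When the index rules out most documents, it is cheap. When every document contains all the words, stage 2 reads everything, and the index lookup is pure overhead on top of a full grep.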
So, what's the secret?
Re: Search Engine Theory
by lhoward (Vicar) on Jun 06, 2000 at 05:12 UTC
by Anonymous Monk on Jun 06, 2000 at 17:55 UTC
by lhoward (Vicar) on Jun 06, 2000 at 18:01 UTC

Re: Search Engine Theory
by nardo (Friar) on Jun 06, 2000 at 13:20 UTC
by httptech (Chaplain) on Jun 06, 2000 at 15:56 UTC
by nardo (Friar) on Jun 06, 2000 at 20:16 UTC

RE: Search Engine Theory
by JanneVee (Friar) on Jun 06, 2000 at 18:13 UTC

RE: Search Engine Theory
by turnstep (Parson) on Jun 07, 2000 at 01:38 UTC