in reply to Speed up the search

Wow. You could write a book (and some have) on search technology. IMHO, there are no good Perl search modules. (Please, someone prove me wrong.) Check out searchtools for a pretty comprehensive list of available apps/libraries. A lot of the products there cost money ($$), and a lot of them focus on spidering information rather than indexing/searching it, but you should find it a good starting point. Just be prepared to spend a significant chunk of time integrating.

The main problem with your approach is that it will not scale well. It may work fine for your current doc set, but add a few thousand more and it will become unbearably slow. Doing all that regex work in real time, on every query, will also become burdensome. Most people approach this problem by indexing offline and then using those indexes for searching. You run the risk of stale search results if you have extremely dynamic docs, but most people don't, so reindexing on a periodic basis (say, weekly) will do the trick.
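To make the "index offline, search the index" idea concrete, here is a minimal sketch of an inverted index in plain Perl. It is not taken from any of the modules mentioned in this thread; the names build_index and search, the sample documents, and the crude split-on-non-word tokenizer are all my own assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an inverted index: term => { doc_id => occurrence count }.
# $docs is a hash ref mapping a document id to its text
# (a hypothetical in-memory stand-in for reading files from disk).
sub build_index {
    my ($docs) = @_;
    my %index;
    for my $id (keys %$docs) {
        # Lowercase, then split on non-word characters: a deliberately crude tokenizer.
        for my $term (grep { length } split /\W+/, lc $docs->{$id}) {
            $index{$term}{$id}++;
        }
    }
    return \%index;
}

# Return the ids of documents that contain every query term,
# ordered by total term count (highest first), then by id.
sub search {
    my ($index, $query) = @_;
    my @terms = grep { length } split /\W+/, lc $query;
    return () unless @terms;

    my (%hits, %score);
    for my $term (@terms) {
        # AND semantics: a term missing from the index means no results at all.
        my $postings = $index->{$term} or return ();
        for my $id (keys %$postings) {
            $hits{$id}++;
            $score{$id} += $postings->{$id};
        }
    }
    my @ids = grep { $hits{$_} == @terms } keys %hits;
    return sort { $score{$b} <=> $score{$a} || $a cmp $b } @ids;
}

# Made-up sample documents, purely for demonstration.
my %docs = (
    'a.txt' => 'Perl search modules and Perl indexing',
    'b.txt' => 'Searching with regular expressions',
    'c.txt' => 'Indexing documents offline for search',
);

my $index = build_index(\%docs);
print join(', ', search($index, 'search indexing')), "\n";
```

A query now costs a few hash lookups per term instead of a regex pass over every document. In a real setup you would run build_index from cron and save the result to disk (the core Storable module's store and retrieve would do for a small doc set), then have the search script load that file. Note there is no stemming here, so "searching" does not match "search".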

An example of a Perl library found at searchtools would be perlfect.

-derby

update: Thanks, perrin. I'll look into Search::InvertedIndex. I've looked at DBIx::FullTextSearch before but didn't want the MySQL overhead.

update again: Just to clarify, I would really like to see something like Lucene in the Perl world.

update yet again: perrin is right. I need to look at CPAN more closely. Besides the two mentioned below, WAIT is a Perl/XS implementation of the once-ubiquitous WAIS.

Replies are listed 'Best First'.
Re: Re: Speed up the search
by perrin (Chancellor) on May 07, 2002 at 13:25 UTC
Re: Re: Speed up the search
by Anonymous Monk on Jan 13, 2004 at 15:42 UTC
    We have just written an indexing search tool. Contact r.talbot@staff.covcollege.ac.uk for more info.