Wow. You could write a book (and some have) on search technology. IMHO, there are no good Perl search modules (please, someone prove me wrong). Check out searchtools for a pretty comprehensive list of available apps and libraries. A lot of the products there cost money ($$), and a lot of them focus on spidering information rather than indexing and searching it, but you should find it a good starting point. Just be prepared to spend a significant chunk of time on integration.

The main problem with your approach is that it will not scale well. It may work fine for your current doc set, but add a few thousand more docs and it will become unbearably slow. Also, doing all that regex work in realtime will become burdensome. Most people approach this problem by indexing offline and then searching those indexes. You run the risk of stale results if your docs are extremely dynamic, but most people's aren't, so re-indexing on a periodic basis (say, weekly) will do the trick.
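
To make the index-offline-then-search idea concrete, here is a minimal sketch of an inverted index in plain Perl. The names build_index and search and the sample %docs are made up for illustration and are not taken from any CPAN module. A real setup would presumably build the index from a scheduled job and persist it to disk (Storable would be one option) rather than keep it in memory.

```perl
use strict;
use warnings;

# Build an inverted index: term => { doc_id => occurrence count }.
# $docs is a hypothetical hash ref of doc_id => text.
sub build_index {
    my ($docs) = @_;
    my %index;
    for my $id (keys %$docs) {
        # Tokenize once, at index time, so queries never scan the documents.
        for my $term (lc($docs->{$id}) =~ /\w+/g) {
            $index{$term}{$id}++;
        }
    }
    return \%index;
}

# Return the ids of docs containing every query term, highest count first.
sub search {
    my ($index, $query) = @_;
    my %seen;
    my @terms = grep { !$seen{$_}++ } lc($query) =~ /\w+/g;
    return () unless @terms;

    my (%matched, %score);
    for my $term (@terms) {
        # If any term is missing from the index, no doc can match all terms.
        my $postings = $index->{$term} or return ();
        for my $id (keys %$postings) {
            $matched{$id}++;
            $score{$id} += $postings->{$id};
        }
    }
    return sort { $score{$b} <=> $score{$a} || $a cmp $b }
           grep { $matched{$_} == @terms } keys %matched;
}

# Sample data, made up for this example.
my %docs = (
    'a.txt' => 'Perl search modules and search tools',
    'b.txt' => 'Indexing offline keeps search fast',
    'c.txt' => 'Regex matching in realtime is slow',
);
my $index = build_index(\%docs);
print join(', ', search($index, 'search')), "\n";   # prints "a.txt, b.txt"
```

The point of the design is that a query costs a few hash lookups instead of a regex pass over every doc, which is what lets it scale. The price is that the index is only as fresh as the last rebuild.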

One example of a Perl library listed at searchtools is perlfect.

-derby

update: Thanks perrin. I'll look into Search::InvertedIndex. I've looked at DBIx::FullTextSearch before but didn't want the MySQL overhead.

update again: Just to clarify, I would really like to see something like Lucene in the Perl world.

update yet again: perrin is right. I need to look at CPAN more closely. Besides the two mentioned below, WAIT is a Perl/XS implementation of the once-ubiquitous WAIS.


In reply to Re: Speed up the search by derby
in thread Speed up the search by elbow
