You might be pleased to know that MySQL does support the creation of full text indexes. Check out Chapter 25.2 at MySQL.com for details.

When it comes to parsing what your users type into the search form, you have a few options for query syntax. You might want to check out some of the text processing modules, such as Text::Balanced, to reliably extract quoted strings. However, you might be disappointed to discover that the FTR (Full Text Retrieval) functions of MySQL do not support phrase searching.
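
For what it's worth, here is a minimal sketch of pulling quoted strings out of a search box with Text::Balanced. It only uses extract_delimited; the tokenize() helper name and the sample query are made up for illustration, and you would still have to decide what to do with a phrase once you have it.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

# Split a search string into bare words and double-quoted phrases.
# tokenize() is a hypothetical helper name, not part of Text::Balanced.
sub tokenize {
    my ($query) = @_;
    my @tokens;
    while ($query =~ /\S/) {
        # Try to pull a "quoted phrase" off the front of the string
        # (leading whitespace is skipped by default).
        my ($quoted, $rest) = extract_delimited($query, '"');
        if (defined $quoted && length $quoted) {
            push @tokens, substr($quoted, 1, -1);   # drop the surrounding quotes
            $query = $rest;
        }
        else {
            # No balanced quote here, so take the next run of non-whitespace.
            $query =~ s/^\s*(\S+)//;
            push @tokens, $1;
        }
    }
    return @tokens;
}

print join('|', tokenize('perl "full text" mysql')), "\n";
```

An unbalanced quote simply falls through to the bare-word branch, so the loop always makes progress.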

I am a fan of a couple of search criteria syntaxes:
- "alta vista": the + and - keyword-prefix syntax
- what I call "simplified boolean", where users can put ANDs and ORs in their searches without needing to do lots of quoting

Choose your syntax based on what you think your users will most easily be able to use.
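
As a rough sketch of the + and - style, the tokens could be sorted into buckets like this. The parse_plus_minus() name and the required/optional/excluded bucket names are my own assumptions, and the input is assumed to be split into tokens already.

```perl
use strict;
use warnings;

# Sort already-tokenized search terms into three buckets by prefix.
# parse_plus_minus() is a hypothetical name used for illustration.
sub parse_plus_minus {
    my (@tokens) = @_;
    my %terms = (required => [], excluded => [], optional => []);
    for my $token (@tokens) {
        if    ($token =~ /^\+(.+)/) { push @{ $terms{required} }, $1 }
        elsif ($token =~ /^-(.+)/)  { push @{ $terms{excluded} }, $1 }
        else                        { push @{ $terms{optional} }, $token }
    }
    return \%terms;
}

my $terms = parse_plus_minus(split ' ', '+perl monger -java');
print "required: @{ $terms->{required} }\n";
print "optional: @{ $terms->{optional} }\n";
print "excluded: @{ $terms->{excluded} }\n";
```

A lone "+" or "-" with nothing after it is left in the optional bucket as typed; whether to drop it instead is up to you.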

Once you have decided how you want your users to express their queries and how you want to bust up their searches, and you feel that the FTR searches in MySQL are sufficient, you can translate the user's terms into SELECTs.
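
One way that translation might look, as a sketch only: it assumes the terms have been sorted into required/optional/excluded lists, and it borrows the documents table and doctext column from the example further down. build_where() is a hypothetical helper, and treating "optional" as "at least one must match" is my design choice, not anything MySQL dictates.

```perl
use strict;
use warnings;

# Build a WHERE clause plus bind values from bucketed search terms.
# build_where() and the bucket names are assumptions for illustration.
sub build_where {
    my ($terms) = @_;
    my $match = 'MATCH(doctext) AGAINST (?)';
    my (@clauses, @bind);

    # Every required term must match.
    for my $t (@{ $terms->{required} || [] }) {
        push @clauses, $match;
        push @bind, $t;
    }
    # At least one optional term must match (grouped with OR).
    my @optional = @{ $terms->{optional} || [] };
    if (@optional) {
        push @clauses, '(' . join(' OR ', ($match) x @optional) . ')';
        push @bind, @optional;
    }
    # No excluded term may match.
    for my $t (@{ $terms->{excluded} || [] }) {
        push @clauses, "NOT $match";
        push @bind, $t;
    }
    return (join(' AND ', @clauses), @bind);
}

my ($where, @bind) = build_where({
    required => ['perl'],
    optional => ['monger', 'mongers'],
    excluded => ['java'],
});
print "SELECT title, doctext FROM documents WHERE $where\n";
print "bind: @bind\n";
```

The placeholders keep user input out of the SQL text; with DBI you would hand the statement to prepare and the bind list to execute. If every bucket is empty the clause comes back as an empty string, so the caller should check for that before running anything.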

Keep in mind that it is easy to think "arg, MySQL FTR is not sophisticated enough for my searches". Don't forget you can still do ANDs, ORs and NOTs by stringing together successive clauses, such as:

create table documents (
    docid   int not null auto_increment primary key,
    title   varchar(255) null,
    doctext text,
    fulltext index (doctext)  -- index the doctext column; there is no column named "text"
);

-- AND binds tighter than OR, so this reads: perl OR (monger AND NOT java)
select title, doctext
  from documents
 where match(doctext) against ('perl')
    or match(doctext) against ('monger')
   and not match(doctext) against ('java');

Your results will come back weighted by a common relevancy ranking algorithm (vector space) and subject to a 50% "threshold": a word that appears in more than half of the rows is considered too common and matches nothing. Apart from this threshold, the minimum length of an index term, and the stopword list, there are few compile-time tunings you can make.
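
As I understand the 50% threshold, a word that shows up in more than half of the rows is treated like a stopword. The sketch below mimics only that idea on a handful of made-up strings so you can see which of your terms might quietly return nothing; too_common_words() is a hypothetical helper and this is not how MySQL implements it.

```perl
use strict;
use warnings;

# Return the words that occur in more than half of the given documents.
# This mimics the idea of the 50% threshold only; it is not MySQL's code.
sub too_common_words {
    my (@docs) = @_;
    my %doc_count;
    for my $doc (@docs) {
        my %seen;
        # Count each word once per document, case-insensitively.
        $seen{ lc($_) }++ for $doc =~ /\w+/g;
        $doc_count{$_}++ for keys %seen;
    }
    my @common = grep { $doc_count{$_} > @docs / 2 } keys %doc_count;
    return sort @common;
}

my @docs = (
    'Perl and MySQL',
    'Perl mongers meet',
    'MySQL full text',
    'Perl hackers',
);
print join(', ', too_common_words(@docs)), "\n";   # prints "perl"
```

Here "perl" is in three of the four strings, so it crosses the line, while "mysql" sits at exactly half and does not. This is worth remembering when you test against a tiny table.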

A good example of current IR technology development is Managing Gigabytes. These guys are pretty damn smart.

I hope that is at least a bit of a lead for building your system.

Cheers, Jay


In reply to Re: Perl, MySQL, and Full Text Searches by jlawrenc
in thread Perl and mySQL Searches by Anonymous Monk
