You might be pleased to know that MySQL does support the creation of full text indexes. Check out
Chapter 25.2 at
MySQL.com for details.
When it comes to parsing your user input forms you do have
a few options for syntaxes. You might want to check out some
of the text processing modules such as Text::Balanced
to reliably extract your quoted strings. However, you
might be disappointed to discover that the FTR
(Full Text Retrieval) functions
of MySQL do not support phrase searching.
I am a fan of a couple of search criteria synaxes:
- "alta vista" - the + and - of keywords syntax
- what I call "simplified boolean" where the users can
put AND's and OR's in their searches without needing to
do lots of quoting.
- you choose your syntax with what you think your
users are going to most easily be able to use.
Once you have decided how you want your users to express
their queries and how you wanna bust up your searches
and you feel that the FTR searches in MySQL are sufficient
you could translate user terms into SELECTs.
Keep in mind that it is easy to think "arg, MySQL FTR is
not sophisticated enough for my searches". Don't forget
You can still
do ANDs, ORs and NOTs by stringing successive clauses
such as:
create table documents (
docid int not null auto_increment primary key,
title varchar(255) null,
doctext text,
fulltext index (text) );
select title,doctext from documents where
match(doctext) against ('perl')
or match(doctext) against ('monger')
and not match(doctext) against ('java');
Your results will come back weighted by a common relevancy
ranking algorithm (vector space) and truncated at 50%
"threshold". There are few compile-time tunings you can make
other than this threshold, the minimum size of index
term, and stopwords.
A good example of current IR technology development is
Managing Gigabytes.
These guys are pretty damn smart.
I hope that is at least a bit of lead for building your
system.
Cheers,
Jay |