in reply to Perl and mySQL Searches
When it comes to parsing your user input forms you do have a few options for syntaxes. You might want to check out some of the text processing modules such as Text::Balanced to reliably extract your quoted strings. However, you might be disappointed to discover that the FTR (Full Text Retrieval) functions of MySQL do not support phrase searching.
I am a fan of a couple of search criteria synaxes:
- "alta vista" - the + and - of keywords syntax
- what I call "simplified boolean" where the users can
put AND's and OR's in their searches without needing to
do lots of quoting.
- you choose your syntax with what you think your
users are going to most easily be able to use.
Once you have decided how you want your users to express their queries and how you wanna bust up your searches and you feel that the FTR searches in MySQL are sufficient you could translate user terms into SELECTs.
Keep in mind that it is easy to think "arg, MySQL FTR is
not sophisticated enough for my searches". Don't forget
You can still
do ANDs, ORs and NOTs by stringing successive clauses
such as:
Your results will come back weighted by a common relevancy ranking algorithm (vector space) and truncated at 50% "threshold". There are few compile-time tunings you can make other than this threshold, the minimum size of index term, and stopwords.create table documents ( docid int not null auto_increment primary key, title varchar(255) null, doctext text, fulltext index (text) ); select title,doctext from documents where match(doctext) against ('perl') or match(doctext) against ('monger') and not match(doctext) against ('java');
A good example of current IR technology development is Managing Gigabytes. These guys are pretty damn smart.
I hope that is at least a bit of lead for building your system.
Cheers, Jay
|
|---|