Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm working with mySQL & CGI(.pm) and need to make a "search engine" for the database. Are there modules that help handle this sort of thing? If not, how would one go about starting to parse a command like
"chocolate icecream" milk sugar
or what have you, pass a command through DBI, and have ranked display based on search relativity? Is this truly the beast it seems to be, or is there an easier way around this?

Thanks

Replies are listed 'Best First'.
Re: Perl, MySQL, and Full Text Searches
by jlawrenc (Scribe) on Mar 18, 2001 at 03:26 UTC
    You might be pleased to know that MySQL does support the creation of full text indexes. Check out Chapter 25.2 at MySQL.com for details.

    When it comes to parsing your user input forms you do have a few options for syntaxes. You might want to check out some of the text processing modules such as Text::Balanced to reliably extract your quoted strings. However, you might be disappointed to discover that the FTR (Full Text Retrieval) functions of MySQL do not support phrase searching.

    I am a fan of a couple of search criteria synaxes:
    - "alta vista" - the + and - of keywords syntax
    - what I call "simplified boolean" where the users can put AND's and OR's in their searches without needing to do lots of quoting. - you choose your syntax with what you think your users are going to most easily be able to use.

    Once you have decided how you want your users to express their queries and how you wanna bust up your searches and you feel that the FTR searches in MySQL are sufficient you could translate user terms into SELECTs.

    Keep in mind that it is easy to think "arg, MySQL FTR is not sophisticated enough for my searches". Don't forget You can still do ANDs, ORs and NOTs by stringing successive clauses such as:

    create table documents ( docid int not null auto_increment primary key, title varchar(255) null, doctext text, fulltext index (text) ); select title,doctext from documents where match(doctext) against ('perl') or match(doctext) against ('monger') and not match(doctext) against ('java');
    Your results will come back weighted by a common relevancy ranking algorithm (vector space) and truncated at 50% "threshold". There are few compile-time tunings you can make other than this threshold, the minimum size of index term, and stopwords.

    A good example of current IR technology development is Managing Gigabytes. These guys are pretty damn smart.

    I hope that is at least a bit of lead for building your system.

    Cheers, Jay

Re: Perl and mySQL Searches
by lhoward (Vicar) on Mar 18, 2001 at 00:17 UTC
Re: Perl and mySQL Searches
by arhuman (Vicar) on Mar 18, 2001 at 02:04 UTC
    If you're talking about a fulltext search, DBIx::FullTextSearch may be your answer (it uses mysql)...

    I don't know if I should mention it but, for really simple (and dirty) search on several text field in your table,LIKE and CONCAT in your sql statement would do the trick (but so slowly...)

    "Trying to be a SMART lamer" (thanx to Merlyn ;-)
Re: Perl and mySQL Searches
by lachoy (Parson) on Mar 18, 2001 at 22:20 UTC

    If you decide to roll your own the Lingua::Stem module is pretty darned useful. It implements a common stemming algorithm so that searches for 'search' will find 'searching', 'searched', 'searchable', etc., without resorting to LIKE queries. (This is probably done internally by most full-text search database plugins, but it can be useful to manipulate yourself if you want to do anything nonstandard.

    Chris
    M-x auto-bs-mode