DougMcq has asked for the wisdom of the Perl Monks concerning the following question:

I have a fairly typical situation: a number of tagged articles, for which I would like to implement user-entered boolean search. BUT I also need advanced features, i.e. ignoring noise/stop words and supporting stemming, synonyms, (minor) misspellings, etc. Is there a Perl library to do this? I tried using search tools to find search tools, but without much luck...

Replies are listed 'Best First'.
Re: Caught in a Search Recursion ;-)
by Corion (Patriarch) on Mar 12, 2015 at 13:03 UTC
      Elasticsearch is just what the doctor ordered. BTW, Amazon is easing integration with AWS next month. THANKS!
Re: Caught in a Search Recursion ;-)
by Your Mother (Archbishop) on Mar 12, 2015 at 14:57 UTC

    I would recommend Lucy. It comes with a lot of what you want out of the box and has the speed to pile on all the correlated queries you want if you build the data for it to use. For misspellings you’ll need to either apply Text::Aspell (tunable misspelling) or Text::DoubleMetaphone (major misspelling) or similar. Synonyms… not sure, I’ve never done it. If you have a specific domain, there might be prior art, like the UMLS::Similarity stuff for medical knowledge. Not sure about generic thesaurus stuff. Let us know if you figure that piece out.
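
    To make the "minor misspelling" idea concrete, here is a rough, language-neutral sketch in Python. It is not Text::Aspell or Text::DoubleMetaphone and does not reflect their APIs; it only illustrates snapping a query word to the closest term already in the index vocabulary, using the standard library's difflib. The vocabulary, the function name correct_term, and the 0.75 cutoff are all made up for illustration.

```python
import difflib

# Hypothetical vocabulary: in practice this would be the set of terms in your index.
VOCABULARY = ["search", "stemming", "synonym", "index"]

def correct_term(word, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest known term for a possibly misspelled word, or None.

    cutoff is a similarity ratio between 0 and 1; raising it makes the
    matching stricter (fewer, safer corrections).
    """
    word = word.lower()
    if word in vocabulary:
        return word  # exact hit, nothing to correct
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

    A transposition like "serach" would map to "search", while something with no near neighbour comes back as None, at which point you could fall back to a phonetic match or just report no results.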

    This stuff is typical but not easy. When you put so much in the mix, ostensibly simple things like sorting can get difficult. This kind of search also works best with either static or slowly changing search bodies. Search here is fast at the expense of indexing the data up front.
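
    The "index up front, search fast" trade-off can be sketched with a toy inverted index. This is a Python illustration only, not how Lucy is implemented or called; the stop-word list and the names tokenize, build_index, search_all and search_any are assumptions for the example. Stemming and synonyms are left out.

```python
import re
from collections import defaultdict

# Illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}

def tokenize(text):
    """Lowercase, split into alphanumeric words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]

def build_index(docs):
    """Pay the cost once: map each term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all(index, query):
    """Boolean AND: ids of documents containing every query term."""
    sets = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*sets) if sets else set()

def search_any(index, query):
    """Boolean OR: ids of documents containing at least one query term."""
    result = set()
    for term in tokenize(query):
        result |= index.get(term, set())
    return result

# Hypothetical articles, keyed by id.
DOCS = {
    1: "Indexing a large file in Perl",
    2: "Full text search in Postgres",
    3: "Perl and full text search",
}
INDEX = build_index(DOCS)
```

    Each query is then just set arithmetic on the prebuilt index: search_all(INDEX, "perl search") is an AND, search_any(INDEX, "perl postgres") is an OR, and a NOT is a set difference such as search_all(INDEX, "full text") - search_any(INDEX, "postgres"). The catch is the one mentioned above: every change to the articles means updating the index.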

    Update: recent example—Re^3: Using indexing for faster lookup in large file.

Re: Caught in a Search Recursion ;-)
by stonecolddevin (Parson) on Mar 12, 2015 at 16:34 UTC

    Depending on what you're using for a data store, postgres has some really kick ass full text search features built right in.

    Three thousand years of beautiful tradition, from Moses to Sandy Koufax, you're god damn right I'm living in the fucking past