Ctb has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow Monks,

I was redirected here by a veteran Perl slinger on another forum after we bounced this question around for a while:

"Is there a simple module already available to specifically parse incoming query strings?"

We looked around the CPAN and tried a few string-parsing modules, but nothing quite fit what I'm going for. The idea is simple: you instantiate your object, call a prepare(RULES) method to set RULES, then call a parse_string($query) method. You grab the hash reference returned by parse_string in a scalar, and off you go with lists of the words that go with AND and NOT booleans, quoted phrases, etc. The rules that you pass tell the module what type of search you're using (e.g. whether it defaults to OR or AND between words), what your wildcards are, how to treat certain errors, and so on. It's database and system independent (so far), so that's a plus.
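To make the interface described above concrete, here is a minimal sketch of how it might look. The package name QueryParser::Sketch, the default_op rule, the +/- prefixes, and the keys of the returned hash are all assumptions made for illustration; they are not the actual module, and wildcards and error handling are left out.

```perl
use strict;
use warnings;

package QueryParser::Sketch;    # hypothetical name, not a CPAN module

sub new {
    my ($class) = @_;
    # Assumed default rule: bare words are ORed together.
    return bless { rules => { default_op => 'OR' } }, $class;
}

# Merge caller-supplied rules over the defaults.
sub prepare {
    my ($self, %rules) = @_;
    $self->{rules} = { %{ $self->{rules} }, %rules };
    die "default_op must be AND or OR\n"
        unless $self->{rules}{default_op} =~ /^(?:AND|OR)$/;
    return $self;
}

# Split a query into quoted phrases and required, optional, and excluded words.
sub parse_string {
    my ($self, $query) = @_;
    my %out = (phrases => [], and => [], or => [], not => []);
    my $default = lc $self->{rules}{default_op};    # 'and' or 'or'

    # Each token is an optional +/- prefix followed by a "quoted phrase"
    # or a bare word. A prefix on a phrase is ignored in this sketch.
    while ($query =~ /([+-]?)(?:"([^"]*)"|(\S+))/g) {
        my ($sign, $phrase, $word) = ($1, $2, $3);
        if    (defined $phrase) { push @{ $out{phrases} },  $phrase }
        elsif ($sign eq '-')    { push @{ $out{not} },      $word }
        elsif ($sign eq '+')    { push @{ $out{and} },      $word }
        else                    { push @{ $out{$default} }, $word }
    }
    return \%out;
}

package main;

my $parser = QueryParser::Sketch->new;
$parser->prepare(default_op => 'AND');
my $result = $parser->parse_string('perl "query string" -cgi');
# $result->{and} holds 'perl', $result->{phrases} holds 'query string',
# $result->{not} holds 'cgi', and $result->{or} is empty.
```

Returning one array per boolean keeps the result independent of any database: the caller decides how to turn each list into SQL, a full-text query, or a grep.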

I'm about 30% done with the module, and I want to know if this description rings a bell with anyone about anything on the CPAN. If there's already a very similar module (or one that could easily be made similar), I won't bother releasing this one on the CPAN. Anyone who recognizes such a module, please chime in so I don't go wasting anybody's extra space or bandwidth!

Thank you, Chris

Replies are listed 'Best First'.
Re: Query String Parsing Module
by FamousLongAgo (Friar) on Oct 29, 2002 at 04:27 UTC
    I've been working on latent semantic search engines in Perl, and have a similar module on hand, although it won't be ready for CPAN for a while. Based on my experience, here are some questions you may wish to consider as you design your code:
    • Do you want to support exact phrase matching? If so, what constitutes a phrase, and how is it parsed out? Do the elements of a phrase match also count as keywords?
    • Are you assuming the search terms will always be in English? If so, you may consider using a stemmer like Lingua::EN::Stemmer to improve recall.
    • Do you want to ignore case in the query, or use it for clues about which words are wanted? For example, do you want to recognize proper names and acronyms ('AIDS' vs. 'aids') and treat them differently based on capitalization?
    • Do you want to consider word order as important? This could make a difference in collocations ('hot dog' vs. 'dog hot').
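    As a rough illustration of how some of these choices could surface as options, here is a small sketch. The function name extract_terms and the option names keep_acronyms and phrase_words_as_keywords are hypothetical, and stemming is deliberately left out; this is not code from either module discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper; the option names are illustrative only.
sub extract_terms {
    my ($query, %opt) = @_;
    my (@phrases, @keywords);

    # Pull out "quoted phrases" first, leaving the bare words behind.
    # Word order inside a phrase is preserved as typed.
    (my $rest = $query) =~ s/"([^"]+)"/push @phrases, $1; ' '/ge;

    # Optionally let the words inside a phrase count as keywords too.
    my @words = split ' ', $rest;
    push @words, map { split ' ', $_ } @phrases
        if $opt{phrase_words_as_keywords};

    for my $w (@words) {
        # Keep all-caps words (likely acronyms) intact when asked to;
        # everything else is lowercased.
        my $is_acronym = $opt{keep_acronyms} && $w =~ /^[A-Z]{2,}$/;
        push @keywords, $is_acronym ? $w : lc $w;
    }
    return { phrases => \@phrases, keywords => \@keywords };
}

my $terms = extract_terms(
    'AIDS research "hot dog"',
    keep_acronyms            => 1,
    phrase_words_as_keywords => 1,
);
# $terms->{keywords} holds AIDS, research, hot, dog;
# $terms->{phrases} holds 'hot dog'.
```

    With keep_acronyms switched off, 'AIDS' would be folded to 'aids', which is exactly the ambiguity the case question raises.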
    CPAN module or not, there is quite a bit of existing code in the field, so I would encourage you to look around (as you are doing!) before you do too much coding. A good reference is Foundations of Statistical Natural Language Processing by Manning and Schütze. I'm happy to share my own code if you like, as well. Good luck!
Re: Query String Parsing Module
by perrin (Chancellor) on Oct 29, 2002 at 05:22 UTC