AndrewMB has asked for the wisdom of the Perl Monks concerning the following question:

I have installed Lingua::Stem (from the ActiveState Package Repository). The documentation suggests there should be a default list of exceptions, but my installed package doesn't seem to have any: get_exceptions returns nothing and, for example, 'this' is stemmed to 'thi'. Am I missing something in my install, or is there a standard list of English exceptions for this module that I can add using add_exceptions?

Replies are listed 'Best First'.
Re: Lingua::Stem exceptions
by dHarry (Abbot) on Jan 28, 2009 at 16:33 UTC
    The documentation suggests there should be a default list of exceptions

    Are you sure about that? Correct me if I'm wrong, but to me it looks like you have to supply the list yourself. I did a quick scan of the Lingua::Stem code and I can't find a default list of exceptions, only an empty hash ({}). From the documentation:

    my $stems = Lingua::Stem::En::stem({
        -words      => $word_list_reference,
        -locale     => 'en',
        -exceptions => $exceptions_hash,
    });
      The documentation for get_exceptions says "As a class method with no parameters it returns all the default exceptions as an anonymous hash of 'exception' => 'replace with' pairs", which seems to suggest there might be some! But I also searched the code and couldn't find any. It isn't easy to invent a list of words that the stemming algorithm stems incorrectly (such as 'this' stemming to 'thi'), so I hoped someone might already have done the work of compiling a list of common words. Otherwise the only way I can think of doing it is to stem a large quantity of text and examine the results, which is rather laborious even if sorted by frequency.
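The survey approach described above (stem a large body of text, then review the results by frequency) could be sketched roughly as follows. This is only a minimal sketch: the helper name changed_stems and the simple letters-only tokenizer are assumptions made for illustration, not part of Lingua::Stem. It lists only the words whose stem differs from the word itself, most frequent first, so that candidates for an exception list are easier to spot.

```perl
use strict;
use warnings;
use Lingua::Stem;

# Hypothetical helper (not part of Lingua::Stem): returns one
# [word, stem, count] triple per distinct word whose stem differs
# from the word itself, most frequent words first.
sub changed_stems {
    my ($text) = @_;

    # Count lower-cased words; the letters-only tokenizer is an assumption.
    my %count;
    $count{ lc $1 }++ while $text =~ /([A-Za-z]+)/g;

    my @words = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return unless @words;

    # stem() returns an array reference in the same order as its input.
    my $stemmer = Lingua::Stem->new(-locale => 'EN');
    my $stems   = $stemmer->stem(@words);

    my @changed;
    for my $i (0 .. $#words) {
        next if $words[$i] eq $stems->[$i];
        push @changed, [ $words[$i], $stems->[$i], $count{ $words[$i] } ];
    }
    return @changed;
}

# Usage: print a frequency-sorted review list for a sample text.
my $sample = 'This is the text that this survey examines. This text is short.';
printf "%5d  %-15s %s\n", $_->[2], $_->[0], $_->[1] for changed_stems($sample);
```

The output still has to be reviewed by hand; the script only narrows the list to words the stemmer actually alters.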
Re: Lingua::Stem exceptions
by Anonymous Monk on Jan 28, 2009 at 16:45 UTC
    There are no default exceptions; you can add some like this:
    # adding default exceptions
    Lingua::Stem::add_exceptions({
        'emily'  => 'emily',
        'driven' => 'driven',
    });
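For the case raised in the question, the same mechanism can be applied to a single stemmer object rather than to the class-wide defaults. The following is a minimal sketch, assuming the object interface described in the Lingua::Stem documentation (new, add_exceptions, get_exceptions, stem); the word list is only an example, taken from the 'this' case mentioned in the thread.

```perl
use strict;
use warnings;
use Lingua::Stem;

# Create an English stemmer and register exceptions on this instance only.
my $stemmer = Lingua::Stem->new(-locale => 'EN');
$stemmer->add_exceptions({
    'this' => 'this',    # keep 'this' from being stemmed to 'thi'
});

# stem() returns an array reference, in the same order as the input words.
my $stems = $stemmer->stem(qw(this running));
print join(' ', @$stems), "\n";

# get_exceptions() with no arguments returns a hash reference of all
# exceptions currently set on this stemmer.
my $exceptions = $stemmer->get_exceptions;
print "$_ => $exceptions->{$_}\n" for sort keys %$exceptions;
```

Exceptions added this way apply only to $stemmer; the class-level call shown in the reply above changes the defaults instead.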