in reply to Lingua::Stem exceptions

The documentation suggests there should be a default list of exceptions

Are you sure about that? Correct me if I'm wrong, but it looks to me like you have to supply the list yourself. I did a quick scan of the Lingua::Stem code and couldn't find a default list of exceptions, only an empty {}. From the documentation:

    my $stems = Lingua::Stem::En::stem({
        -words      => $word_list_reference,
        -locale     => 'en',
        -exceptions => $exceptions_hash,
    });

Re^2: Lingua::Stem exceptions
by AndrewMB (Novice) on Jan 29, 2009 at 00:43 UTC
    The documentation for get_exceptions says "As a class method with no parameters it returns all the default exceptions as an anonymous hash of 'exception' => 'replace with' pairs" - which seems to suggest there might be some! But I also searched the code and couldn't find any. It isn't easy to invent a list of words that the stemming algorithm stems incorrectly (such as "this" stemming to "thi"), so I hoped someone might already have done the work of compiling a list of common words. Otherwise the only way I can think of doing it is to stem a large quantity of text and examine the results, which is rather laborious even if they are sorted by frequency.
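
    For what it's worth, the "stem a large quantity of text and examine the results" approach could be sketched roughly as below. This is only a minimal sketch: stem_report is a hypothetical helper name, the word-splitting regex is a simplification, and the sample text stands in for a real corpus. It assumes Lingua::Stem::En::stem returns one stem per input word, in the same order, as the documentation quoted above implies.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

# Hypothetical helper: count how often each "word => stem" pair occurs.
# $exceptions is an optional hash ref of 'exception' => 'replace with' pairs.
sub stem_report {
    my ($text, $exceptions) = @_;

    # Crude tokenizer (an assumption): runs of ASCII letters, lowercased.
    my @words = map { lc } $text =~ /([A-Za-z]+)/g;

    my $stems = Lingua::Stem::En::stem({
        -words      => \@words,
        -locale     => 'en',
        -exceptions => $exceptions || {},
    });

    my %count;
    for my $i (0 .. $#words) {
        $count{"$words[$i] => $stems->[$i]"}++;
    }
    return \%count;
}

# Placeholder text; in practice this would be a large corpus read from a file.
my $text  = "This is the text and this is how the stemmer treats this text.";
my $count = stem_report($text);

# Most frequent pairs first, so common mis-stemmed words surface early.
for my $pair (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    printf "%6d  %s\n", $count->{$pair}, $pair;
}
```

    Any pair that looks wrong in the report can then go into the exceptions hash, for example stem_report($text, { this => 'this' }), and the report can be rerun to confirm the word now passes through unchanged.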