Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

According to the docs for Lingua::Stem:
Lingua::Stem is a 'pure perl' ... suite.
Lingua::Stem::Snowball is XS based
However, I seem to be having difficulty getting Stem to produce the same output as Snowball. Here is my test code:
use strict;
use warnings;
use Data::Dumper;
use Lingua::Stem;
use Lingua::Stem::Snowball;

chomp( my @names = <DATA> );

my $stemmer = Lingua::Stem->new( -locale => 'EN' );
$stemmer->stem_caching({ -level => 2 });
my $stems = $stemmer->stem(@names);
print "Stem => ", Dumper $stems;

@names = map lc, @names;
$stemmer = Lingua::Stem::Snowball->new( lang => 'en' );
$stemmer->stem_in_place( \@names );
print "Snowball => ", Dumper \@names;

__DATA__
John Smith Plumbing
J Smith's Plumbing
J Smith's Plumbing
Jerry Spaulding Goldsmith
J Spaulding's Gold
And here is my output (Perl 5.10 on Mac OS):
Stem => $VAR1 = [
          'john smith plumbing',
          'j smith\'s plumbing',
          'j smith\'s plumbing',
          'jerry spaulding goldsmith',
          'j spaulding\'s gold'
        ];

Snowball => $VAR1 = [
          'john smith plumb',
          'j smith\'s plumb',
          'j smith\'s plumb',
          'jerry spaulding goldsmith',
          'j spaulding\'s gold'
        ];
I would like the first array to look like the second -- that is, why does Stem not stem the word plumbing into plumb like Snowball does? I do have one hint: when I removed the language from the Snowball constructor, it too did not stem the word plumbing into plumb. Perhaps I am missing something very trivial in my Stem constructor? (Additionally, I am looking to use Stem solely because it is a pure-Perl solution.)

Thanks in advance for any and all suggestions, solutions or sarcasms. :)

Replies are listed 'Best First'.
Re: Lingua::Stem vs Lingua::Stem::Snowball
by Anonymous Monk on Aug 11, 2010 at 23:48 UTC
    If Cartman were here, he would say "Screw you guys ... I figured it out by myself." ;)

    For some reason, Stem will stem the word plumbing on its own, but unlike Snowball it will not stem a string of words, such as "John Smith's Plumbing." In other words, one has to split each string on whitespace first and pass the individual words to stem().
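
    A minimal sketch of that split-first approach, for anyone who lands here later. The stem_phrase helper is a hypothetical name of my own, not part of Lingua::Stem; it only relies on new(), stem_caching() and stem() as used in the original post. How Stem treats possessives such as "Smith's" is not something I have checked, so treat the output for those words as unverified.

```perl
use strict;
use warnings;
use Lingua::Stem;

# Hypothetical helper (not part of Lingua::Stem): split a phrase on
# whitespace, stem each word individually, and rejoin the stems.
sub stem_phrase {
    my ($stemmer, $phrase) = @_;
    my @words = split ' ', $phrase;        # split on runs of whitespace
    my $stems = $stemmer->stem(@words);    # stem() returns an array reference
    return join ' ', @$stems;
}

my $stemmer = Lingua::Stem->new( -locale => 'EN' );
$stemmer->stem_caching({ -level => 2 });

# Sample names taken from the original post
my @names = (
    'John Smith Plumbing',
    "J Smith's Plumbing",
    'Jerry Spaulding Goldsmith',
);

print stem_phrase($stemmer, $_), "\n" for @names;
```

    Passing the stemmer in as an argument keeps the helper independent of the locale, so the same function should work with any other Lingua::Stem locale that is installed.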

    No worries, but this site does seem to be losing the edge it once had. :(

      Funny
      No worries, but this site does seem to be losing the edge it once had. :(

      Well, some of us have jobs.

      Edgy enough for you?