Lingua::Stem is a 'pure perl' ... suite. Lingua::Stem::Snowball is XS based. However, I seem to be having difficulty getting Stem to produce the same output as Snowball. Here is my test code:

use strict;
use warnings;
use Data::Dumper;
use Lingua::Stem;
use Lingua::Stem::Snowball;

chomp( my @names = <DATA> );

my $stemmer = Lingua::Stem->new( -locale => 'EN' );
$stemmer->stem_caching({ -level => 2 });
my $stems = $stemmer->stem(@names);
print "Stem => ", Dumper $stems;

@names = map lc, @names;
$stemmer = Lingua::Stem::Snowball->new( lang => 'en' );
$stemmer->stem_in_place( \@names );
print "Snowball => ", Dumper \@names;

__DATA__
John Smith Plumbing
J Smith's Plumbing
J Smith's Plumbing
Jerry Spaulding Goldsmith
J Spaulding's Gold

And here is my output (Perl 5.10 on Mac OS):

Stem => $VAR1 = [
'john smith plumbing',
'j smith\'s plumbing',
'j smith\'s plumbing',
'jerry spaulding goldsmith',
'j spaulding\'s gold'
];
Snowball => $VAR1 = [
'john smith plumb',
'j smith\'s plumb',
'j smith\'s plumb',
'jerry spaulding goldsmith',
'j spaulding\'s gold'
];
I would like the first array to look like the second. That is, why does Stem not stem the word plumbing into plumb like Snowball does? I do have one hint: when I removed the language from the Snowball constructor, it too did not stem the word plumbing into plumb. Perhaps I am missing something very trivial in my Stem constructor? (Additionally, I am looking to use Stem solely because it is a pure-Perl solution.)
Thanks in advance for any and all suggestions, solutions or sarcasms. :)
In reply to Lingua::Stem vs Lingua::Stem::Snowball by Anonymous Monk