in reply to Removing Stopwords from a String
Wow! That's a lot of regular expressions going on there...
Personally I'd do something more akin to:
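    my @stopwords = qw/ i'd add all my stop words in here /;
    my %stop = map { lc $_ => 1 } @stopwords;

    sub findwords {
        my $string = shift;
        my (@ok, %seen);
        # Walk the string one word at a time; apostrophes count as word characters.
        while ($string =~ /([\w']+)/g) {
            # Keep a word only if it isn't a stop word and hasn't been seen before.
            push @ok, $1 unless $stop{lc $1} or $seen{lc $1}++;
        }
        return @ok;
    }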
In my tests this comes out about two orders of magnitude faster, and it also copes better with apostrophized words that aren't in the stop list.
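To make the behaviour concrete, here is a small self-contained sketch of how it might be called. The stop list and the sample sentence are made up for illustration; they aren't from the original thread.

```perl
use strict;
use warnings;

# Hypothetical stop list, for illustration only.
my %stop = map { lc $_ => 1 } qw/ the a of i'd /;

sub findwords {
    my $string = shift;
    my (@ok, %seen);
    # Treat runs of word characters and apostrophes as one word.
    while ($string =~ /([\w']+)/g) {
        # Skip stop words, and skip anything already returned (case-insensitively).
        push @ok, $1 unless $stop{lc $1} or $seen{lc $1}++;
    }
    return @ok;
}

my @words = findwords("I'd read the monk's tale of the Monk's cat");
print join(' ', @words), "\n";   # prints: read monk's tale cat
```

Note that "I'd" is dropped because it is in the stop list, "monk's" survives as a single word, and the second "Monk's" is dropped as a duplicate. If you want to keep repeated words, take out the %seen check.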
Tony
Re^2: Removing Stopwords from a String
by Anonymous Monk on Sep 19, 2008 at 10:33 UTC