in reply to Removing Stopwords from a String

Wow! That's a lot of regular expressions going on there...

Personally I'd do something more akin to:

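    my @stopwords = qw/ i'd add all my stop words in here /;

    # Lookup hash keyed on lower-cased stop words
    my %stop = map { lc $_ => 1 } @stopwords;

    sub findwords {
        my $string = shift;
        my (@ok, %seen);
        # A word is a run of word characters and apostrophes
        while ($string =~ /((\w|')+)/g) {
            # Keep it unless it is a stop word or already seen (case-insensitive)
            push @ok, $1 unless $stop{lc $1} or $seen{lc $1}++;
        }
        return @ok;
    }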

My tests show this as coming out about 2 orders of magnitude faster, and it also copes better with apostrophized words that aren't in the stop list.
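
For anyone unsure how to call it, here is a minimal usage sketch. The stop list and the @lines array are made up for illustration, and joining the array elements with a space before the call is just one assumed way of handling input that arrives as an array rather than a single string:

```perl
use strict;
use warnings;

# Hypothetical stop list, for illustration only
my @stopwords = qw/ the a of and /;
my %stop = map { lc $_ => 1 } @stopwords;

sub findwords {
    my $string = shift;
    my (@ok, %seen);
    while ($string =~ /((\w|')+)/g) {
        push @ok, $1 unless $stop{lc $1} or $seen{lc $1}++;
    }
    return @ok;
}

# Hypothetical input: an array of text fragments
my @lines = ("The cat and the hat", "A tale of Tony's cat");

# Join the fragments into one string, then filter
my @kept = findwords(join ' ', @lines);

print join(' ', @kept), "\n";   # prints: cat hat tale Tony's
```

Note that findwords also drops repeated words (the second "cat" above), not just stop words, and it returns a list of words rather than a rebuilt string.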

Tony

Re^2: Removing Stopwords from a String
by Anonymous Monk on Sep 19, 2008 at 10:33 UTC
    Hey, how do I use it? I mean, if I have an array containing the whole string and I want to remove these stopwords from it, how would I use this subroutine? Sorry, I am new to Perl. Please reply.