Let's suppose you have your list of stop words in a plain text file. That's handy in case you decide to lengthen or shorten the list now and then, because you won't need to modify your script if it's done something like this (updated to add a bit more commentary):
    open( LIST, "mystopwords.txt" ) or die "$!";
    my @stopwords = <LIST>;   # assuming one stop word per line
    close LIST;
    chomp @stopwords;
    my $stopregex = join '|', @stopwords;

    # ... now, when you go to delete stopwords from $_,
    # it goes like this:
    s/\b(?:$stopregex)\b//g;
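One thing to watch when joining the raw words into a pattern: a stop word that contains a regex metacharacter gets treated as pattern syntax. Here's a minimal sketch of escaping each word with quotemeta first. The in-memory list and the sample text are made up for illustration; they stand in for the contents of the stop word file.

```perl
use strict;
use warnings;

# Hypothetical stop list; "a.m" contains a regex metacharacter (the dot).
my @stopwords = ( 'a.m', 'the' );
my $text      = "the arm at 9 a.m sharp";

# Joined as-is: the dot matches any character, so "arm" is removed too.
my $raw_regex = join '|', @stopwords;

# Escaped with quotemeta: the dot only matches a literal dot.
my $safe_regex = join '|', map { quotemeta } @stopwords;

( my $raw_result  = $text ) =~ s/\b(?:$raw_regex)\b//g;
( my $safe_result = $text ) =~ s/\b(?:$safe_regex)\b//g;

print "unescaped: [$raw_result]\n";
print "escaped:   [$safe_result]\n";
```

If your list only ever holds plain alphabetic words, the escaping makes no difference, so it's cheap insurance rather than a requirement.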
I presume you are involved in some process that removes punctuation as well. If you're not, then removal of just the stopwords will leave behind some odd patterns (e.g. if input includes things like "about-face", "morning-after pill", "man-about-town", and so on).
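To make the "odd patterns" concrete, here's a small sketch. The stop list and the sample sentence are invented for the example, and turning hyphens into spaces before filtering is just one possible way to handle punctuation, not necessarily what you'd want for your data.

```perl
use strict;
use warnings;

# Hypothetical stop list and input, for illustration only.
my $stopregex = join '|', qw(about after the);
my $text      = "an about-face after the morning-after pill";

# Stop words removed, punctuation left alone: dangling hyphens remain.
( my $stripped = $text ) =~ s/\b(?:$stopregex)\b//g;

# One possible cleanup: hyphens become spaces first, then the stop words
# go, then runs of whitespace are collapsed and the ends are trimmed.
( my $cleaned = $text ) =~ tr/-/ /;
$cleaned =~ s/\b(?:$stopregex)\b//g;
$cleaned =~ s/\s+/ /g;
$cleaned =~ s/^ | $//g;

print "stop words only:   [$stripped]\n";
print "punctuation first: [$cleaned]\n";
```

The first result still has "-face" and "morning-" in it, which is the sort of leftover I mean.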

(You didn't mention whether you were using the "g" modifier when removing the stop words. Could that have been your problem?)
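In case it helps to see the difference, here's a quick sketch with a made-up stop list and input. Without "g" the substitution stops after the first match; with it, every match is removed.

```perl
use strict;
use warnings;

# Hypothetical stop list and input, for illustration only.
my $stopregex = join '|', qw(the of);
my $input     = "the end of the beginning of the story";

( my $once = $input ) =~ s/\b(?:$stopregex)\b//;    # no /g: first match only
( my $all  = $input ) =~ s/\b(?:$stopregex)\b//g;   # /g: every match

print "without g: [$once]\n";
print "with g:    [$all]\n";
```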

Another update: the regex approach works fine and might even be optimal, but there's another way, of course:

    my %stopwd;
    open( LIST, "my_stopwords.txt" ) or die "$!";
    while (<LIST>) {
        chomp;
        $stopwd{$_} = undef;   # assume one word per line
    }
    close LIST;

    # now, to remove stopwords, split the input data ($_) on \b
    # and check each token:
    my $filtered = join '', map { exists($stopwd{$_}) ? '':$_ } split /\b/;
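Here's a self-contained sketch of the hash approach, so you can see what split /\b/ actually hands to the map. The stop list and input are invented, and lower-casing each token before the lookup is my own assumption (so that "The" is caught by a lower-case list); drop the lc if you want an exact-case match.

```perl
use strict;
use warnings;

# Hypothetical stop list, built in memory instead of read from a file.
my %stopwd = map { $_ => undef } qw(the of and);
my $input  = "The history of the rise and fall";

# Splitting on \b yields alternating word and non-word tokens.
my @tokens = split /\b/, $input;

# Assumption: tokens are lower-cased for the lookup only; the tokens
# that are kept retain their original case.
my $filtered = join '', map { exists $stopwd{ lc $_ } ? '' : $_ } @tokens;

print scalar(@tokens), " tokens\n";
print "[$filtered]\n";
```

Note that the whitespace tokens are kept, so the output has runs of spaces where the stop words used to be.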

In reply to Re: removing stop words by graff
in thread removing stop words by zulqernain
