zulqernain has asked for the wisdom of the Perl Monks concerning the following question:

In @data I have lines of text. @stopwords has a list of stopwords. This is the code I wrote. I am converting the sentences to an array of words, i.e. @arr, and after that I am trying to remove the stopwords.
my (@arr,@a);
foreach $word (@data) {
    push @arr,split(/ /,$word);
}
WORD: foreach $a (@arr) {
    foreach $stop (@stopwords) {
        next WORD if $a eq $stop;
    }
    push(@lessWords, $a);
}
print "@lessWords\n";
It's not removing the stopwords. Please help.

Replies are listed 'Best First'.
Re: removing stopwords
by Transient (Hermit) on Jun 01, 2005 at 22:14 UTC
    #!/usr/bin/perl -w
    use strict;
    my @data = ( "Some lines of text", "I love my lines of text", "These are nice lines" );
    my @stopwords = ( "lines" );
    my @lessWords = ();
    my (@arr,@a);
    foreach my $word (@data) {
        push @arr,split(/ /,$word);
    }
    WORD: foreach my $a (@arr) {
        foreach my $stop (@stopwords) {
            next WORD if $a eq $stop;
        }
        push(@lessWords, $a);
    }
    print join( ' ', @lessWords ), "\n";
    __OUTPUT__
    Some of text I love my of text These are nice
      I used the same thing in my program and it does not remove the stopwords, but when I run this program it works fine. I have rechecked the contents of the arrays and everything is fine... I don't understand where the problem is.

        Then show us a piece of code that displays the problem, including sample data that causes the problem. So far, you haven't shown us code that works by itself, and you haven't given sample data.

        We've shown you how to do it. That means the problem is with your implementation. If we don't see it, we can't help you.
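
One way to produce the kind of self-contained test case requested above is sketched below. It assumes, without confirmation from the thread, that the stopwords were read from a file and never chomped, so each one still carries a trailing newline and never compares equal to a word. The helper names `clean` and `remove_stopwords` are hypothetical, and the sample data is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading/trailing whitespace, including "\n" or "\r\n".
sub clean {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Hypothetical helper: same nested-loop logic as in the thread,
# but splitting on any run of whitespace instead of a single space.
sub remove_stopwords {
    my ($data, $stopwords) = @_;
    my @kept;
    WORD: for my $word (map { split(' ', $_) } @$data) {
        for my $stop (@$stopwords) {
            next WORD if $word eq $stop;
        }
        push @kept, $word;
    }
    return @kept;
}

# Assumed sample data: lines and stopwords as they look straight from a file.
my @data      = ("Some lines of text\n", "These are nice lines\n");
my @stopwords = ("lines\n", "of\n");

# Un-chomped stopwords never match, so every word survives.
print join(' ', remove_stopwords(\@data, \@stopwords)), "\n";

# After cleaning the stopwords, the same loop removes them.
my @clean_stopwords = map { clean($_) } @stopwords;
print join(' ', remove_stopwords(\@data, \@clean_stopwords)), "\n";
```

If the real program behaves like the first print, printing each stopword between brackets (for example `print "[$_]\n" for @stopwords;`) would show whether hidden whitespace is the cause.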

Re: removing stopwords
by eric256 (Parson) on Jun 01, 2005 at 22:18 UTC

    It would seem you are going about this the hard way. Instead of breaking the sentences down into words you could combine the stopwords and make a regex.

    Then you just use s/// to replace occurrences of the stop words with nothing or a marker of some sort.

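    use strict;
    use warnings;
    my $text = "Hello world how are you doing?";
    my @stopwords = ("hello","how");
    # Wrap the whole alternation in \b so only complete words match
    # (joining with '\b|\b' leaves the first and last words unanchored).
    my $regex = '\b(?:' . join('|', map { quotemeta } @stopwords) . ')\b';
    $text =~ s/$regex/*BAD*/ig;
    print $text;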

    ___________
    Eric Hodges
      I tried it, but it makes the program very slow because the text file is very big and the number of stopwords is about 350.
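
For a large file and a few hundred stopwords, a common Perl idiom is to put the stopwords in a hash and test each word with a single lookup, instead of looping over the list or running a large alternation. The following is a minimal sketch, not something proposed in the thread: `filter_stopwords` is a hypothetical name, the case-insensitive comparison is an assumption (borrowed from the `/i` flag in the regex above), and punctuation attached to words is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words from @$lines that are not stopwords.
sub filter_stopwords {
    my ($lines, $stopwords) = @_;

    # Build the lookup table once; keys are lower-cased so matching ignores case.
    my %is_stop = map { ( lc($_) => 1 ) } @$stopwords;

    # Split each line on whitespace and keep only words absent from the table.
    return grep { !$is_stop{ lc $_ } } map { split(' ', $_) } @$lines;
}

# Sample data taken from the earlier reply, with one extra stopword.
my @data      = ("Some lines of text", "I love my lines of text");
my @stopwords = ("lines", "of");

print join(' ', filter_stopwords(\@data, \@stopwords)), "\n";
# prints "Some text I love my text"
```

Each word costs one hash lookup regardless of how many stopwords there are, so the work grows with the size of the text rather than with text size times list size.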