in reply to Re^3: Problem getting Russian stopwords
in thread Problem getting Russian stopwords
    map decode("KOI8-R", $_), keys %$stopwords;

The problem is that your stopwords are left undecoded in the hash. You should produce a new hash containing transformed keys instead of throwing the results of decode out:

    my %stopwords;
    undef @stopwords{ map decode("KOI8-R", $_), keys %{getStopWords('ru')} };

Also, the stop words are in lower case, which means that you should lowercase your text too before checking whether it's a stopword or not.

    say join ' ', grep { ! exists $stopwords{lc $_} } @words;

You may want to split your text on /\W+/ to get the words in one operation.
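Putting those pieces together, here is a minimal sketch of the whole flow. It assumes the input text is already a decoded character string. So that it runs without the stopword module installed, `fake_stop_words` is a hypothetical stand-in for `getStopWords('ru')`: it returns a hash ref with KOI8-R encoded keys, and its three words are only a sample, not the module's real list. `remove_stopwords` is likewise an illustrative name.

```perl
use strict;
use warnings;
use utf8;                        # this source file contains literal Cyrillic
use feature 'say';
use Encode qw(encode decode);

# Hypothetical stand-in for getStopWords('ru'): a hash ref whose keys
# are KOI8-R encoded byte strings (sample words only).
sub fake_stop_words {
    my %raw;
    $raw{ encode('KOI8-R', $_) } = 1 for qw(и в не);
    return \%raw;
}

# Build a new hash keyed by the decoded (character) strings.
# Assigning an empty list to the slice creates the keys with undef values.
my %stopwords;
@stopwords{ map decode('KOI8-R', $_), keys %{ fake_stop_words() } } = ();

# Split on runs of non-word characters, then drop every word whose
# lowercased form is a stopword.
sub remove_stopwords {
    my ($text) = @_;
    my @words = grep { length } split /\W+/, $text;
    return grep { !exists $stopwords{ lc $_ } } @words;
}

binmode STDOUT, ':encoding(UTF-8)';
say join ' ', remove_stopwords('Я не был в Москве и Киеве');
# prints: Я был Москве Киеве
```

The `grep { length }` only guards against the empty leading field that split produces when the text starts with punctuation or whitespace.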