in reply to Re^4: Problem getting Russian stopwords
in thread Problem getting Russian stopwords

undef @stopwords{ map decode("KOI8-R", $_), keys %{getStopWords("ru")} };

That’s a really interesting hash slice trick. I like it. Haven’t learned a new Perl idiom in a long time. Thanks.
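For anyone else meeting the idiom for the first time, here is a minimal sketch of what it does. The word list is a hypothetical stand-in for the decoded stopword keys, and the second hash is only there to compare against the more commonly seen empty-list assignment.

```perl
use strict;
use warnings;

# Hypothetical word list standing in for the decoded stopword keys.
my @words = qw(and or but and);

# The idiom from the post: a hash slice as the operand of undef.
# The slice is evaluated as an lvalue, which creates each key.
my %stopwords;
undef @stopwords{@words};

# The more commonly seen spelling: assign an empty list to the slice.
my %same;
@same{@words} = ();

print join(",", sort keys %stopwords), "\n";
print join(",", sort keys %same), "\n";

# The values are all undef, so membership is tested with exists.
print exists $stopwords{and} ? "and is a stopword\n" : "and is not a stopword\n";
```

Either way the values stay undefined, so lookups need exists rather than a truth test on the value.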

I don’t know anything about Cyrillic alphabets. Would fc be preferable to lc here or is it irrelevant given the character set?
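To show why the question comes up at all, here is a small sketch of a case outside Cyrillic where the two functions are known to differ. The German word pair is my own illustration and not from the stopword list.

```perl
use v5.16;    # enables both fc and say
use utf8;     # the source below contains a literal non-ASCII character
binmode STDOUT, ':encoding(UTF-8)';

# Illustrative pair: the same word written with sharp s and in capitals.
my ($mixed, $upper) = ("Straße", "STRASSE");

# lc keeps the sharp s, so the lowercased forms are not equal.
my $lc_match = lc($mixed) eq lc($upper) ? "match" : "differ";

# fc folds the sharp s to "ss", so the folded forms are equal.
my $fc_match = fc($mixed) eq fc($upper) ? "match" : "differ";

say "lc: $lc_match";
say "fc: $fc_match";
```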

Re^6: Problem getting Russian stopwords
by Anonymous Monk on Sep 19, 2018 at 20:02 UTC
    Thank you, in turn, for reminding me of fc! It seems to me that for the Cyrillic alphabet as it is used in Russian, fc and lc are equivalent:
    use v5.16;
    use charnames ':full';
    use List::Util 'all';
    say all { fc eq lc } map chr,
        ord("\N{CYRILLIC CAPITAL LETTER A}") .. ord("\N{CYRILLIC SMALL LETTER YA}");
    __END__
    1

      That appears to be absolutely classic, terse, ideal test code for this case, and I think you deserve more ++s than the single one you apparently got from me. I wish you would sign in and participate with a username to bank the credibility and goodwill and perhaps develop friendships here. I have a fair amount of animosity for and mistrust of anonymous monks at this point. Love to see you leave that stable.

      For the interested, I added this to visualize what’s going on–

      $ perl -Mfeature=fc -Mcharnames=:full -Mv5.16 -e 'binmode STDOUT, ":encoding(UTF-8)"; say join " ", $_, lc, fc, uc for map chr, ord("\N{CYRILLIC CAPITAL LETTER A}")..ord("\N{CYRILLIC SMALL LETTER YA}");'

      Came across this again and added the command line invocation to support the character names and the use of fc and say.