in reply to use locale broken?

If you use properly decoded strings (which your string literals are, since use utf8; is in effect) and no locales, then \w, \d etc. follow Unicode semantics, which means they match more than just the basic Latin characters.
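A minimal sketch of that behaviour, assuming the script file itself is saved as UTF-8. The variable names and sample strings are made up for illustration and are not taken from the original script:

```perl
use strict;
use warnings;
use utf8;  # assumes this file is saved as UTF-8; literals become character strings

my $text = "åäö";

# Count the characters that \w matches; no "use locale" is in effect.
my $word_chars = () = $text =~ /\w/g;

# \d is not limited to 0-9 either: U+0663 is ARABIC-INDIC DIGIT THREE.
my $digit_unicode = "\x{0663}" =~ /^\d$/    ? 1 : 0;
my $digit_ascii   = "\x{0663}" =~ /^[0-9]$/ ? 1 : 0;

print "word characters: $word_chars\n";
print "\\d matches U+0663: $digit_unicode, [0-9] matches: $digit_ascii\n";
```

If you really only want ASCII digits, an explicit class such as [0-9] says so without depending on any of this.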

I'm not very familiar with locales, but my guess is that use locale expects non-decoded binary strings in the encoding specified by the locale (here: UTF-8), so it might work without the utf8 pragma.

In general I recommend against locales, if you can avoid them. In my experience they are always a source of trouble, and don't bring the promised "do what I mean"-effect.

Re^2: use locale broken?
by december (Pilgrim) on Mar 17, 2011 at 18:13 UTC

    It seems that use locale just doesn't work well for Unicode character sets, because it doesn't consider these locale-specific characters valid word characters. I think it's a problem in Perl, because clearly \w should include the Scandinavian characters "öäå" when such a locale is in effect, Unicode or not.

    But well, I can avoid the buggy locale handling by explicitly converting all input and output to Unicode, regardless of the user's settings. I just wish it had worked...
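A rough sketch of that explicit conversion using the core Encode module. It assumes the data arrives as UTF-8 bytes; the hard-coded byte string only stands in for real input, and the variable names are illustrative:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes as they might arrive from a file or STDIN
# (here the three characters U+00E5, U+00E4, U+00F6).
my $input_bytes = "\xC3\xA5\xC3\xA4\xC3\xB6";

# Decode at the boundary: bytes in, character string out.
my $text = decode('UTF-8', $input_bytes);

# With a decoded string and no locale, \w uses Unicode rules.
my $is_word = $text =~ /^\w+$/ ? 1 : 0;

# Encode again on the way out, whatever the user's locale says.
my $output_bytes = encode('UTF-8', $text);

print "characters: ", length($text), ", all word characters: $is_word\n";
```

For filehandles, binmode with an :encoding(UTF-8) layer does the same conversion without explicit decode and encode calls.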