Re^4: perl 5.14 regex: case insensitive match on international characters

Hi monks. Sorry to bother you again, but any time I try to print a double-quoted string, I get this "Wide character in print" message. For instance:

use utf8;
use feature 'unicode_strings';
my $string1 = "ŠIN";
my $string2 = "šin";
print "$string1 matches $string2 ? ";
print $string1 =~ /$string2/i ? 'matched' : 'no match';

The result is correct though: ŠIN matches šin ? matched. Do you know why perl is complaining about a wide character in print? This in fact causes me trouble with regular expressions somewhere else in the code: I believe that š and Š are "seen as" (sorry for the horrible terminology) �, and that is why they match in the code above. Thanks for the help!

Re^5: perl 5.14 regex: case insensitive match on international characters by tobyink (Canon) on Mar 16, 2012 at 17:55 UTC
When you open a filehandle, if you want to print non-ASCII characters to it, you must specify an encoding. STDOUT is a filehandle. Add something like this near the top of your script: `binmode(STDOUT, ":utf8");` If you expect to read from STDIN, then you might want to call `binmode` on that too. There's utf8::all which takes care of a lot of these gotchas. If everyone used UTF-8 for everything, and assumed that everybody else did too, then life would be much easier. `perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`
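For illustration, here is a minimal sketch of that suggestion applied to the script from the question. It assumes the source file is saved as UTF-8 and that the terminal expects UTF-8. The helper name `compare_ci` is hypothetical and exists only to keep the match in one place.

```perl
use strict;
use warnings;
use utf8;                       # string literals in this file are UTF-8 text
use feature 'unicode_strings';

# Give STDOUT an encoding layer so characters above 0xFF are written
# as UTF-8 instead of triggering "Wide character in print".
binmode(STDOUT, ":utf8");

# Hypothetical helper: case-insensitive match of one string against another.
# \Q...\E quotes any regex metacharacters in the pattern string.
sub compare_ci {
    my ($left, $right) = @_;
    return $left =~ /\Q$right\E/i ? 'matched' : 'no match';
}

my $string1 = "ŠIN";
my $string2 = "šin";
print "$string1 matches $string2 ? ", compare_ci($string1, $string2), "\n";
```

With the layer in place the warning should go away. The match itself succeeds because Š and š are a case pair under `/i`, not because the two characters are stored as the same thing. If you would rather have strictly validated output, `:encoding(UTF-8)` can be used in place of `:utf8`.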
Re^6: perl 5.14 regex: case insensitive match on international characters by shamat (Acolyte) on Mar 19, 2012 at 15:44 UTC
Thank you so much Tobyink! utf8::all is what I was looking for :)