Re: perl 5.14 regex: case insensitive match on international characters

Re^2: perl 5.14 regex: case insensitive match on international characters by shamat (Acolyte) on Mar 15, 2012 at 16:22 UTC
Thanks for the quick reply! If I add `use utf8;` the regex matches, but I get: `Malformed UTF-8 character (unexpected continuation byte 0x9a, with no preceding start byte) at utf temp8.pl line 4.` Any suggestions on how to deal with this?
Re^3: perl 5.14 regex: case insensitive match on international characters by tobyink (Canon) on Mar 15, 2012 at 17:12 UTC
As well as adding `use utf8`, make sure your script is actually saved as UTF-8. (Your editor will probably offer a choice of encodings when you save the file.)
`perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`
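A minimal sketch of why the file's encoding matters here. It rests on an assumption that the thread does not confirm: that the script was saved as Windows-1252, where š is the single byte 0x9A (the byte named in the error message). The helper `try_utf8` is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The bytes that would sit on disk for the letter "š" (U+0161).
my $utf8_bytes   = "\xC5\xA1";  # saved as UTF-8: two bytes
my $cp1252_bytes = "\x9A";      # saved as Windows-1252 (assumed): one byte

# Hypothetical helper: return the decoded string, or undef when the
# bytes are not valid UTF-8.
sub try_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;  # decode with a CHECK argument may modify its input
    my $str  = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return $str;
}

my $from_utf8   = try_utf8($utf8_bytes);
my $from_cp1252 = try_utf8($cp1252_bytes);

print defined $from_utf8
    ? "UTF-8 bytes decode to " . length($from_utf8) . " character(s)\n"
    : "UTF-8 bytes rejected\n";
print defined $from_cp1252
    ? "0x9A accepted as UTF-8\n"
    : "0x9A alone is not valid UTF-8\n";

# Read as Windows-1252 instead, the same byte is the intended letter.
printf "0x9A as cp1252 is U+%04X\n", ord decode('cp1252', $cp1252_bytes);
```

With `use utf8` in effect, perl reads the source file as UTF-8, so a lone 0x9A in a string literal is reported as malformed. Re-saving the file as UTF-8 turns that byte into the two-byte sequence 0xC5 0xA1.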
Re^4: perl 5.14 regex: case insensitive match on international characters by shamat (Acolyte) on Mar 16, 2012 at 09:09 UTC
Thank you so much! Everything works fine now. Yepeeeee :)
Re^4: perl 5.14 regex: case insensitive match on international characters by shamat (Acolyte) on Mar 16, 2012 at 17:15 UTC
Hi monks. Sorry to bother you again, but any time I try to print a double-quoted string, I get this "Wide character in print" message. For instance: `use utf8; use feature 'unicode_strings'; my $string1 = "ŠIN"; my $string2 = "šin"; print "$string1 matches $string2 ? "; print $string1 =~ /$string2/i ? 'matched' : 'no match';` The result is correct though: `ŠIN matches šin ? matched`. Do you know why perl thinks there are wide characters in the print? This in fact causes me trouble with regular expressions somewhere else in the code: I believe that š and Š are "seen as" (sorry for the horrible terminology) �, and that is why they match in the code above. Thanks for the help!
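A minimal sketch of what that warning usually points to, offered as an illustration and not as the reply given in this thread. The strings hold characters above U+00FF, and the output handle has no encoding layer, so perl warns when printing them. The helper `warnings_when_printing` is a hypothetical name, and the code points are written as escapes so the sketch does not depend on how the file is saved.

```perl
use strict;
use warnings;

# U+0160 is Š and U+0161 is š.
my $string1 = "\x{160}IN";
my $string2 = "\x{161}in";

# Hypothetical helper: print $text to an in-memory handle opened with
# $mode and return the warnings raised while doing so.
sub warnings_when_printing {
    my ($mode, $text) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    open my $fh, $mode, \my $buffer
        or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh;
    return @warnings;
}

# No encoding layer: perl has to guess how to turn characters into bytes.
my @raw = warnings_when_printing('>', $string1);

# Explicit UTF-8 layer: the characters are encoded on the way out.
my @layered = warnings_when_printing('>:encoding(UTF-8)', $string1);

# The usual remedy for STDOUT is the same layer.
binmode(STDOUT, ':encoding(UTF-8)');
print "without layer: ", scalar(@raw), " warning(s)\n";
print "with layer: ", scalar(@layered), " warning(s)\n";
print "$string1 matches $string2 ? ",
    ($string1 =~ /$string2/i ? 'matched' : 'no match'), "\n";
```

The match itself does not need both letters to collapse into the same broken character: Š (U+0160) and š (U+0161) are an upper/lower case pair, so `/i` is expected to match them once the strings are decoded as characters.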
Re^5: perl 5.14 regex: case insensitive match on international characters by tobyink (Canon) on Mar 16, 2012 at 17:55 UTC
Re^6: perl 5.14 regex: case insensitive match on international characters by shamat (Acolyte) on Mar 19, 2012 at 15:44 UTC