in reply to perl 5.14 regex: case insensitive match on international characters

If your source code (the string literals) is UTF-8 encoded, you need to add use utf8; so that Perl treats those literals as characters rather than raw bytes. With that added, it works fine for me (with the same version of Perl).


Re^2: perl 5.14 regex: case insensitive match on international characters
by shamat (Acolyte) on Mar 15, 2012 at 16:22 UTC
    Thanks for the quick reply! If I add use utf8; the regex matches, but I get:

    Malformed UTF-8 character (unexpected continuation byte 0x9a, with no preceding start byte) at utf temp8.pl line 4.

    Any suggestion on how to deal with this?

      As well as adding use utf8;, make sure your script is actually saved as UTF-8. (Your editor will probably offer a choice of encodings when you save the file.)

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
        Thank you so much! Everything works fine now. Yepeeeee :)
        Hi monks. Sorry to bother you again, but any time I try to print a double-quoted string, I get a "Wide character in print" warning. For instance:
        use utf8;
        use feature 'unicode_strings';
        my $string1 = "ŠIN";
        my $string2 = "šin";
        print "$string1 matches $string2 ? ";
        print $string1 =~ /$string2/i ? 'matched' : 'no match';
        The result is correct though: ŠIN matches šin ? matched. Do you know why Perl warns about a wide character in print? This in fact causes me trouble with regular expressions somewhere else in the code: I believe that š and Š are "seen as" (sorry for the horrible terminology) �, and that is why they match in the code above. Thanks for the help!
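A likely explanation, offered as a sketch rather than a definitive diagnosis: with use utf8; in effect, Š (U+0160) and š (U+0161) are real characters with code points above 255. When such a string is printed to a filehandle that has no encoding layer, Perl emits the "Wide character in print" warning and writes its internal UTF-8 bytes anyway, which is why the output still looks right. Declaring an output encoding on STDOUT normally silences the warning. The version below assumes the script is saved as UTF-8 and that the terminal expects UTF-8; the helper name matches_ci is made up for illustration and is not part of the original code.

```perl
use strict;
use warnings;
use utf8;                        # string literals in this file are UTF-8 encoded
use feature 'unicode_strings';

# Encode everything printed to STDOUT as UTF-8, so that character
# strings with code points above 255 no longer trigger the
# "Wide character in print" warning.
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper (name chosen for this example only): returns
# 'matched' if $text matches $pattern case-insensitively, else 'no match'.
sub matches_ci {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/i ? 'matched' : 'no match';
}

my $string1 = "ŠIN";
my $string2 = "šin";

print "$string1 matches $string2 ? ";
print matches_ci($string1, $string2), "\n";
```

The same idea applies to input: data read from files or STDIN arrives as bytes unless it is decoded (for example with an :encoding(UTF-8) layer on the input handle), and undecoded bytes are a common reason for regexes on international characters behaving oddly elsewhere in a program.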