Perl 5.8+ has Unicode support built in, so eq, regular expressions, etc. all work without issue.
However, you first need to decode the encoded bytes into Unicode text strings. There are a few ways of doing that:
- Specify the encoding when you open the file from which the text originates.
      # $path is a placeholder for the name of the file to read.
      open(my $fh, '<:encoding(UTF-8)', $path)
          or die "Can't open $path: $!";
- Specify the encoding for the handle after it's already open.
      binmode(STDIN, ':encoding(UTF-8)');
- Decode the encoded data manually.
      use Encode qw( decode );
      my $line_bytes = <$fh>;
      my $line_chars = decode('UTF-8', $line_bytes);
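To see that the three approaches above end up with the same text, here is a minimal sketch. The temporary file and its contents are made up for the illustration, and the variable names are mine; it only assumes the core modules Encode and File::Temp.

```perl
use strict;
use warnings;
use Encode qw( decode );
use File::Temp qw( tempfile );

# Hypothetical sample data: the UTF-8 bytes for "café" plus a newline.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "caf\xC3\xA9\n";
close($tmp_fh) or die "Can't close $path: $!";

# 1. Decoding layer specified at open time.
open(my $fh1, '<:encoding(UTF-8)', $path) or die "Can't open $path: $!";
my $via_open = <$fh1>;
close($fh1);

# 2. Decoding layer added to an already open handle.
open(my $fh2, '<', $path) or die "Can't open $path: $!";
binmode($fh2, ':encoding(UTF-8)');
my $via_binmode = <$fh2>;
close($fh2);

# 3. Raw bytes read first, then decoded by hand.
open(my $fh3, '<:raw', $path) or die "Can't open $path: $!";
my $line_bytes = <$fh3>;
close($fh3);
my $via_decode = decode('UTF-8', $line_bytes);
```

In all three cases the line holds four characters plus the newline, with é as the single character U+00E9, so eq and regexes see text rather than bytes.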
Don't forget that you need to do the inverse (encode the characters into UTF-8 bytes) when you output.
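For the output side, a rough sketch of the two usual options: an encoding layer on the handle, or a manual encode. The in-memory filehandle and the sample string are only there to keep the example self-contained; with a real file or STDOUT the idea is the same.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical decoded text: "café" as four characters, plus a newline.
my $chars = "caf\x{E9}\n";

# Option 1: let an encoding layer on the output handle do the work.
# An in-memory handle stands in for a real file here.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "Can't open buffer: $!";
print {$out} $chars;
close($out) or die "Can't close buffer: $!";

# Option 2: encode by hand and write the bytes to a raw handle.
my $bytes = encode('UTF-8', $chars);
```

Both routes produce the same six bytes, with é written as the two-byte sequence C3 A9.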
If your source code is UTF-8 encoded, add use utf8; at the top of your file.
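A small sketch of what use utf8; changes, assuming the script itself is saved as UTF-8. The variable names are just for illustration.

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are decoded from UTF-8

# With the pragma, the literal is four characters.
my $word     = "café";
my $char_len = length($word);

# Without it (lexically disabled here), the same literal is five bytes,
# because é is stored in the file as two bytes.
my $byte_len = do { no utf8; length("café") };

# The decoded literal compares equal to the escaped form.
my $matches = $word eq "caf\x{E9}" ? 1 : 0;
```

Note that use utf8; only affects the source code itself; it does not decode input or encode output.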
Of course, you'll need to find a way to deal with precomposed and combined characters. The easiest thing to do is to ignore it completely...
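If ignoring it isn't good enough, one common option (my suggestion, not something the poster spelled out) is to normalize both strings before comparing. A minimal sketch using the core module Unicode::Normalize:

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC );

# The same visible character, written two ways.
my $precomposed = "\x{E9}";     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
my $combined    = "e\x{301}";   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# A plain eq compares code points, so these differ.
my $raw_equal = $precomposed eq $combined ? 1 : 0;

# After normalizing both sides to NFC they compare equal.
my $norm_equal = NFC($precomposed) eq NFC($combined) ? 1 : 0;
```

Whether NFC, NFD, or one of the compatibility forms is the right choice depends on the data, so treat this as a starting point.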
It won't handle the actual comparisons for you, but when dealing with (possibly) utf8 strings, Devel::StringInfo is your friend.
Depends on how you want to compare it.
In general, you should read perlunitut and perlunifaq first.