Comparing utf8 strings

mhearse has asked for the wisdom of the Perl Monks concerning the following question:

Re: Comparing utf8 strings by ikegami (Patriarch) on Apr 16, 2008 at 00:31 UTC
Perl 5.8+ has Unicode support integrated, so `eq`, regular expressions, etc. all work on text without issue. However, you first need to decode the encoded bytes into Unicode character strings. There are a few ways of doing that:

1. Specify the encoding when you open the file from which the text originates (here `$qfn` holds the file's path):
   `open(my $fh, '<:encoding(UTF-8)', $qfn) or die "Can't open \"$qfn\": $!\n";`
2. Specify the encoding for a handle that's already open:
   `binmode(STDIN, ':encoding(UTF-8)');`
3. Decode the data manually:
   `use Encode qw( decode ); my $line_bytes = <$fh>; my $line_chars = decode('UTF-8', $line_bytes);`

Don't forget that you need to do the inverse when you output: encode the characters back into UTF-8 bytes, for example with `binmode(STDOUT, ':encoding(UTF-8)');`. If your source code is UTF-8 encoded, add `use utf8;` at the top of your file.
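A minimal sketch of why the decoding step matters, assuming UTF-8 input: the same word compares unequal as raw bytes and equal once decoded. The bytes are hard-coded instead of read from a file, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# "café" as raw UTF-8 bytes, as you'd get reading a file without an
# :encoding layer (hard-coded here for illustration)
my $bytes = "caf\xC3\xA9";

# The same word as a character string; with `use utf8;` a literal "café"
# typed into a UTF-8 source file would be identical to this
my $text = "caf\x{E9}";

print length($bytes), "\n";                           # 5 (bytes)
print $bytes eq $text ? "equal\n" : "different\n";    # different

# Decode the bytes into characters, then compare
my $chars = decode('UTF-8', $bytes);
print length($chars), "\n";                           # 4 (characters)
print $chars eq $text ? "equal\n" : "different\n";    # equal

# The inverse for output: encode characters back into UTF-8 bytes
my $out = encode('UTF-8', $chars);
print $out eq $bytes ? "round trip ok\n" : "round trip failed\n";
```

In real code you'd normally let an `:encoding(UTF-8)` layer do both steps on the handles instead of calling `decode`/`encode` by hand.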
Re^2: Comparing utf8 strings by Anonymous Monk on Apr 16, 2008 at 00:53 UTC
Of course, you'll need to find a way to deal with precomposed and combining (decomposed) characters. The easiest thing to do is to ignore it completely...
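If you'd rather not ignore it, one common approach is to normalize both strings to the same normalization form before comparing. A minimal sketch using the core module `Unicode::Normalize` and NFC; the strings and variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC );

# Two spellings of "café": precomposed U+00E9, versus "e" followed by
# U+0301 COMBINING ACUTE ACCENT
my $precomposed = "caf\x{E9}";
my $decomposed  = "cafe\x{301}";

# Plain eq compares code points, so these differ
print $precomposed eq $decomposed ? "equal\n" : "different\n";

# Normalize both sides to the same form (NFC) before comparing
print NFC($precomposed) eq NFC($decomposed) ? "equal\n" : "different\n";
```

NFD would work just as well, as long as both sides get the same form.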
Re: Comparing utf8 strings by stvn (Monsignor) on Apr 16, 2008 at 01:48 UTC
It won't handle the actual comparisons for you, but when dealing with (possibly) utf8 strings, Devel::StringInfo is your friend. -stvn
Re: Comparing utf8 strings by Anonymous Monk on Apr 16, 2008 at 00:35 UTC
Depends on how you want to compare it. In general, you should read perlunitut and perlunifaq first.
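To illustrate how much "how you want to compare it" matters, here is a sketch of three different notions of equality on already-decoded strings. It assumes Perl 5.16+ for `fc`; `Unicode::Collate` ships with Perl. The strings and variable names are made up for illustration.

```perl
use strict;
use warnings;
# fc needs Perl 5.16+; unicode_strings makes it use Unicode rules even on
# strings without the UTF8 flag (decoded input usually has the flag anyway)
use feature qw( fc say unicode_strings );
use Unicode::Collate;

my $accented = "R\x{E9}sum\x{E9}";    # "Résumé"
my $plain    = "resume";

# 1. Exact comparison: code point by code point
say $accented eq $plain ? "equal" : "different";                  # different

# 2. Case-insensitive comparison via full case folding;
#    sharp s (U+00DF) folds to "ss"
say fc("Stra\x{DF}e") eq fc("STRASSE") ? "equal" : "different";   # equal
say fc($accented) eq fc($plain) ? "equal" : "different";          # different

# 3. Collation at level 1 compares base letters only,
#    ignoring accents and case
my $collator = Unicode::Collate->new( level => 1 );
say $collator->eq($accented, $plain) ? "equal" : "different";     # equal
```

Which one is "right" depends entirely on the application: `eq` for identifiers, `fc` for case-insensitive matching, a collator for human-oriented matching and sorting.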