mhearse has asked for the wisdom of the Perl Monks concerning the following question:

I have a beginner utf8 question. I'm querying a database which contains utf8 data. Do I need to do anything special in order to compare (eq) scalars which contain utf8 data, as opposed to the Latin1 default?

Re: Comparing utf8 strings
by ikegami (Patriarch) on Apr 16, 2008 at 00:31 UTC

    Perl 5.8+ has integrated Unicode support, so eq, regular expressions, etc. all work on character strings without issue.

    However, you first need to decode the encoded bytes into Unicode text strings. There are a few ways of doing that:

    • Specify the encoding when you open the file from which the text originates.

      open(my $fh, '<:encoding(UTF-8)', $file) or die "Can't open $file: $!";
    • Specify the encoding for the handle after it's already open.

      binmode(STDIN, ':encoding(UTF-8)');
    • Decode the encoded data manually.

      use Encode qw( decode );
      my $line_bytes = <$fh>;
      my $line_chars = decode('UTF-8', $line_bytes);
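    A minimal sketch of why the decoding step matters for eq, using Encode's decode as in the last option. The byte string is a hypothetical stand-in for a value fetched from the database; whether your driver already decodes for you depends on the driver and its options, so check that before decoding a second time.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical stand-in for what a database driver might return:
# the UTF-8 bytes for "café" (é is encoded as the two bytes 0xC3 0xA9).
my $from_db_bytes = "caf\xC3\xA9";

# The same word as a character string (\x{E9} is é, U+00E9).
my $expected = "caf\x{E9}";

# Comparing undecoded bytes against characters fails:
# 5 byte-sized characters on one side, 4 characters on the other.
my $raw_match = ($from_db_bytes eq $expected) ? 1 : 0;

# After decoding, both sides are character strings and eq behaves.
my $decoded       = decode('UTF-8', $from_db_bytes);
my $decoded_match = ($decoded eq $expected) ? 1 : 0;

printf "undecoded: length %d, eq %s\n", length($from_db_bytes), $raw_match     ? 'yes' : 'no';
printf "decoded:   length %d, eq %s\n", length($decoded),       $decoded_match ? 'yes' : 'no';
```

    This prints "undecoded: length 5, eq no" followed by "decoded:   length 4, eq yes".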

    Don't forget to do the inverse (encode the characters back into UTF-8 bytes) when you output them, either with an :encoding(UTF-8) layer on the output handle or with Encode's encode.
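    A short sketch of both output options. Use one or the other for a given string: printing already-encoded bytes through an :encoding(UTF-8) handle encodes them a second time. Without any layer, Perl would write é as the single Latin-1 byte 0xE9, and warn "Wide character" for code points above 0xFF.

```perl
use strict;
use warnings;
use Encode qw( encode );

my $text = "caf\x{E9}";    # a 4-character string (\x{E9} is é)

# Option 1: encode explicitly; the result is 5 bytes: c a f 0xC3 0xA9.
# Suitable for a raw (byte) handle, a socket, or a database column.
my $bytes = encode('UTF-8', $text);

# Option 2: put an encoding layer on the handle so print encodes for you.
# Print the character string here, not $bytes, to avoid double encoding.
binmode(STDOUT, ':encoding(UTF-8)');
print $text, "\n";
```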

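    If your source code itself is UTF-8 encoded (for example, it contains non-ASCII string literals), add use utf8; at the top of your file so Perl decodes those literals into characters.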
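    A small illustration, assuming the script file is saved as UTF-8. With the pragma, the literal below is 4 characters and equals the escaped form; without it, the same literal would be the 5 raw UTF-8 bytes and the comparison would fail.

```perl
use strict;
use warnings;
use utf8;    # tells Perl this source file is UTF-8 encoded

my $literal = "café";    # typed directly in the UTF-8 source

# With use utf8, the literal is 4 characters, identical to the escaped form.
printf "length %d, same as escaped: %s\n",
    length($literal), ($literal eq "caf\x{E9}") ? 'yes' : 'no';
```

    This prints "length 4, same as escaped: yes".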

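      Of course, you'll need to find a way to deal with precomposed and combining characters: é can be stored as the single code point U+00E9 or as e followed by U+0301 COMBINING ACUTE ACCENT, and eq treats the two as different strings. The easiest thing to do is to ignore it completely...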
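    If you can't ignore it, one common approach (a sketch, not the only option) is to normalize both sides to the same Unicode normalization form before comparing, here with the core module Unicode::Normalize. NFC is an arbitrary choice for the example; NFD works just as well as long as both sides use the same form.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC NFD );

my $precomposed = "caf\x{E9}";     # é as one code point, U+00E9
my $combining   = "cafe\x{301}";   # e followed by U+0301 COMBINING ACUTE ACCENT

# Plain eq compares code points, so these differ (4 vs 5 characters).
my $plain_eq = ($precomposed eq $combining) ? 1 : 0;

# Normalizing both sides to the same form (NFC here) makes them equal.
my $nfc_eq = (NFC($precomposed) eq NFC($combining)) ? 1 : 0;

print "eq: $plain_eq, NFC eq: $nfc_eq\n";
```

    This prints "eq: 0, NFC eq: 1".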
Re: Comparing utf8 strings
by stvn (Monsignor) on Apr 16, 2008 at 01:48 UTC

    It won't handle the actual comparisons for you, but when dealing with (possibly) utf8 strings, Devel::StringInfo is your friend.

    -stvn
Re: Comparing utf8 strings
by Anonymous Monk on Apr 16, 2008 at 00:35 UTC
    Depends on how you want to compare it. In general, you should read perlunitut and perlunifaq first.
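    As one illustration of "how you want to compare it": exact eq, lc-based, and case-folded comparisons can disagree on Unicode text. This sketch assumes Perl 5.16 or later for fc; German ß is used because it case-folds to "ss", which lc alone does not do.

```perl
use v5.16;      # enables strict, plus the 'fc', 'say' and 'unicode_strings' features
use warnings;

my $upper = "STRASSE";
my $lower = "stra\x{DF}e";          # \x{DF} is LATIN SMALL LETTER SHARP S (ß)

my $exact   = ($upper eq $lower)         ? 1 : 0;   # character-for-character
my $with_lc = (lc($upper) eq lc($lower)) ? 1 : 0;   # "strasse" vs "straße"
my $with_fc = (fc($upper) eq fc($lower)) ? 1 : 0;   # both fold to "strasse"

say "eq: $exact, lc: $with_lc, fc: $with_fc";
```

    This prints "eq: 0, lc: 0, fc: 1". For caseless matching, fc on both sides is generally the safer choice than lc or uc.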