in reply to utf8 change case on accented characters

As answered in the CB, use utf8::upgrade on the string.

The behaviour of some Perl ops currently depends on the internal encoding of the string. utf8::upgrade and utf8::downgrade alter the internal encoding of the string.

\u and \l are implemented in terms of ucfirst and lcfirst, which, like uc and lc, are susceptible to this limitation/bug.

For example,

$ perl -le'use open ":std", ":locale"; $_="\xE0 la plage"; utf8::downgrade($_); print "\u$_"'
à la plage

$ perl -le'use open ":std", ":locale"; $_="\xE0 la plage"; utf8::upgrade($_); print "\u$_"'
À la plage
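
If you can't easily track how each string is stored internally, one possible workaround is to upgrade a copy right before changing its case. This is only a sketch: ucfirst_chars is a hypothetical helper name I made up for illustration, not something provided by Perl or CPAN.

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase the first character using character
# semantics, whatever the string's internal encoding happens to be.
sub ucfirst_chars {
    my ($s) = @_;        # Work on a copy; the caller's string is untouched.
    utf8::upgrade($s);   # Changes the internal encoding, not the characters.
    return ucfirst($s);
}

my $name = "\xE0 la plage";
utf8::downgrade($name);             # Force the problematic internal encoding.
my $fixed = ucfirst_chars($name);   # First character is now U+00C0.
```

The upgrade doesn't change what the string contains, so the result still compares equal (with eq) to the same text stored either way.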

I've got use UTF8;

I hope you mean use utf8;, which simply tells Perl the source code that contains it is encoded using UTF-8 (not iso-latin-1). It doesn't affect IO.
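
A small sketch of what use utf8; changes, assuming the script itself is saved as UTF-8. The variable names are only for illustration.

```perl
use strict;
use warnings;

# Assumes this file is saved as UTF-8.
my $decoded = do { use utf8; "à" };   # Literal decoded: one character, U+00E0.
my $raw     = do { no utf8;  "à" };   # Literal left alone: two bytes, C3 A0.

print length($decoded), "\n";
print length($raw), "\n";
```

Only the string literals are affected. Anything read from or written to a file handle still needs its own decoding and encoding.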

I have lots of name in a UTF8 text file.

Did you decode the contents back into characters? One way:

open(my $fh, '<:encoding(UTF-8)', $qfn) or die("Can't open file $qfn: $!\n");

Don't forget to encode on the way out.
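
Putting both directions together might look something like the following. It's a sketch under assumptions: capitalize_file is a hypothetical helper, and I'm assuming one name per line in the input file.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_qfn to $out_qfn, capitalizing the first
# character of each line.
sub capitalize_file {
    my ($in_qfn, $out_qfn) = @_;

    # Decode UTF-8 bytes into characters on the way in.
    open(my $in_fh, '<:encoding(UTF-8)', $in_qfn)
        or die("Can't open file $in_qfn: $!\n");

    # Encode characters back into UTF-8 bytes on the way out.
    open(my $out_fh, '>:encoding(UTF-8)', $out_qfn)
        or die("Can't create file $out_qfn: $!\n");

    while (my $line = <$in_fh>) {
        print {$out_fh} ucfirst($line);
    }

    close($out_fh)
        or die("Can't write file $out_qfn: $!\n");
}
```

You'd call it as capitalize_file('names.txt', 'names.out.txt'), where both file names are placeholders.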

Undefined subroutine &main::setlocale called

setlocale is from POSIX. Did you actually load the POSIX module and import setlocale from it?
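
For reference, loading and importing it could look like this. It's a minimal sketch; passing an empty string asks for whatever locale the environment specifies.

```perl
use strict;
use warnings;
use POSIX qw( setlocale LC_ALL );

# An empty string means "use the locale named by the environment".
my $locale = setlocale(LC_ALL, '');

print defined($locale)
    ? "Locale is now $locale\n"
    : "Couldn't set the locale\n";
```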

Update: Added example.

Re^2: utf8 change case on accented characters
by JimmyMTL (Initiate) on Sep 09, 2009 at 17:07 UTC
    Thanks, ikegami

    I will do the utf8::upgrade and downgrade thing and see where that puts me.

    Yes, I do mean use utf8; although my code file has no BOM, it still seems to work.

    I'm so used to using setlocale on perl on our linux servers that I never even thought about why setlocale was available. Importing the module is always a good thing when it's not there by default.

    Again, many thanks, and I'll report the results with code samples for the benefit of future googlers and perl monastery residents alike...

      Yes, I do mean use utf8; although my code file has no BOM, it still seems to work.

      Byte order is immutable with UTF-8, so the BOM is useless as a BOM with UTF-8. Some applications use it as a signal that the file is encoded using UTF-8, but Perl uses use utf8; for that.

      I'll report the results with code samples

      By the way, I added an example to my earlier post.

      If you're having problems, please use Devel::Peek and provide us a Dump of the string that's giving you problems.
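
Something along these lines would do. It's only a sketch, and the string is the one from my earlier example rather than your data. Dump writes to STDERR.

```perl
use strict;
use warnings;
use Devel::Peek qw( Dump );

my $s = "\xE0 la plage";

utf8::downgrade($s);
Dump($s);   # FLAGS should not include UTF8 here.

utf8::upgrade($s);
Dump($s);   # FLAGS should now include UTF8.
```

The FLAGS and PV lines of the output are the parts that show how the string is stored.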
