Re: utf8 change case on accented characters

As answered in the CB, use utf8::upgrade on the string.

The behaviour of some Perl ops currently depends on the internal encoding of the string. utf8::upgrade and utf8::downgrade alter the internal encoding of the string.

\u and \l are implemented in terms of uc and lc, which are susceptible to this limitation/bug.

For example,

$ perl -le'use open ":std", ":locale"; $_="\xE0 la plage"; utf8::downgrade($_); print "\u$_"'
à la plage

$ perl -le'use open ":std", ":locale"; $_="\xE0 la plage"; utf8::upgrade($_); print "\u$_"'
À la plage
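The same effect can be checked without involving the terminal's encoding at all. The following is only a minimal sketch: the variable names are mine, and it assumes code that enables neither the unicode_strings feature nor a locale.

```perl
use strict;
use warnings;

my $s = "\xE0 la plage";            # U+00E0 is "à"

my $downgraded = $s;
utf8::downgrade($downgraded);       # stored as single bytes internally

my $upgraded = $s;
utf8::upgrade($upgraded);           # stored as UTF-8 internally

# Both copies hold the same characters...
print "equal as strings: ", ($downgraded eq $upgraded ? "yes" : "no"), "\n";

# ...but ucfirst (and therefore \u) treats them differently.
printf "downgraded: U+%04X\n", ord(ucfirst($downgraded));
printf "upgraded:   U+%04X\n", ord(ucfirst($upgraded));
```

Under those assumptions the two copies compare equal, yet the first character stays U+00E0 for the downgraded copy and becomes U+00C0 for the upgraded one.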

I've got use UTF8;

I hope you mean use utf8;, which simply tells Perl the source code that contains it is encoded using UTF-8 (not iso-latin-1). It doesn't affect IO.
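To see what the pragma does to a literal, here is a small sketch. It assumes the script file itself is saved as UTF-8; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Assumption: this source file is saved as UTF-8.
my $decoded = do { use utf8; "à" };   # literal is decoded into one character
my $raw     = do { no utf8;  "à" };   # literal is left as its two UTF-8 bytes

printf "with use utf8:    length %d\n", length($decoded);
printf "without use utf8: length %d\n", length($raw);
```

With a UTF-8 source file, the first length is 1 and the second is 2. Nothing here touches file handles, which is why IO still needs its own decoding and encoding.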

I have lots of name in a UTF8 text file.

Did you decode the contents back into characters? One way:

open(my $fh, '<:encoding(UTF-8)', $qfn)
    or die("Can't open file $qfn: $!\n");

Don't forget to encode on the way out.
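A rough round-trip sketch of both directions follows. The temporary file and the sample name are made up for the illustration; in real code $qfn would be your own input and output files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed automatically at exit.
my ($tmp_fh, $qfn) = tempfile(UNLINK => 1);
close($tmp_fh);

my $name = "\x{C0} la plage";   # "À la plage" as characters, not bytes

# Encode on the way out.
open(my $out, '>:encoding(UTF-8)', $qfn)
    or die("Can't create file $qfn: $!\n");
print {$out} "$name\n";
close($out)
    or die("Can't write file $qfn: $!\n");

# Decode on the way in.
open(my $in, '<:encoding(UTF-8)', $qfn)
    or die("Can't open file $qfn: $!\n");
chomp(my $line = <$in>);
close($in);

printf "characters read back: %d\n", length($line);
print $line eq $name ? "round trip ok\n" : "mismatch\n";
```

Because both handles carry an :encoding(UTF-8) layer, the program only ever sees characters, and the string read back equals the one written.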

Undefined subroutine &main::setlocale called

setlocale is from POSIX. Did you actually load the POSIX module and import setlocale from it?
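For reference, a minimal sketch of loading it explicitly. Passing an empty string asks for the locale from the environment; which locale that turns out to be depends entirely on your system.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);   # import setlocale and the category constant

# '' means "use whatever the environment (LANG, LC_*) specifies".
my $locale = setlocale(LC_CTYPE, '');

print defined($locale)
    ? "LC_CTYPE is now $locale\n"
    : "Could not set the locale from the environment\n";
```

setlocale returns the name of the locale it selected, or undef if the request could not be honoured, so the return value is worth checking.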

Update: Added example.

Re^2: utf8 change case on accented characters by JimmyMTL (Initiate) on Sep 09, 2009 at 17:07 UTC
Thanks, ikegami. I will do the utf8::upgrade and downgrade thing and see where that puts me. Yes, I do mean use utf8; although my code file has no BOM, it still seems to work. I'm so used to using setlocale in Perl on our Linux servers that I never even thought about why setlocale was available. Importing the module is always a good thing when it's not there by default. Again, many thanks, and I'll report the results with code samples for the benefit of future googlers and Perl monastery residents alike...
Re^3: utf8 change case on accented characters by ikegami (Patriarch) on Sep 09, 2009 at 17:35 UTC
Yes, I do mean use utf8; although my code file has no BOM, it still seems to work.

Byte order is immutable with UTF-8, so the BOM is useless as a BOM with UTF-8. Some applications use it as a signal that the file is encoded using UTF-8, but Perl uses `use utf8;` for that.

I'll report the results with code samples

By the way, I added an example to my earlier post.

If you're having problems, please use Devel::Peek and provide us a `Dump` of the string that's giving you problems.
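In case it helps, this is roughly what producing such a dump looks like. The sample string is just the one from my earlier example; the exact layout of the output varies between Perl versions.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

my $s = "\xE0 la plage";

utf8::downgrade($s);
Dump($s);            # written to STDERR; FLAGS should not include UTF8

utf8::upgrade($s);
Dump($s);            # same characters; FLAGS should now include UTF8

# A quick check of the internal encoding without reading the full dump.
print utf8::is_utf8($s) ? "internally UTF-8\n" : "internally bytes\n";
```

The FLAGS and PV lines of the dump are the parts that matter here, since they show which internal encoding the string currently uses.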