in reply to lowercasing accented characters
The behaviour of lc and similar functions can vary depending on how the string is stored internally. This is a bug, but it can't be fixed for backward-compatibility reasons. You can work around the problem by switching the string's internal storage format:
    use open ':std', ':encoding(UTF-8)';  # UTF-8 terminal
    my $s = "\xDC";
    utf8::upgrade( $s );                  # Use Unicode semantics
    print lc($s), "\n";
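To see the difference for yourself, here is a minimal sketch that lowercases the same one-character string under both storage formats. It assumes no `use feature 'unicode_strings'`, no `use locale`, and no `use 5.012` (or later) is in effect; `codepoints` is a hypothetical helper added only to make the result visible without depending on the terminal encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: render each character of a string as U+XXXX.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $str;
}

my $bytes = "\xDC";       # stored internally as a single byte
my $chars = "\xDC";
utf8::upgrade($chars);    # same string, now stored internally as UTF-8

my $lc_bytes = lc($bytes);
my $lc_chars = lc($chars);

# The two strings compare equal, yet lc treats them differently.
print 'equal before lc: ', ($bytes eq $chars ? 'yes' : 'no'), "\n";
print 'lc, byte storage:  ', codepoints($lc_bytes), "\n";
print 'lc, UTF-8 storage: ', codepoints($lc_chars), "\n";
```

Under those assumptions the byte-stored string should come back unchanged (U+00DC), while the upgraded one is lowercased to U+00FC.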
Perl 5.12 introduced a pragma to control the behaviour of lc:
    use open ':std', ':encoding(UTF-8)';  # UTF-8 terminal
    use feature 'unicode_strings';        # Or "use 5.012;"
    my $s = "\xDC";
    print lc($s), "\n";