OK, I'm in UTF-8 hell here. I wanted to at least verify that I could read and write UTF-8 strings to my database, but while working on that I discovered that I can't even get a trivial Perl program to behave in a way I understand.
Here's my test program:
#!/usr/bin/perl -w
use strict;
use utf8;
my (
$codepoint,
$test_string,
);
binmode STDOUT, ":utf8";
$codepoint = ord('我');
print "Codepoint of character is $codepoint\n";
$test_string = "Here's a test string with 我\n";
print "test_string is $test_string\n";
$test_string_dec = decode('utf8', $test_string);
(In my original, I had the literal Chinese character 我 where you see the big numeric constant.)
There are at least two issues here:
- The presence of the "use utf8" pragma: I gather from Googling that this is no longer required in current versions of Perl (I'm using 5.8.8). But if I leave it out, the codepoint reported by "ord" is 250 instead of 25105. Surely Perl should know that the Chinese character is Unicode?
- The "binmode" statement: This interacts with the "use utf8" pragma in the following ways:
  - pragma present, binmode present: correct codepoint, correct output character, no warning.
  - pragma present, binmode omitted: correct codepoint, correct output character, "Wide character in print" warning.
  - pragma omitted, binmode present: wrong codepoint, wrong output character, no warning.
  - pragma omitted, binmode omitted: wrong codepoint, correct output character, no warning.
So, I'm really confused. What does the "use utf8" pragma actually do in Perl 5.8.8? And why does the correct character still show up in the output even when I get the "Wide character in print" warning?
--- Marais