John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:
I wrote:

    my $conf = new Config::General( -ConfigHash => \%Data, -UTF8 => 1);
    $conf->save_file ("output.conf");

Now I know my script is saved as UTF-8, because I can look in hex and see the bytes E2 80 B2 in the file where one character is. But my resulting output file contains C3 A2 C2 80 C2 B2 instead!
It encoded the individual characters E2, 80, and B2 as UTF-8. But I thought Perl 5.10 treated the source code as UTF-8 naturally? Reading the docs, use encoding 'utf8' just changes the way concatenated mixed strings are upgraded or downgraded, for compatibility with length checking methods.
But adding that pragma does indeed fix the problem, and U+2032 doesn't fit in an 8-bit character. So either Config::General is doing something funny when manipulating the strings and loses the grouping of the three bytes into one character, or the Perl string literal is not in UTF-8 by default?
Can someone shed light on this?
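The byte pattern described above can be reproduced outside of Config::General. The following is a minimal sketch, not a diagnosis of the module itself: it assumes the literal reaches Perl as three separate one-byte characters (which is what happens to a UTF-8 literal when the source is not declared as UTF-8, for example with the `use utf8` pragma), and it uses an explicit `decode` to stand in for a correctly decoded literal. The helper `hex_bytes` and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($str) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $str;
}

# The three UTF-8 bytes of U+2032 (PRIME) as they sit in the script file.
# Assumption: Perl sees them as three characters with code points
# 0xE2, 0x80 and 0xB2, not as one character.
my $bytes = "\xE2\x80\xB2";

# Encoding those three characters as UTF-8 turns each one into two bytes.
my $double = encode('UTF-8', $bytes);

# Decoding first gives the single character U+2032, which then
# round-trips to the original three bytes.
my $char   = decode('UTF-8', $bytes);
my $single = encode('UTF-8', $char);

print "characters seen without decoding: ", length($bytes), "\n";
print "characters seen after decoding:   ", length($char), "\n";
print "encoded without decoding: ", hex_bytes($double), "\n";
print "encoded after decoding:   ", hex_bytes($single), "\n";
```

If the first hex line matches the C3 A2 C2 80 C2 B2 seen in output.conf, that points at the string literal being read as bytes rather than at the string handling inside Config::General.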
Replies are listed 'Best First'.
Re: UTF-8 Issue, or Config::General Bug, or what?
by moritz (Cardinal) on Mar 22, 2011 at 08:59 UTC