John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

I created a nested hash with some sample data and wrote it out to see how Config::General would deal with it. I plan to use the output as an exemplar for changing the configuration.

I wrote:

my $conf = new Config::General( -ConfigHash => \%Data, -UTF8 => 1);
$conf->save_file ("output.conf");

I know my script is saved as UTF-8, because a hex dump shows the bytes E2 80 B2 in the file where the single character U+2032 is. But my resulting output file contains C3 A2 C2 80 C2 B2 instead!

It encoded the individual bytes E2, 80, and B2 as UTF-8, as if each were a separate character. But I thought Perl 5.10 treated the source code as UTF-8 naturally? From my reading of the docs, use encoding 'utf8' just changes the way concatenated mixed strings are upgraded or downgraded, for compatibility with length checking methods.
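The byte sequences described above can be reproduced without Config::General. The following is only a minimal sketch using the core Encode module. It assumes that the output layer simply encodes each character of the string as UTF-8, and the helper name hex_bytes is hypothetical. The literal is written with escapes so the result does not depend on how the script itself is saved.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Without "use utf8", a literal U+2032 in a UTF-8 source file is read as
# three separate characters, one per byte: U+00E2, U+0080, U+00B2.
my $as_seen_by_perl = "\xE2\x80\xB2";

# The same text as a single character, which is what was intended.
my $intended = "\x{2032}";

# Encoding the 3-character string as UTF-8 encodes each byte on its own.
my $double_encoded = encode('UTF-8', $as_seen_by_perl);

# Encoding the single character gives the original three bytes.
my $single_encoded = encode('UTF-8', $intended);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

print hex_bytes($double_encoded), "\n";   # C3 A2 C2 80 C2 B2
print hex_bytes($single_encoded), "\n";   # E2 80 B2
```

The first line of output matches the bytes found in output.conf, which suggests the string already held three characters before it was written.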

But adding that pragma does fix the problem, and U+2032 doesn't fit in an 8-bit character. So either Config::General is doing something funny when manipulating the strings and losing the grouping of the bytes into one character, or the Perl string literal is not in UTF-8 by default?

Can someone shed light on this?

Re: UTF-8 Issue, or Config::General Bug, or what?
by moritz (Cardinal) on Mar 22, 2011 at 08:59 UTC
    But, I thought Perl 5.10 treated the source code as UTF8 naturally

    It does not. That's what the utf8 pragma is for: telling Perl that the source file is in UTF-8.

    And please use that instead of the encoding pragma, which suffers from several problems (see the CAVEATS and KNOWN PROBLEMS sections of its documentation) that are better avoided.
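As a rough illustration of what the utf8 pragma changes, the sketch below reads the same literal with and without it. It assumes the script file itself is saved as UTF-8. The variable names are made up for the example.

```perl
use strict;
use warnings;

my ($with_pragma, $without_pragma);

{
    # The pragma is lexically scoped: literals in this block are decoded
    # from UTF-8 into characters.
    use utf8;
    $with_pragma = "′";
}

{
    # Here the same literal is taken byte by byte.
    no utf8;
    $without_pragma = "′";
}

printf "with utf8:    length %d, first character U+%04X\n",
    length($with_pragma), ord($with_pragma);
printf "without utf8: length %d, first character U+%04X\n",
    length($without_pragma), ord($without_pragma);
```

With the pragma in effect the literal is one character, U+2032. Without it the literal is three characters, starting with U+00E2, which is the string that ends up double-encoded on output.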