DreamT has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a data source that is supposed to be a UTF-8 XML file, generated on the fly. The file contains Swedish letters (å, ä, ö). I fetch the source via LWP::Simple, parse it with XML::Bare, and print it on a web page with ISO-8859-1 encoding. The result is "TrÃ¤ningsklÃ¤der" (it should be "Träningskläder"). When I convert the value using

use Encode qw(encode decode);
$value = encode("latin1", decode("utf-8", $value));

I get "Tr?ningskl?der" instead. Any idea what I'm doing wrong?
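A minimal sketch reproducing both symptoms with core Encode only. It rests on two assumptions that the question does not confirm: that the first case sends raw UTF-8 bytes to the Latin-1 page, and that in the second case the value had already been decoded once (for example by the XML parser) before `decode` was called again. The string literals stand in for the value returned by XML::Bare.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# UTF-8 bytes for "Träningskläder" (ä is 0xC3 0xA4 in UTF-8).
my $utf8_bytes = "Tr\xC3\xA4ningskl\xC3\xA4der";

# Symptom 1: raw UTF-8 bytes shown on a page declared as ISO-8859-1.
# Each two-byte sequence is read as two Latin-1 characters
# (0xC3 and 0xA4), which is where the garbled "TrÃ¤..." comes from.
my $shown_as_latin1 = decode('iso-8859-1', $utf8_bytes);

# Correct round trip: decode the bytes once, encode once for output.
my $text   = decode('UTF-8', $utf8_bytes);   # characters, ä is U+00E4
my $latin1 = encode('iso-8859-1', $text);    # bytes, ä is 0xE4

# Symptom 2 (hypothesis): the value was already decoded, so it holds
# the character U+00E4 rather than the bytes 0xC3 0xA4.
my $already_decoded = "Tr\x{E4}ningskl\x{E4}der";

# Decoding it again treats the lone 0xE4 as malformed UTF-8; Encode
# substitutes U+FFFD, which Latin-1 cannot represent, so encode()
# writes "?" in its place.
my $double = encode('iso-8859-1', decode('UTF-8', $already_decoded));

print "$double\n";
```

If this hypothesis holds, dropping the extra `decode` and only encoding at output time would avoid the question marks.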

Re: Another utf-8 decoding problem
by moritz (Cardinal) on Oct 11, 2010 at 11:31 UTC
    First of all, please find out which encoding your terminal/console accepts (see for example Encodings and Unicode in Perl for a short guide, and how to set up a clean UTF-8 environment).

    Then decode all incoming data, and before printing anything, set up an IO layer:

    binmode STDOUT, ":encoding($encoding_supported_by_your_terminal)";
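A minimal sketch of this decode-on-input, layer-on-output workflow for an ISO-8859-1 target, assuming the input arrives as UTF-8 bytes. An in-memory filehandle stands in for STDOUT so the resulting bytes can be checked; `$bytes_from_network` is a hypothetical name for whatever LWP::Simple returned.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the raw UTF-8 bytes fetched over HTTP.
my $bytes_from_network = "Tr\xC3\xA4ningskl\xC3\xA4der";

# Step 1: decode once, at the input boundary.
my $text = decode('UTF-8', $bytes_from_network);

# Step 2: attach an encoding layer to the output handle. For the real
# page this would be: binmode STDOUT, ':encoding(iso-8859-1)';
open my $out, '>', \my $page or die "Cannot open in-memory handle: $!";
binmode $out, ':encoding(iso-8859-1)';

# Step 3: print characters; the layer converts them to Latin-1 bytes.
print {$out} $text;
close $out;

printf "%d characters became %d bytes\n", length $text, length $page;
```

The point of the layer is that the code in between never calls `encode` itself; conversion happens once, at the handle.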

    When debugging output, use hexdump -c; it never lies (unlike your terminal, which often does).
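When a hexdump is not at hand, a similar check can be done inside Perl by listing each character's ordinal, which shows whether a string holds decoded characters or raw UTF-8 bytes. The helper name `dump_ords` is hypothetical; this is a sketch, not a replacement for inspecting the actual output stream.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: list every character's ordinal in hex, so the
# result does not depend on how the terminal renders it.
sub dump_ords {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $s;
}

my $bytes = "Tr\xC3\xA4";               # UTF-8 encoded "Trä"
my $chars = decode('UTF-8', $bytes);    # decoded "Trä"

print dump_ords($bytes), "\n";   # 54 72 C3 A4
print dump_ords($chars), "\n";   # 54 72 E4
```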

    Perl 6 - links to (nearly) everything that is Perl 6.
      Thanks,

      Regarding the environment, I'm unfortunately forced to use ISO-8859-1, since the target environment uses it. So it seems I need to present the data in this encoding regardless of the input format?
        When you use Perl for text processing, strings are always held in perl's internal representation (which is either ISO-8859-1 or UTF-8, depending on whether the UTF-8 flag is set).

        So you decode input data and encode output data. The workflow is always the same, regardless of what your output encoding is.
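Applied to the ISO-8859-1 target, the workflow might look like the following sketch: decode once, work on characters, and call `encode` only when producing output. Characters that Latin-1 cannot represent (such as €) become "?" by default. For an HTML page, one option, which is an assumption about what suits this target rather than something recommended in the thread, is Encode's `FB_HTMLCREF` fallback, which writes them as numeric character references.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Input: UTF-8 bytes for "Träningskläder 99 €" (€ is U+20AC).
my $input = "Tr\xC3\xA4ningskl\xC3\xA4der 99 \xE2\x82\xAC";

# Decode once; everything in between works on characters.
my $text = decode('UTF-8', $input);

# Default fallback: € has no Latin-1 code point, so it becomes "?".
my $plain = encode('iso-8859-1', $text);

# HTML fallback: unmappable characters become &#NNNN; references,
# which a browser renders correctly on an ISO-8859-1 page.
my $html = encode('iso-8859-1', $text, Encode::FB_HTMLCREF());

print "$html\n";   # ä is written as the single byte 0xE4, € as &#8364;
```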
