Another utf-8 decoding problem

DreamT has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a data source that is supposed to be a UTF-8 XML file, generated on the fly. The file has umlauts (å, ä, ö) in it. I fetch the source via LWP::Simple, parse it via XML::Bare, and print it on a webpage with iso-8859-1 encoding. The result is "TrÃ¤ningsklÃ¤der" (should be "Träningskläder"). If I decode it using

$value = encode("latin1", decode("utf-8", $value));

then I get "Tr?ningskl?der". Any idea what I'm doing wrong?

Re: Another utf-8 decoding problem by moritz (Cardinal) on Oct 11, 2010 at 11:31 UTC
First of all, please find out which encoding your terminal/console accepts (see for example Encodings and Unicode in Perl for a short guide, including how to set up a clean UTF-8 environment). Then decode all incoming data, and before printing anything, set up an IO layer: `binmode STDOUT, ":encoding($encoding_supported_by_your_terminal)";` When you debug output, use `hexdump -c`; it never lies (as opposed to your terminal, which often does). Perl 6 - links to (nearly) everything that is Perl 6.
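The IO layer advice can be sketched roughly as follows. This is only an illustration: the helper name `set_output_encoding` is hypothetical, and ISO-8859-1 is assumed as the target because that is the encoding mentioned in the question.

```perl
use strict;
use warnings;

# Assumption: the page/terminal expects ISO-8859-1, as in the question.
my $encoding = 'ISO-8859-1';

# Hypothetical helper: push an encoding layer onto a filehandle so that
# character strings printed to it are encoded automatically.
sub set_output_encoding {
    my ($fh, $enc) = @_;
    binmode($fh, ":encoding($enc)") or die "Cannot set layer: $!";
    return $fh;
}

set_output_encoding(\*STDOUT, $encoding);

# An already decoded character string; U+00E4 is "ä".
my $text = "Tr\x{E4}ningskl\x{E4}der";
print $text, "\n";
```

With the layer in place, the script prints character strings and leaves the encoding step to the handle, so no explicit `encode` call is needed at each `print`.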
Re^2: Another utf-8 decoding problem by DreamT (Pilgrim) on Oct 11, 2010 at 11:51 UTC
Thanks. Regarding the environment, I'm unfortunately forced to use ISO-8859-1, since the target environment uses it. So it seems I need to present the data in this encoding regardless of the input format?
Re^3: Another utf-8 decoding problem by moritz (Cardinal) on Oct 11, 2010 at 11:54 UTC
When you use Perl for text processing, you always use perl's internal encoding for string representation (which is either iso-8859-1 or UTF-8, depending on the presence of the UTF-8 flag). So you decode input data and encode output data. That is always the same workflow, regardless of what your output encoding is.
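A minimal sketch of that decode-in, encode-out workflow, using the Encode module. The sample bytes are a hard-coded stand-in for the fetched data, and it is assumed here that the input really is raw, not yet decoded UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for the raw input: "Träningskläder" as UTF-8 bytes
# (each "ä" is the two bytes C3 A4).
my $bytes = "Tr\xC3\xA4ningskl\xC3\xA4der";

# Decode exactly once at the input boundary: bytes -> characters.
my $text = decode('UTF-8', $bytes);

# All processing happens on characters, not bytes.
my $length = length $text;    # 14 characters (the input was 16 bytes)

# Encode exactly once at the output boundary: characters -> ISO-8859-1 bytes.
my $latin1 = encode('ISO-8859-1', $text);

print $latin1, "\n";
```

Decoding data that has already been decoded somewhere upstream, or encoding it twice, is a common way to end up with either doubled characters or "?" substitutions, so it is worth checking which form each library in the chain actually returns.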
Re^4: Another utf-8 decoding problem by DreamT (Pilgrim) on Oct 11, 2010 at 12:27 UTC
Re^5: Another utf-8 decoding problem by moritz (Cardinal) on Oct 11, 2010 at 12:48 UTC
Some notes below your chosen depth have not been shown here