Hello,

Thanks for your replies so far.

I tried binmode(STDOUT, ':utf8') and it works when the string is placed directly in the code:

binmode(STDOUT, ':utf8');
print 'ö';

This works just as expected. I tried to open a UTF-8 encoded file using open(my $fh, '<:encoding(utf-8)', 'test') or die $!; and managed to print it in the encoding I wanted.
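For reference, here is a minimal sketch of what the two layers do. It uses in-memory filehandles instead of the real file 'test', so the sample bytes and variable names are assumptions for illustration only. The read layer turns bytes into characters, and the write layer turns characters back into bytes:

```perl
use strict;
use warnings;

# Hypothetical input: the UTF-8 bytes of "ö\n", standing in for the file 'test'.
my $file_bytes = "\xC3\xB6\n";

# Read through a decoding layer: bytes in, characters out.
open(my $in, '<:encoding(UTF-8)', \$file_bytes) or die $!;
my $line = <$in>;
close($in);
chomp $line;

# Write through an encoding layer: characters in, bytes out.
my $written = '';
open(my $out, '>:encoding(UTF-8)', \$written) or die $!;
print {$out} $line;
close($out) or die $!;

printf "%d character(s) read, %d byte(s) written\n",
    length($line), length($written);
```

Unlike the plain ':utf8' layer, ':encoding(UTF-8)' also checks that the input is well-formed, which is why it is used for reading here.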

However, it has no effect on the strings I get through XML::Parser. As I said, I parse an XML file and generate an HTML::Widget object from it. The widget then loads some information (to fill <select> fields) from a database whose contents are already encoded in UTF-8. As a result, the output of $widget->as_xml() contains both iso-8859-1 and UTF-8 encoded parts, which makes it impossible to utf8::encode it afterwards. Additionally, the output is generated through the Template Toolkit.
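The mixture can be reproduced in isolation. This is only a sketch under an assumption I have not verified in my application: that the parser side yields decoded character strings while the database side yields raw, undecoded UTF-8 bytes. Both sample strings are made up:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $from_parser = "\x{F6}";     # hypothetical: one character (ö), already decoded
my $from_db     = "\xC3\xB6";   # hypothetical: two undecoded UTF-8 bytes for ö

# Perl cannot know that the second string holds UTF-8 bytes,
# so it treats each of those bytes as a character of its own.
my $mixed = $from_parser . $from_db;

# Encoding the whole string encodes the database part a second time.
my $bytes = encode('UTF-8', $mixed);

# Print the resulting bytes as hex to make the double encoding visible.
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
```

The first ö comes out as the expected two bytes, while the second one is expanded to four bytes, which a browser shows as garbage.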

I went through the Encoding manpage, but I evidently still don't understand how encodings are handled. I was hoping for a way to tell Perl that everything should be handled in UTF-8. At first I thought the utf8 pragma would do the trick, but I found out that it only tells Perl that the source code is written in UTF-8. Whatever use open ':utf8'; does, it's not what I want either.

Maybe my application design makes it even more difficult to understand where to find the mistake: The chain is as follows: XML document --> XML::Parser --> HTML::Widget generation --> filling data from database in the HTML::Widget --> putting the HTML::Widget in a TT template --> output through CGI::Application

I can of course encode the contents from my XML file after parsing it, but before generating the HTML::Widget. However, I do not think that this is the cleanest solution.
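An alternative to re-encoding the parsed XML would be to decode the database values instead, so that everything in the chain is a character string and gets encoded exactly once on output. A minimal sketch, assuming the database driver hands back undecoded UTF-8 bytes (the helper db_text and both sample strings are hypothetical, not part of my application):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw UTF-8 bytes fetched from the database
# into a Perl character string.
sub db_text {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes);
}

my $from_parser = "\x{F6}";               # ö as a character string
my $from_db     = db_text("\xC3\xB6");    # ö decoded from two bytes

# Both parts are character strings now, so concatenation is safe.
my $page = $from_parser . $from_db;

# Encode exactly once, at the very end of the chain.
my $body = encode('UTF-8', $page);

binmode(STDOUT);    # raw bytes, since $body is already encoded
print $body, "\n";
```

Some DBI drivers offer an option to return decoded strings directly, which would make the helper unnecessary; whether that applies here depends on the driver in use.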

Any more thoughts?


In reply to Re^2: XML::Parser - Keep Encoding? by Doron
in thread XML::Parser - Keep Encoding? by Doron
