
Hello,

Thanks for your replies so far.

I tried binmode(STDOUT, ':utf8') and it works when the string is placed directly in the code:

binmode(STDOUT, ':utf8');
print 'ö';

This works just as expected. I tried to open a UTF-8 encoded file using open(my $fh, '<:encoding(utf-8)', 'test') or die $!; and managed to print it in the encoding I wanted.

However, it has no effect on the strings I get from XML::Parser. As I said, I parse an XML file and generate an HTML::Widget object from it. The widget also loads some data (to fill <select> fields) from a database whose contents are already encoded in UTF-8. As a result, the output of $widget->as_xml() contains both iso-8859-1 and UTF-8 encoded parts, so I cannot simply utf8::encode it afterwards. Additionally, the output is generated through the Template Toolkit.

I went through the Encoding manpage, but obviously I still don't understand how encodings are handled. I was hoping for a way to tell Perl that everything should be handled as UTF-8. At first I thought the utf8 pragma would do the trick, but I found out that it only tells Perl that the source code is written in UTF-8. Whatever use open ':utf8'; does, it's not what I want either.

Maybe my application design makes it even harder to see where the mistake is. The chain is as follows: XML document --> XML::Parser --> HTML::Widget generation --> filling data from the database into the HTML::Widget --> putting the HTML::Widget into a TT template --> output through CGI::Application

I can of course encode the contents from my XML file after parsing it, but before generating the HTML::Widget. However, I do not think that this is the cleanest solution.
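A minimal sketch of that workaround, assuming XML::Parser hands back character strings and the database hands back raw UTF-8 bytes. Both sample values below are hypothetical stand-ins, not my real data. It also shows the alternative of decoding the database values instead, so that everything stays a character string until output:

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical stand-ins (assumptions, not real application data):
my $from_xml = "\x{F6}";      # 'ö' as ONE character, as a parser might return it
my $from_db  = "\xC3\xB6";    # 'ö' as TWO raw UTF-8 bytes, as a DB driver might return it

# Naive concatenation mixes a character with undecoded bytes.
my $mixed = $from_xml . $from_db;

# Option A: encode the XML text so that both parts are UTF-8 bytes.
my $bytes = encode('UTF-8', $from_xml) . $from_db;

# Option B: decode the DB value so that both parts are characters,
# then encode only once, at the output boundary.
my $chars = $from_xml . decode('UTF-8', $from_db);

printf "mixed=%d bytes=%d chars=%d\n",
    length($mixed), length($bytes), length($chars);
```

With option A the result must be printed without an encoding layer on the handle; with option B a single :utf8 layer (or one encode call) at the very end is enough.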

Any more thoughts?

Re^3: XML::Parser - Keep Encoding?
by ikegami (Patriarch) on Apr 23, 2009 at 02:11 UTC
    I have no problem.
    use strict;
    use warnings;

    use HTML::Widget qw( );

    my ($up) = @ARGV
       or die;

    my $s = chr(0xC9);
    if ($up) {
       utf8::upgrade($s);    # Internally encoded as UTF-8
    } else {
       utf8::downgrade($s);  # Internally encoded as iso-latin-1
    }

    my $w = HTML::Widget->new('widget')->method('GET');
    $w->element('Textfield', 's')->label($s)->value($s);
    my $form = $w->process();

    binmode(STDOUT, ':utf8');
    print(<<"__EOI__");
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <title>Test</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <p>$s</p>
    $form
    __EOI__

    >perl 759467.pl 0 > 0.html
    >perl 759467.pl 1 > 1.html
    >fc /b 0.html 1.html
    Comparing files 0.html and 1.HTML
    FC: no differences encountered

    HTML::Widget also worked fine with characters iso-latin-1 can't represent.
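The test above relies on the fact that utf8::upgrade and utf8::downgrade change only how Perl stores a string internally, not the characters it contains. A minimal sketch of that point, using only core Perl (the variable names are illustrative):

```perl
use strict;
use warnings;

my $up   = chr(0xC9);    # 'É'
my $down = chr(0xC9);

utf8::upgrade($up);      # internal storage switched to UTF-8
utf8::downgrade($down);  # internal storage: one byte per character

# The internal flag differs between the two copies...
my $up_flag   = utf8::is_utf8($up)   ? 1 : 0;
my $down_flag = utf8::is_utf8($down) ? 1 : 0;

# ...but the strings still compare equal and have the same length.
my $equal = ($up eq $down) ? 1 : 0;

print "up_flag=$up_flag down_flag=$down_flag equal=$equal ",
      "len_up=", length($up), " len_down=", length($down), "\n";
```

Because both copies are the same character string, a correctly behaving module should produce identical output for either one, which is what the fc comparison checks.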

    Please provide a minimal program to demonstrate your problem.

      Sorry for the late reply, I was on vacation.

      When I tried to reproduce my error, I found out that it was actually a TT issue: calling binmode(STDOUT, ':utf8') has no effect on TT's output (see the discussion on their mailing list).

      Thanks for your effort!