in reply to Entities confuse encoding in XML::Simple

Maybe you have to set up your environment a bit better:
binmode STDOUT, ':encoding(UTF-8)'; # or whatever your terminal uses

Re^2: Entities confuse encoding in XML::Simple
by Anonymous Monk on Jan 03, 2008 at 13:17 UTC
    Even if it somehow had the wrong encoding, isn't it strange that some of the output is correct and some is not?

    Anyway, the terminal is utf8 and I was surprised to see that binmode STDOUT, ':encoding(UTF-8)' actually made it worse:
    foo
    foo with an –
    some latin1 encoded chars: æøå ÆØÅ
    same, but this time whith an – .. æøå ÆØÅ
    same, but this thime with an ” instead .. : æøå ÆØÃ&#133;
    $VAR1 = {
      'p' => [
        'foo',
        "foo with an \x{2013}",
        'some latin1 encoded chars: æøå ÆØÃ&#133;',
        "same, but this time whith an \x{2013} .. \x{c3}\x{a6}\x{c3}\x{b8}\x{c3}\x{a5} \x{c3}\x{86}\x{c3}\x{98}\x{c3}\x{85}",
        "same, but this thime with an \x{201d} instead .. : \x{c3}\x{a6}\x{c3}\x{b8}\x{c3}\x{a5} \x{c3}\x{86}\x{c3}\x{98}\x{c3}\x{85}"
      ]
    };
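One possible reading of the dump above (an assumption, not something confirmed in this thread): the strings shown as \x{c3}\x{a6}... look like UTF-8 bytes that ended up inside a character string. A minimal sketch with the core Encode module shows how that looks and why an :encoding(UTF-8) output layer would then encode those bytes a second time. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string: one code point per character (æøå).
my $chars = "\x{e6}\x{f8}\x{e5}";

# The same text as raw UTF-8 bytes, as you would get from undecoded input.
my $bytes = encode('UTF-8', $chars);    # \xc3\xa6 \xc3\xb8 \xc3\xa5

# Encoding the bytes again is what an :encoding(UTF-8) layer does to them
# on output; every byte above 0x7f grows into two bytes.
my $double = encode('UTF-8', $bytes);

# Decoding the raw bytes once recovers the original characters.
my $fixed = decode('UTF-8', $bytes);

printf "chars:  %d\n", length $chars;     # 3
printf "bytes:  %d\n", length $bytes;     # 6
printf "double: %d\n", length $double;    # 12
print  "round trip ok\n" if $fixed eq $chars;
```

If this is what is happening, decoding the input once before it reaches the output layer would be the place to look, rather than changing the terminal setting.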
      Did you find a solution? I'm having the same problem as you.
        Kind of, but it's more of a workaround. I just go through the XML and replace every & with &amp; before I pass it on to XMLin.
        Something like:
        my @xml = <FILE>; # slurp
        ..
        close(FILE);
        foreach (@xml) {
            # Ugly hack to keep XML::Simple(?) from double encoding certain strings,
            # see: perlmonks.org/index.pl?node_id=660162 and perlmonks.org/index.pl?node_id=215678
            s/&/&amp;/g;
        }
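To make the effect of that substitution visible without a file or XML::Simple, here is a small self-contained sketch. The sample lines are hypothetical; only the s/&/&amp;/g step comes from the workaround above.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the slurped file.
my @xml = (
    "<doc>\n",
    "<p>foo with an &#8211;</p>\n",
    "</doc>\n",
);

# Same substitution as in the workaround: every & becomes &amp;
# (done on a copy so the original lines stay untouched).
my @escaped = map { (my $line = $_) =~ s/&/&amp;/g; $line } @xml;

print @escaped;
# The second line is now: <p>foo with an &amp;#8211;</p>
```

Note the side effect: after parsing, the text contains the literal characters &#8211; rather than an en dash, and legitimate entities such as &lt; would be escaped the same way.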
        I _think_ numeric entities like &#NNN; (or &#xHHH; in hex) are legal in XML and should either be left untouched or maybe "converted" into the corresponding UTF-8 character?
        anyone?
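For what it's worth, decimal (&#NNN;) and hexadecimal (&#xHHH;) character references are legal XML, and a parser replaces them with the character they name. A rough sketch of that conversion in plain Perl, assuming well-formed references; decode_numeric_refs is a made-up helper, not part of XML::Simple:

```perl
use strict;
use warnings;

# Hypothetical helper: turn numeric character references into characters.
sub decode_numeric_refs {
    my ($text) = @_;
    $text =~ s/&#x([0-9a-fA-F]+);/chr(hex($1))/ge;    # hex form, e.g. &#x201D;
    $text =~ s/&#([0-9]+);/chr($1)/ge;                # decimal form, e.g. &#8211;
    return $text;
}

my $decoded = decode_numeric_refs('foo with an &#8211; and a &#x201D;');

# The result is a character string, so it needs an encoding layer on output.
binmode STDOUT, ':encoding(UTF-8)';
print "$decoded\n";
```

The result holds U+2013 and U+201D as single characters, which is the same form Data::Dumper shows as \x{2013} and \x{201d} earlier in the thread.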