in reply to Re: Entities confuse encoding in XML::Simple
in thread Entities confuse encoding in XML::Simple

Even if it somehow had the wrong encoding, isn't it strange that some of the output is correct and some is not?

Anyway, the terminal is utf8 and I was surprised to see that binmode STDOUT, ':encoding(UTF-8)' actually made it worse:
foo
foo with an –
some latin1 encoded chars: æøå ÆØÅ
same, but this time whith an – .. æøå ÆØÅ
same, but this thime with an ” instead .. : æøå ÆØÃ&#133;

$VAR1 = {
  'p' => [
    'foo',
    "foo with an \x{2013}",
    'some latin1 encoded chars: æøå ÆØÃ&#133;',
    "same, but this time whith an \x{2013} .. \x{c3}\x{a6}\x{c3}\x{b8}\x{c3}\x{a5} \x{c3}\x{86}\x{c3}\x{98}\x{c3}\x{85}",
    "same, but this thime with an \x{201d} instead .. : \x{c3}\x{a6}\x{c3}\x{b8}\x{c3}\x{a5} \x{c3}\x{86}\x{c3}\x{98}\x{c3}\x{85}"
  ]
};
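
One possible reading of the \x{c3}\x{a6} pairs in the dump, offered as a guess rather than a diagnosis: the UTF-8 bytes of the input were never decoded into characters, so an :encoding(UTF-8) output layer encodes each byte a second time. A minimal sketch of that effect, using only the core Encode module (the variable names are made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "æ" as the two raw UTF-8 bytes 0xC3 0xA6, the way an undecoded read delivers it.
my $bytes = "\xc3\xa6";

# What an :encoding(UTF-8) layer does to an undecoded string: each byte is
# treated as a Latin-1 character and encoded again, giving four bytes.
my $double = encode('UTF-8', $bytes);

# Decoding first turns the two bytes into the single character U+00E6 ...
my $chars = decode('UTF-8', $bytes);

# ... which the output layer then encodes back to the original two bytes.
my $single = encode('UTF-8', $chars);

printf "undecoded: %d bytes, decoded first: %d bytes\n",
    length($double), length($single);
```

If that is what happens here, decoding the input before it reaches the parser would matter more than the binmode on STDOUT.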

Re^3: Entities confuse encoding in XML::Simple
by Anonymous Monk on Jan 09, 2008 at 14:29 UTC
    Did you find a solution? I'm having the same problem as you.
      Kind of, but it's more of a workaround. I just go through the XML and replace every & with &amp; before I pass it on to XMLin.
      Something like:
      my @xml = <FILE>; # slurp
      ..
      close(FILE);
      foreach (@xml) {
          # Ugly hack to keep XML::Simple(?) from double encoding certain strings,
          # see: perlmonks.org/index.pl?node_id=660162 and perlmonks.org/index.pl?node_id=215678
          s/&/&amp;/g;
      }
      I _think_ numeric character references like &#NNN; (or &#xHHH; in hex) are legal in XML and should either be left untouched or maybe "converted" into the UTF-8 character?
      anyone?
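
For what it's worth, XML does define numeric character references in both decimal (&#NNN;) and hexadecimal (&#xHHH;) form, and a parser normally replaces them with the character they name. With the & escaped beforehand, they come back from XMLin as literal text instead. A rough sketch of expanding them afterwards; expand_char_refs is a hypothetical helper, not part of XML::Simple, and it does no validation of the code points:

```perl
use strict;
use warnings;

# Hypothetical helper: replace decimal (&#NNN;) and hexadecimal (&#xHHH;)
# character references with the character they name. A single pass is used
# so that text produced by one substitution is never expanded a second time.
sub expand_char_refs {
    my ($text) = @_;
    $text =~ s/&#(?:x([0-9A-Fa-f]+)|([0-9]+));/chr(defined $1 ? hex($1) : $2)/ge;
    return $text;
}

# Both references below name U+2013 (EN DASH).
my $expanded = expand_char_refs('foo with an &#8211; or &#x2013;');

# Print the code points of the non-ASCII characters that were produced.
printf "U+%04X U+%04X\n", map { ord($_) } $expanded =~ /([^\x00-\x7f])/g;
```

Note that escaping every & also turns named entities such as &lt; into literal text, so this only papers over the problem.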