in reply to UTF8 and XML

Keeping in mind that \311 is the iso-8859-1 encoding of U+00C9 (É), and that \303\211 is the UTF-8 encoding of the same character, you can see that XML::Simple properly decodes the text of both documents to the same Perl string:

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper qw( Dumper );
use XML::Simple  qw( );

$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

my $latin1_xml = <<"__EOI__";
<?xml version="1.0" encoding="iso-8859-1"?>
<root>\311ric</root>
__EOI__

my $utf8_xml = <<"__EOI__";
<?xml version="1.0" encoding="UTF-8"?>
<root>\303\211ric</root>
__EOI__

my $xs = XML::Simple->new();
for my $xml ($latin1_xml, $utf8_xml) {
    my $tree = $xs->XMLin($xml,
        ForceArray => 1,
        KeepRoot   => 1,
    );

    local $Data::Dumper::Useqq = 1;
    print Dumper $tree;
}
Output:

$VAR1 = { 'root' => [ "\x{c9}ric" ] };
$VAR1 = { 'root' => [ "\x{c9}ric" ] };
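The byte sequences quoted above can be checked with the core Encode module. This is a minimal sketch (not from the original post) that encodes U+00C9 both ways, prints the bytes in the same octal notation, and decodes them back:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode decode );

# U+00C9 LATIN CAPITAL LETTER E WITH ACUTE, as a one-character Perl string
my $char = "\x{C9}";

# Encode the same character two ways.
my $latin1 = encode('iso-8859-1', $char);    # one byte:  0xC9
my $utf8   = encode('UTF-8',      $char);    # two bytes: 0xC3 0x89

# Show each byte in octal, the notation used in the post.
for my $pair ([ 'iso-8859-1' => $latin1 ], [ 'UTF-8' => $utf8 ]) {
    my ($name, $bytes) = @$pair;
    my $octal = join ' ', map { sprintf '\\%03o', ord } split //, $bytes;
    print "$name: $octal\n";
}

# Decoding either byte string yields the same character again, which is
# why XMLin returns "\x{c9}ric" for both documents.
my $back_latin1 = decode('iso-8859-1', $latin1);
my $back_utf8   = decode('UTF-8',      $utf8);
print "round trip ok\n" if $back_latin1 eq $char && $back_utf8 eq $char;
```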

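It also outputs XML properly, albeit through a weird interface: XMLout doesn't encode what it generates, so you hand it (via OutputFile) a filehandle whose :encoding layer does the encoding: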

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper qw( Dumper );
use XML::Simple  qw( );

my $tree = { 'root' => [ "\x{c9}ric" ] };

$XML::Simple::PREFERRED_PARSER = 'XML::Parser';
my $xs = XML::Simple->new();

for my $enc (qw( iso-8859-1 UTF-8 )) {
    my $xml = '';
    {
        # XMLout doesn't encode, so let an :encoding layer on an
        # in-memory filehandle turn its characters into bytes.
        open(my $fh, ">:encoding($enc)", \$xml) or die;
        $xs->XMLout($tree,
            XMLDecl    => qq{<?xml version="1.0" encoding="$enc"?>},
            KeepRoot   => 1,
            OutputFile => $fh,
        );
        close($fh);
    }

    local $Data::Dumper::Useqq = 1;
    print Dumper $xml;
}
Output:

$VAR1 = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<root>\311ric</root>\n";
$VAR1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root>\303\211ric</root>\n";
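The filehandle trick works because the :encoding layer performs the character-to-byte step that XMLout skips. The sketch below uses only core modules and no XML::Simple at all: the XML string is written by hand as an assumed stand-in for what XMLout would return. It shows that the layer and an explicit Encode::encode call produce the same bytes, so encoding XMLout's returned string yourself is an equivalent route:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode );
use Data::Dumper qw( Dumper );

# Hand-written stand-in for the character string XMLout returns for
# { root => [ "\x{c9}ric" ] } with KeepRoot => 1 (assumed, not generated here).
my $body = "<root>\x{c9}ric</root>\n";

my %bytes;
for my $enc (qw( iso-8859-1 UTF-8 )) {
    my $chars = qq{<?xml version="1.0" encoding="$enc"?>\n} . $body;

    # Route 1: an :encoding layer on an in-memory filehandle does the work.
    my $via_layer = '';
    open(my $fh, ">:encoding($enc)", \$via_layer)
        or die "Can't open in-memory handle: $!";
    print $fh $chars;
    close($fh) or die "Can't close in-memory handle: $!";

    # Route 2: encode the character string explicitly.
    my $via_encode = encode($enc, $chars);

    die "Routes disagree for $enc\n" unless $via_layer eq $via_encode;
    $bytes{$enc} = $via_encode;
}

local $Data::Dumper::Useqq = 1;
print Dumper $bytes{$_} for qw( iso-8859-1 UTF-8 );
```

Either way, the important part is that the encoding named in the declaration and the encoding actually applied are the same one.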

Re^2: UTF8 and XML
by clintonm9 (Sexton) on Mar 09, 2010 at 03:15 UTC

    So if I wanted to send ISO-8859-1 XML, I would just add that to the header? Doing this didn't make a difference. Please see:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use XML::Simple qw( );

    # A simple test to show the UTF8 problem
    my $parameters;
    push(@{ $parameters->{Request} }, {
        URI        => '/HRM/EmploymentManager/AvailableOpenings',
        Action     => 'GET',
        ID         => '123',
        Parameters => {
            Status => 'Test',
        },
    });

    # convert Perl hash ref into XML
    my $xs = XML::Simple->new();
    my $x = $xs->XMLout($parameters,
        KeepRoot => 0,
        RootName => 'Requests',
        XMLDecl  => qq{<?xml version="1.0" encoding="iso-8859-1"?>},
    );
    print $x;

    # convert XML into Perl hash ref (reusing the same $xs object)
    my $XML = $xs->XMLin($x, ForceArray => 0);

    # Look at the Perl hash ref; there shouldn't be any UTF-8 flagged strings
    my $temp = $XML->{'Request'}->{'Action'};
    my $flag = utf8::is_utf8($temp);
    print "$flag ! $temp\n\n\n";
    exit;

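      No, you removed the part that does the encoding (the >:encoding(...) filehandle). XML::Simple properly decodes when parsing XML, but it doesn't encode when generating XML: XMLout returns a string of characters, and the XMLDecl option only adds the declaration text. You still have to encode the result yourself, for example by passing OutputFile a handle opened with >:encoding(...) as in my earlier example. (That's a bug; it's what I meant by a "weird interface" earlier.)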
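Applied to the code in the question, the missing step is to encode the string XMLout returns, using the same encoding the declaration names. This is a minimal sketch that assumes you drop the XMLDecl option and let a hypothetical helper, xml_bytes, add the declaration so the two can't drift apart. It uses only the core Encode module, and the XML string is a hand-written, hypothetical stand-in for XMLout's output:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode );
use Data::Dumper qw( Dumper );

# Hypothetical helper (not part of XML::Simple): prepend an XML declaration
# and encode the whole document with the encoding it names, so the
# declaration always describes the bytes that are actually sent.
# Encode::FB_XMLCREF writes characters the target encoding lacks as
# &#x...; references instead of substituting '?'.
sub xml_bytes {
    my ($chars, $enc) = @_;
    my $doc = qq{<?xml version="1.0" encoding="$enc"?>\n} . $chars;
    return encode($enc, $doc, Encode::FB_XMLCREF());
}

# Hand-written stand-in for what XMLout returns when XMLDecl is not set
# (hypothetical data; the Status value adds non-ASCII characters).
my $xml = qq{<Requests>\n  <Request Action="GET" ID="123" Status="\x{c9}ric \x{263a}" />\n</Requests>\n};

my $latin1 = xml_bytes($xml, 'iso-8859-1');    # U+263A becomes a &#x...; reference
my $utf8   = xml_bytes($xml, 'UTF-8');         # every character fits as-is

local $Data::Dumper::Useqq = 1;
print Dumper($latin1, $utf8);
```

The results are byte strings, so print them to a raw handle or send them as they are. Character references are only valid in text and attribute values, not in element or attribute names, so FB_XMLCREF is a fallback for data, not markup.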