in reply to UTF8 and XML
Keeping in mind that \311 is the iso-8859-1 encoding of U+00C9, and that \303\211 is the UTF-8 encoding of the same character, you can see that XML::Simple properly extracts text from XML:
    #!/usr/bin/perl
    use strict;
    use warnings;

    use Data::Dumper qw( Dumper );
    use XML::Simple  qw( );

    $XML::Simple::PREFERRED_PARSER = 'XML::Parser';

    # Same document twice: once as iso-8859-1 bytes, once as UTF-8 bytes.
    my $latin1_xml = <<"__EOI__";
    <?xml version="1.0" encoding="iso-8859-1"?>
    <root>\311ric</root>
    __EOI__

    my $utf8_xml = <<"__EOI__";
    <?xml version="1.0" encoding="UTF-8"?>
    <root>\303\211ric</root>
    __EOI__

    my $xs = XML::Simple->new();
    for my $xml ($latin1_xml, $utf8_xml) {
        my $tree = $xs->XMLin($xml,
            ForceArray => 1,
            KeepRoot   => 1,
        );

        # Useqq makes Dumper escape non-ASCII characters visibly.
        local $Data::Dumper::Useqq = 1;
        print Dumper $tree;
    }
    $VAR1 = {
              'root' => [
                          "\x{c9}ric"
                        ]
            };
    $VAR1 = {
              'root' => [
                          "\x{c9}ric"
                        ]
            };
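For anyone who wants to double-check the byte sequences quoted at the top, here is a minimal sketch using only the core Encode module. The `octal_escapes` helper is a made-up name for this illustration; it just prints each byte in the same backslash-octal notation used in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw( encode decode );

# Hypothetical helper: render each byte of a string as a \NNN octal escape.
sub octal_escapes {
    my ($bytes) = @_;
    return join '', map { sprintf('\\%03o', ord($_)) } split //, $bytes;
}

# U+00C9 as a decoded (character) string.
my $char = "\x{c9}";

# Encode the same character under both encodings and show the bytes.
for my $enc (qw( iso-8859-1 UTF-8 )) {
    printf "%-10s %s\n", $enc, octal_escapes(encode($enc, $char));
}

# Decoding either byte sequence should give back the same character.
my $same = decode('iso-8859-1', "\311") eq decode('UTF-8', "\303\211");
print $same ? "same character\n" : "different characters\n";
```

Both byte strings decode to the same single character, which is why XML::Simple returns the same string for both inputs.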
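XML::Simple also outputs XML properly, albeit through an awkward interface: the encoding is applied by the :encoding layer on the output file handle, so the XML declaration has to be kept in sync with it by hand: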
    #!/usr/bin/perl
    use strict;
    use warnings;

    use Data::Dumper qw( Dumper );
    use XML::Simple  qw( );

    my $tree = {
        'root' => [
            "\x{c9}ric"
        ]
    };

    $XML::Simple::PREFERRED_PARSER = 'XML::Parser';

    my $xs = XML::Simple->new();
    for my $enc (qw( iso-8859-1 UTF-8 )) {
        my $xml = '';
        {
            # In-memory file handle; the :encoding layer does the encoding.
            open(my $fh, ">:encoding($enc)", \$xml)
                or die "Can't open in-memory file handle: $!";
            $xs->XMLout($tree,
                XMLDecl    => qq{<?xml version="1.0" encoding="$enc"?>},
                KeepRoot   => 1,
                OutputFile => $fh,
            );
            close($fh);
        }

        local $Data::Dumper::Useqq = 1;
        print Dumper $xml;
    }
    $VAR1 = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<root>\311ric</root>\n";
    $VAR1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root>\303\211ric</root>\n";
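The part doing the real work there is the in-memory file handle with an :encoding layer, not XML::Simple itself. A minimal sketch of just that mechanism, using core Perl only, might look like the following. `encode_via_handle` is a hypothetical name for this illustration, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write decoded text through an in-memory handle
# opened with an :encoding layer, and return the resulting bytes.
sub encode_via_handle {
    my ($text, $enc) = @_;

    my $buf = '';
    open(my $fh, ">:encoding($enc)", \$buf)
        or die "Can't open in-memory file handle: $!";
    print {$fh} $text;
    close($fh)
        or die "Can't close in-memory file handle: $!";

    return $buf;
}

my $text = "\x{c9}ric";    # four characters of decoded text

for my $enc (qw( iso-8859-1 UTF-8 )) {
    my $bytes = encode_via_handle($text, $enc);
    printf "%-10s %d bytes\n", $enc, length($bytes);
}
```

This should report 4 bytes for iso-8859-1 and 5 bytes for UTF-8, since U+00C9 takes one byte in the former and two in the latter.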
Re^2: UTF8 and XML
by clintonm9 (Sexton) on Mar 09, 2010 at 03:15 UTC
by ikegami (Patriarch) on Mar 09, 2010 at 05:21 UTC