Girzi has asked for the wisdom of the Perl Monks concerning the following question:

Hi all! I'm using XML::Simple to parse a file written in XML form. With a file on my computer I don't have any problems. But in fact, some of the information is stored in a MySQL database, and a PHP script sends it in XML form on a page. With a socket I save the XML and then pass it to XML::Simple. I have problems with encoding; XML::Simple says:
not well-formed (invalid token) at line 2, column 39, byte 61 at /usr/lib/perl5/XML/Parser.pm line 187
 at /usr/lib/perl5/XML/Parser.pm line 192
    XML::Parser::parse('XML::Parser=HASH(0x84e32c8)', '<?xml version="1.0"?>\x{a}<eleve nom="qsdqsd" prenom="Rapha...') called at /usr/local/share/perl/5.8.8/XML/Simple.pm line 343
    XML::Simple::build_tree_xml_parser('XML::Simple=HASH(0x84e3244)', 'undef', 'SCALAR(0x82cfd58)') called at /usr/local/share/perl/5.8.8/XML/Simple.pm line 282
    XML::Simple::build_tree('XML::Simple=HASH(0x84e3244)', 'undef', 'SCALAR(0x82cfd58)') called at /usr/local/share/perl/5.8.8/XML/Simple.pm line 223
    XML::Simple::XMLin('<?xml version="1.0"?>\x{a}<eleve nom="qsdqsd" prenom="Rapha...') called at osp.pl line 64
My script fails because of this line:
<eleve nom="qsdqsd" prenom="Raphaël" classe="Seconde L" adresse="Léon Bourgain" code_postal="3423" ville="machin" pays="France" telephone="3421434">
It's because I have some special characters. How can I get past this encoding problem? Thank you very much for your help!

Replies are listed 'Best First'.
Re: XML::Simple and encoding
by liverpole (Monsignor) on Dec 30, 2006 at 21:54 UTC
    Hi Girzi,

    How about using Unicode::Map?

    With it, you can convert the input from ISO-8859-1 to UTF-16, then perform the XMLin call, and then convert back afterwards (if need be).

    For example:

    use strict;
    use warnings;

    use Unicode::Map();
    use XML::Simple;

    my $string = '<eleve nom="qsdqsd" prenom="Raphaël" classe="Seconde L" adresse="Léon Bourgain" code_postal="3423" ville="machin" pays="France" telephone="3421434" />';

    # Convert to utf-16
    my $map   = Unicode::Map->new("ISO-8859-1");
    my $utf16 = $map->to_unicode($string);

    # Recommended to call XMLin using 'eval', to trap errors
    my $ref = eval { XMLin($utf16) };
    $@ and die "An error occurred in XMLin: '$@'\n";

    # Now you can use '$ref' ...
    my $out = XMLout($ref);
    print "XMLout => '$out'\n";

    If necessary, when you use the resulting "$ref", you can convert UTF-16 data back to the original character set (ISO-8859-1 here) with the from_unicode() method of Unicode::Map.
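
    As a possible alternative, the same kind of transcoding can be sketched with the core Encode module instead of Unicode::Map. This is only an illustration, and it assumes the bytes coming off the socket really are ISO-8859-1; the sample string and variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from the socket, ASSUMED to be ISO-8859-1:
# "\xEB" is e-diaeresis and "\xE9" is e-acute in that encoding.
my $latin1 = qq{<eleve nom="qsdqsd" prenom="Rapha\xEBl" adresse="L\xE9on Bourgain" />};

# Decode the Latin-1 bytes into Perl characters, then encode them as UTF-8 bytes.
my $utf8 = encode('UTF-8', decode('ISO-8859-1', $latin1));

# $utf8 is now in the encoding an XML parser assumes when none is declared.
printf "%d bytes in, %d bytes out\n", length($latin1), length($utf8);
```

    The UTF-8 string could then be handed to XMLin in place of the raw socket data.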


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
      Thank you for your help! It works ;) print "Thank you Very Much " x 1000;
Re: XML::Simple and encoding
by ikegami (Patriarch) on Dec 30, 2006 at 21:59 UTC

    I did some digging and I bet the parser assumes the document uses UTF-8 (since you didn't specify otherwise), and that your document is not in UTF-8. Try providing the appropriate encoding. The underlying parser (expat) pays attention to the encoding parameter in the XML declaration, so it might be as easy as specifying it.

    <?xml version="1.0" encoding="ISO-8859-1"?>
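
    One way to test the guess that the document is not UTF-8 is to try a strict decode of the raw bytes. The sketch below uses the core Encode module; is_valid_utf8 is a hypothetical helper name, and the two sample strings are invented for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if the byte string is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode die on malformed input. Work on a copy, because
    # decode may modify its argument when a CHECK value is supplied.
    my $copy = $bytes;
    return eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# "\xEB" alone is how ISO-8859-1 stores e-diaeresis; UTF-8 needs "\xC3\xAB".
print is_valid_utf8("Rapha\xEBl")     ? "valid" : "invalid", "\n";
print is_valid_utf8("Rapha\xC3\xABl") ? "valid" : "invalid", "\n";
```

    If the check fails on the data saved from the socket, the document needs either an encoding declaration or a conversion to UTF-8 before parsing.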

      I did a bit more research, and it looks like it's not a bug or limitation of the parser. Your XML document is bad. According to Extensible Markup Language (XML) 1.0 (Fourth Edition),

      In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8.

      Make sure to use something like

      <?xml version="1.0" encoding="ISO-8859-1"?>
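
      If the PHP side cannot be changed, the declaration could in principle be patched on the Perl side before parsing. The following is a rough sketch only: declare_encoding is a hypothetical helper (not part of XML::Simple), and it assumes you already know the real encoding of the bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: make sure the XML declaration names an encoding.
sub declare_encoding {
    my ($xml, $encoding) = @_;

    # Leave the document alone if it already declares an encoding.
    return $xml if $xml =~ /\A<\?xml[^>]*\bencoding\s*=/;

    # Existing declaration without encoding: insert it right after version="...".
    return $xml
        if $xml =~ s/\A(<\?xml\s+version\s*=\s*(["'])[^"']*\2)/$1 encoding="$encoding"/;

    # No declaration at all: prepend one.
    return qq{<?xml version="1.0" encoding="$encoding"?>\n} . $xml;
}

# Sample input shaped like the document in the question (Latin-1 byte \xEB).
my $raw   = qq{<?xml version="1.0"?>\n<eleve prenom="Rapha\xEBl" />};
my $fixed = declare_encoding($raw, 'ISO-8859-1');
print $fixed, "\n";
```

      The patched string could then be passed to XMLin. Fixing the declaration where the XML is generated remains the cleaner option.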