roman has asked for the wisdom of the Perl Monks concerning the following question:
    use XML::LibXML;

    my $str = "<flower>r\x{16f}\x{17e}e</flower>";
    my $doc = XML::LibXML->new->parse_string($str);
    warn "Document ", $doc->toString, "\n";
    warn "Encoding ", $doc->encoding, "\n";

yields

    Document <?xml version="1.0"?>
    <flower>r&#x16F;&#x17E;e</flower>
    Encoding

The only way I have found to achieve this effect on an already parsed document is to set the encoding to ISO-8859-1 (since I cannot "reset" the encoding):

    $doc->setEncoding('iso-8859-1')

    use XML::LibXML;

    my $str = '<?xml version="1.0" encoding="utf8"?>'
            . "<flower>r\x{16f}\x{17e}e</flower>";
    my $doc = XML::LibXML->new->parse_string($str);
    warn "Document ", $doc->toString, "\n";
    warn "Encoding ", $doc->encoding, "\n\n";

    # Switch to an encoding that cannot represent the diacritics
    $doc->setEncoding('iso-8859-1');
    warn "Document ", $doc->toString, "\n";
    warn "Encoding ", $doc->encoding, "\n";

yields

    Document <?xml version="1.0" encoding="utf8"?>
    <flower>rĹŻĹže</flower>
    Encoding utf8

    Document <?xml version="1.0" encoding="iso-8859-1"?>
    <flower>r&#x16F;&#x17E;e</flower>
    Encoding iso-8859-1

Does this method have any dangers or drawbacks? Is there a better way to "clear" the encoding? I would find it very useful, since the serialized text with entities is immune to any encoding changes when stored in a database (Oracle).

Thanks, Roman
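For comparison, here is a minimal sketch of getting the same kind of ASCII-only text without changing the document's encoding, using the core Encode module and its FB_XMLCREF fallback. It assumes the input is already a decoded Perl character string; the helper name to_char_refs is hypothetical and is not part of XML::LibXML. It also does nothing about an encoding attribute in the XML declaration, so that would still have to be handled separately.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not an XML::LibXML method): replace every character
# that has no ASCII representation with a hexadecimal character reference.
# Assumes $text is a decoded character string, not raw bytes.
sub to_char_refs {
    my ($text) = @_;    # work on a copy so the caller's string is untouched
    return encode('ascii', $text, Encode::FB_XMLCREF());
}

my $xml = "<flower>r\x{16f}\x{17e}e</flower>";
print to_char_refs($xml), "\n";
```

Because the result contains only ASCII bytes, it should survive being stored or transferred under any ASCII-compatible character set, which is the property asked about above.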
Re: XML::LibXML document serialized with diacritics as unicode entities
by Joost (Canon) on Sep 29, 2006 at 21:33 UTC