RenalPete has asked for the wisdom of the Perl Monks concerning the following question:
Input file:
<?xml version="1.0"?>
<DocumentRoot>
 <Element Attr="B&#228;r" />
</DocumentRoot>
Script:
use XML::DOM;

# Take the input file name from the first command-line argument.
my $file = $ARGV[0];
my $parser = XML::DOM::Parser->new();
my $doc;
# Trap parse errors so they can be reported with a clear message.
eval { $doc = $parser->parsefile( $file ); };
if ($@) {
    die "parsefile() failed: $@\n";
}
# Write the parsed document back out next to the input file.
$doc->printToFile($file."_out");
exit;
The output file created has the extended character (a umlaut) written un-encoded. I'm not sure if this will display properly:
<?xml version="1.0"?>
<DocumentRoot>
 <Element Attr="Bär"/>
</DocumentRoot>
If I then pass this output back to the script, I get:
parsefile() failed: not well-formed (invalid token) at line 3, column 17, byte 54 at /usr/lib/perl5/XML/Parser.pm line 187
I've had a look in the XML::DOM code, and I reckon that encodeText() would be the place to do it. However, this appears to take a list of characters which should be encoded, and for Unicode this would be a pretty big list :-) It's quite possible that there's something about hidden nodes which could be relevant. Can anyone point me in the right direction?
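The worry about needing a very long list of characters can be sidestepped with a negated character class that matches anything outside ASCII. The following is only a minimal sketch under stated assumptions: `encode_non_ascii` is a hypothetical helper, not part of XML::DOM, and it assumes the serialized XML is already available as a Perl character string. How it would be hooked into the script above is left open.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::DOM): replace every character
# outside the ASCII range with a decimal numeric character reference.
sub encode_non_ascii {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# "\x{E4}" is a-umlaut, written as an escape so that the encoding of
# this source file does not matter.
my $xml = qq{<Element Attr="B\x{E4}r"/>};
print encode_non_ascii($xml), "\n";   # prints <Element Attr="B&#228;r"/>
```

Because the result contains only ASCII, it can be written and re-parsed without depending on the output encoding.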
Re: XML::DOM not re-encoding character references of unicode characters?
by Jenda (Abbot) on Nov 19, 2007 at 15:56 UTC