in reply to XML and entities, what am I doing wrong?
Welcome to the wonderful world of XML!
I can't figure out exactly what your original format is, but I will nevertheless go for the shameless plug:
<shameless_plug>XML::Twig will happily deal with this problem. Get the latest version (3.00) from here and you won't have to bother with entities being dropped.</shameless_plug>
Try playing with this code (with and without the keep_encoding option for example):
#!/bin/perl -w
use strict;
use XML::Twig;

my $t= new XML::Twig( keep_encoding => 1);

{
  $/= '';
  while( <DATA>) {
    $t->parse( $_);
    $t->print;
    print "\n";
  }
}

__DATA__
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE doc SYSTEM "dummy"[]>
<doc att="valué ">A document with text in latin1: soupçonné d'être</doc>

<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "dummy"[]>
<doc att="valué">A document with text in latin1:soupçonné d'etre</doc>