in reply to XML and entities, what am I doing wrong?

Welcome to the wonderful world of XML!

I can't figure out exactly what your original format is, but I will nevertheless go for the shameless plug:

<shameless_plug>XML::Twig will happily deal with this problem. Get the latest version (3.00) from here and you won't have to bother with entities being dropped.</shameless_plug>

Try playing with this code (with and without the keep_encoding option for example):

#!/bin/perl -w
use strict;
use XML::Twig;

my $t= new XML::Twig( keep_encoding => 1);

{ $/= '';
  while( <DATA>)
    { $t->parse( $_);
      $t->print;
      print "\n";
    }
}

__DATA__
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE doc SYSTEM "dummy"[]>
<doc att="valué ">A document with text in latin1: soupçonné d'être</doc>

<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "dummy"[]>
<doc att="valu&eacute;">A document with text in latin1:soup&ccedil;onn&eacute; d'etre</doc>