in reply to XML::LibXML expand_entities always expands entities

May I ask why? A character reference for "/" (such as "&#47;") and the character it decodes to are completely equivalent in XML. There might not even be a way, since the parser may not distinguish between the two.

Re^2: XML::LibXML expand_entities always expands entities
by shamu (Acolyte) on May 14, 2008 at 17:22 UTC
    I'm reading a file and I want the contents to match exactly; they should be unmodified. I'd like to diff the source and the destination, and they don't match if one has the character reference (e.g. "&#47;") and the other has '/'.
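If the real goal is a diff that ignores how a character happens to be spelled, one option is to normalize both files the same way before comparing. The sketch below uses core Perl only; normalize_charrefs and %KEEP are hypothetical names. It is a purely textual approximation: it does not know about comments or CDATA sections, where "&#47;" is literal text rather than a reference.

```perl
use strict;
use warnings;

# Code points of the markup-significant characters & < > " '.
# References to these are left as-is, because decoding them could
# change how the document parses.
my %KEEP = map { $_ => 1 } (0x26, 0x3C, 0x3E, 0x22, 0x27);

# Decode decimal (&#47;) and hexadecimal (&#x2F;) character references
# so that two documents differing only in spelling compare equal.
# (Hypothetical helper, for comparison only; do not write the result back.)
sub normalize_charrefs {
    my ($text) = @_;
    $text =~ s{(&#(?:x([0-9A-Fa-f]+)|([0-9]+));)}{
        my $cp = defined $2 ? hex($2) : 0 + $3;
        $KEEP{$cp} ? $1 : chr($cp);
    }ge;
    return $text;
}

my $source      = '<path>a&#47;b</path>';
my $destination = '<path>a/b</path>';
print normalize_charrefs($source) eq normalize_charrefs($destination)
    ? "same\n"
    : "different\n";
```

Both sides go through the same function, so a difference that survives normalization is a real content difference rather than a spelling one.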

      This may or may not help, but I do something related: I decode all the entities in HTML that are safe to decode (everything except the five that are special in XML) before parsing it with XML::LibXML, along these lines:

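      use HTML::Entities;
      # Copy the name => character map, minus the five XML-significant entities.
      our %Charmap = %HTML::Entities::entity2char;
      delete @Charmap{qw( amp lt gt quot apos )};
      # Decode the remaining entities in $html in place.
      HTML::Entities::_decode_entities($html, \%Charmap);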

      You would then have something closer to comparable up front. Maybe. Both sides are still processed data, but at least you'd know they were processed the same way.

        Thanks Mom. :)
        Well, I'm afraid that isn't sufficient, since the original document can contain some characters written as entities and some written literally. Your solution decodes every entity it knows about. Is there another way to preserve entities exactly as they were in the original document?
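As far as I know, libxml2 always replaces character references such as "&#47;" with the characters they stand for; they are not entities, so expand_entities does not apply to them. One workaround, sketched here under that assumption, is to hide the references from the parser by escaping their ampersand before parsing and undoing that on the serialized output. protect_charrefs and restore_charrefs are hypothetical helper names. The trick only covers character references (not named entities declared in a DTD), and it breaks if the source already contains the literal text "&amp;#", which restore_charrefs would also turn into "&#".

```perl
use strict;
use warnings;

# Hide character references such as "&#47;" from the parser by escaping
# their ampersand. The parser then sees literal text "&#47;", and the
# serializer writes it back out as "&amp;#47;".
sub protect_charrefs {
    my ($xml) = @_;
    $xml =~ s/&#/&amp;#/g;
    return $xml;
}

# Undo protect_charrefs on serialized output. Caveat: this also rewrites
# any "&amp;#" that was already present in the original document.
sub restore_charrefs {
    my ($xml) = @_;
    $xml =~ s/&amp;#/&#/g;
    return $xml;
}

my $original  = '<path>a&#47;b and a/b</path>';
my $protected = protect_charrefs($original);

# With XML::LibXML the middle step would look roughly like:
#   my $doc = XML::LibXML->load_xml(string => $protected);
#   ... modify $doc ...
#   my $output = restore_charrefs($doc->toString);
# Here the unmodified $protected stands in for that serialized output.
print restore_charrefs($protected) eq $original ? "preserved\n" : "changed\n";
```

Note that toString on a whole document also emits an XML declaration, so the output still will not be byte-identical to a source file that lacked one.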