
I stand corrected about the size limitation. On further testing, the problem is not the file size but the encoding. The XML file is Unicode, declared with encoding="iso-10646-ucs-2". If I convert it to ASCII and set encoding="UTF-8", LibXML parses it fine.
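For anyone who wants to script that conversion step, here is a minimal sketch in Python (not the Perl I actually ran). It assumes the file starts with a byte-order mark and that decoding UCS-2 as UTF-16 is acceptable, which holds for text in the Basic Multilingual Plane. The function name ucs2_to_utf8 is made up for illustration.

```python
import re

def ucs2_to_utf8(raw: bytes) -> bytes:
    """Re-encode a UCS-2/UTF-16 XML document as UTF-8 and fix its declaration."""
    # Assumption: the input carries a BOM, so the generic "utf-16" codec
    # can pick the byte order and strip the BOM while decoding.
    text = raw.decode("utf-16")
    # Rewrite only the first encoding="..." inside the XML declaration,
    # keeping whichever quote character the file used.
    text = re.sub(
        r'(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
        r'\g<1>\g<2>UTF-8\g<2>',
        text,
        count=1,
    )
    return text.encode("utf-8")
```

UTF-8 is used as the target instead of plain ASCII so that any non-ASCII characters survive the round trip; for pure-ASCII content the bytes are identical.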

Unfortunately, the output of the script above becomes mangled after a few thousand lines. It starts emitting only the Text objects, with no tags, CDATA sections, etc. Strange.

While I haven't found a general, automated method that works, I've managed to get by with a simple procedural rule: insert a </o:version> tag before each </cc:files> wherever one is not already present. Then I convert back to Unicode and the XML can be parsed.
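The rule is simple enough to show as a rough Python sketch. It works on the decoded text rather than on a parsed tree, since the input is not well-formed yet. One assumption is mine: "already present" is taken to mean that </o:version> is the last thing before </cc:files>, ignoring whitespace. The function name is hypothetical.

```python
CLOSE_FILES = "</cc:files>"
CLOSE_VERSION = "</o:version>"

def insert_missing_version_close(text: str) -> str:
    """Add </o:version> before each </cc:files> that lacks one."""
    parts = text.split(CLOSE_FILES)
    # Every chunk except the last one is followed by a </cc:files> tag.
    for i in range(len(parts) - 1):
        # Assumption: the closing tag counts as "present" only if it is
        # the last non-whitespace content before </cc:files>.
        if not parts[i].rstrip().endswith(CLOSE_VERSION):
            parts[i] += CLOSE_VERSION
    return CLOSE_FILES.join(parts)
```

Running it a second time changes nothing, so it is safe to reapply. It would add a stray closing tag to a cc:files block that contains no o:version element at all, so it only suits data shaped like mine.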