An example of the malformed XML looks like:

<cc:files>
  <o:destination><![CDATA[/Documents/some file.pdf]]></o:destination>
  <y:name><![CDATA[some file.pdf]]></y:name>
  <r:DenyAccess dt="mv.string"></r:DenyAccess>
  <o:version>
  <d:contentclass><![CDATA[urn:content-classes:baseddocument]]></d:contentclass>
  <o:source><![CDATA[\sources\some file.pdf\some file.pdf]]></o:source>
  <a:FriendlyVersionID>1.0</a:FriendlyVersionID>
  <a:owner><![CDATA[SERVER\iusr_server]]></a:owner>
  <a:CreatedTimeStamp>2/1/2005 6:41:30 PM</a:CreatedTimeStamp>
  <a:DocumentState>approved</a:DocumentState>
  <a:IsCurrentVersion>False</a:IsCurrentVersion>
</cc:files>

Here, you can see that <o:version> does not have a closing tag. While I do not have a DTD or Schema of this file, I believe I can make the assumption that <o:version> encloses everything up to </cc:files>. This pattern of missing </o:version> tags repeats over and over throughout the XML file for each file that it describes (there are 20,000+ files). But, there are some cases in the XML where the </o:version> closing tag does appear where it should. The process needs to determine if the tag is missing. Heck, if any closing tag is missing.
I know XML::LibXML won't like it, because the input must be well-formed. I imagine XML::Parser could do this, but I can't really visualize how to do it. Could someone please offer some wisdom?
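
To make the idea concrete, here is a rough sketch of the "close whatever is still open" assumption described above, written in Python rather than Perl purely for illustration. It does not use any XML library: it tokenizes the text with regular expressions and keeps a stack of open element names. The names `repair`, `TOKEN`, and `TAG` are hypothetical, and the tokenizer assumes that no attribute value contains a literal `>`.

```python
import re

# One token is a CDATA section, a comment, a processing instruction,
# any other <...> tag, or a run of plain text.
TOKEN = re.compile(
    r"<!\[CDATA\[.*?\]\]>|<!--.*?-->|<\?.*?\?>|<[^>]+>|[^<]+", re.DOTALL
)
# Groups: leading "/" (closing tag), element name, trailing "/" (self-closing).
TAG = re.compile(r"<(/?)([^\s/>]+)[^>]*?(/?)>", re.DOTALL)


def repair(xml):
    """Insert missing closing tags just before the enclosing close tag."""
    out = []
    stack = []  # names of elements that are currently open
    for match in TOKEN.finditer(xml):
        tok = match.group(0)
        # Text, CDATA, comments, PIs and declarations pass through untouched.
        if not tok.startswith("<") or tok.startswith(("<!", "<?")):
            out.append(tok)
            continue
        tag = TAG.fullmatch(tok)
        if tag is None:  # not something this sketch understands; keep as-is
            out.append(tok)
            continue
        closing, name, self_closing = tag.groups()
        if self_closing:
            out.append(tok)
        elif not closing:
            stack.append(name)
            out.append(tok)
        elif name in stack:
            # Close every element left open inside the one being closed.
            while stack[-1] != name:
                out.append(f"</{stack.pop()}>")
            stack.pop()
            out.append(tok)
        # A closing tag with no matching open tag is dropped in this sketch.
    # Close anything still open at the end of the input.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


if __name__ == "__main__":
    broken = "<cc:files><o:version><a:owner>x</a:owner></cc:files>"
    print(repair(broken))
```

This heuristic places each missing closing tag as late as possible, which matches the assumption that <o:version> encloses everything up to </cc:files>. Entries where </o:version> is already present pass through unchanged. It would guess wrong if a missing tag really belonged earlier, so the output is still worth checking with a proper parser afterwards.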
In reply to Repair malformed XML by spoulson