I agree that this is largely a guess, but there is one relatively simple heuristic that might actually help in this case. Well-formed XML documents may nest tags, but an inner tag can't close after its enclosing tag has closed. For example:
<document><text>Some text</text></document> <!-- Valid -->
<document><text>Some text</document></text> <!-- INVALID -->
So an algorithm that makes sure nested tags are closed before their enclosing tags is a good first step, and if the sample above is representative, such a step will likely go a long way toward solving the problem.
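To make the heuristic concrete, here is a minimal sketch in Python of one way it could work. It is only an illustration, not a tested repair tool. The function name `repair_nesting` and the regex are my own assumptions. It keeps a stack of open tags; when a closing tag arrives for an outer tag, it first closes every inner tag still open, and it drops closing tags that no longer match anything. It does not handle comments, CDATA sections, or processing instructions that contain tag-like text.

```python
import re

# Matches opening, closing, and self-closing tags.
# Assumption: comments, CDATA, and processing instructions are not handled.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)([^<>]*?)(/?)>")

def repair_nesting(xml: str) -> str:
    """Force inner tags to close before their enclosing tags."""
    out = []     # pieces of the repaired document
    stack = []   # names of currently open tags, innermost last
    pos = 0
    for m in TAG_RE.finditer(xml):
        out.append(xml[pos:m.start()])  # copy text between tags unchanged
        pos = m.end()
        closing, name, _attrs, self_closing = m.groups()
        if self_closing:
            out.append(m.group(0))      # <tag/> never affects nesting
        elif not closing:
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close every inner tag still open, then the requested tag.
            while stack:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        # Otherwise the closer matches nothing open (for example, it was
        # already closed early above), so it is dropped.
    out.append(xml[pos:])
    # Close anything still open at the end of the input.
    out.extend(f"</{name}>" for name in reversed(stack))
    return "".join(out)

print(repair_nesting("<document><text>Some text</document></text>"))
```

Run on the invalid example above, this yields the valid form: the `</text>` is emitted just before `</document>`, and the late `</text>` is discarded. Whether dropping stray closers is the right call depends on how the real data is broken, so treat that part as a guess too.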
Anima Legato
.oO all things connect through the motion of the mind
In reply to Re^2: Repair malformed XML by legato, in thread Repair malformed XML by spoulson