in reply to Re: Repair malformed XML
in thread Repair malformed XML
<P> foo <SPAN> bar baz <EM> qux </EM> <EM> quux </EM> </P>
The </SPAN> tag is missing. Your algorithm will place it right in front of the </P>. That repairs the document to well-formedness (and in the case of (X)HTML, even to a valid document). But you don't know whether the </SPAN> really belongs there. Perhaps only the 'bar' was supposed to be inside the SPAN. Or maybe the first EM element, but not the second, belonged inside it. Or perhaps the document uses a special DTD that doesn't allow EM to appear inside SPAN. Then placing </SPAN> before </P> would be very wrong.
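To make the ambiguity concrete, here is a minimal Python sketch (my own illustration, not part of the repair algorithm under discussion). The three candidate strings are hand-written placements of </SPAN> in the fragment above, and `span_children` is a hypothetical helper name. All three parse as well-formed XML, yet each gives the SPAN different content:

```python
import xml.etree.ElementTree as ET

# Three hand-written candidate repairs of the same broken fragment.
# They differ only in where the missing </SPAN> is inserted.
candidates = {
    "before </P>": "<P> foo <SPAN> bar baz <EM> qux </EM> <EM> quux </EM> </SPAN> </P>",
    "after 'bar'": "<P> foo <SPAN> bar </SPAN> baz <EM> qux </EM> <EM> quux </EM> </P>",
    "after first EM": "<P> foo <SPAN> bar baz <EM> qux </EM> </SPAN> <EM> quux </EM> </P>",
}

def span_children(xml_text):
    """Parse the text (raises ParseError if not well-formed) and
    return the tag names of the elements directly inside SPAN."""
    root = ET.fromstring(xml_text)
    span = root.find("SPAN")
    return [child.tag for child in span]

results = {name: span_children(text) for name, text in candidates.items()}
for name, tags in results.items():
    print(name, tags)
```

Every candidate passes the parser, so well-formedness alone cannot tell you which of them the author meant.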
If you have no way of verifying that the result is correct - heck, you can't even verify whether the resulting document is syntactically valid - I'd advise you to leave the document as is. Then even the most basic check (for well-formedness) will flag the document as incorrect. Otherwise, you end up with a document that appears to be correct, but you have no way of knowing whether it is. Of course, that raises the question: if you don't have the DTD, how useful is the document, and why is it being considered for "repair"?
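As a rough sketch of that "most basic check", assuming Python's standard xml.etree.ElementTree parser (`is_well_formed` is a hypothetical helper name, and the "repaired" string is just the placement of </SPAN> before </P> discussed above): the untouched fragment is rejected, while the repaired one sails through even though nothing confirms the repair is right.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise.
    This checks well-formedness only, not validity against a DTD."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# The fragment as found, with the </SPAN> missing.
broken = "<P> foo <SPAN> bar baz <EM> qux </EM> <EM> quux </EM> </P>"
# The same fragment after inserting </SPAN> right before </P>.
repaired = "<P> foo <SPAN> bar baz <EM> qux </EM> <EM> quux </EM> </SPAN> </P>"

print(is_well_formed(broken))    # False
print(is_well_formed(repaired))  # True
```

Left alone, the broken document gets caught by any parser downstream. Once repaired, that safety net is gone.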
Re^3: Repair malformed XML
by Anonymous Monk on Jun 23, 2016 at 21:13 UTC