If I reverse engineered a DTD, would my chances of earning my XML repair badge be better?
Maybe. That will depend on the DTD. But how do you know that what you reverse engineer is correct? Or perhaps you reverse engineer a DTD (which may, or may not) be correct, and allows non-ambigious repairs. (That's not so far-fetched. Consider an HTML or XHTML document with the some of the </EM> tags missing. It will not always be clear where to insert the missing tags, even if you assume they belong just before or after some other tag).

One disadvantage of attempting to repair, and not knowing how to recognize a correct document, is that you may end up with a document that is well-formed, or even conforming to the DTD you have, or reversed engineered, is that you do not know whether you ended up with the right document.

Consider a Perl program of which a quote is missing. You could write a "repair" program that noticed a quote is missing, and puts a quote back into the program. Now, if you just randomly inserted the quote in the program, you're likely to end up with a program that still doesn't compile. But for most programs that are missing a quote, there will be more than one place the quote can be inserted, and you still have a compilable program. Which one should your repair program pick? How does it now it's right?


In reply to Re^3: Repair malformed XML by Anonymous Monk
in thread Repair malformed XML by spoulson

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.