Thanks to everyone who replied to this message.

After giving the problem a bit more thought, it occurred to me that allowing the XML parser to ignore errors and continue processing makes no more sense than allowing the perl interpreter to continue when it finds a syntax error. Moreover, allowing the XML parser to continue would lead to many of the same problems that we currently have with HTML. Permissive HTML parsers, such as the one used by IE, that allow improperly nested tags, incomplete documents, unclosed tags, and so on lead HTML designers to create documents that are usable only by the broken parser, and they foster bad programming/design habits. I would hate to see that happen with XML, so I will not contribute to the problem by either writing an XML::Preprocessor module or adding this functionality to any sort of production system I create.

As far as the solution to my problem goes, I wrote a small filter to take care of the & characters before passing the XML doc to the parser. I decided to ignore the bare > and < characters since their presence may indicate either a problem with tags in the document or a legitimate part of the document text.
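
For anyone who finds this thread later, here is a minimal sketch of what such a filter might look like. It is not the exact code I used. It assumes that "taking care of" a bare & means rewriting it as &amp;, and the sub name escape_bare_ampersands is made up for this example. Anything that already looks like an entity or character reference (&name; or &#38; or &#x26;) is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): escape every '&'
# that does not already start an entity or character reference.
sub escape_bare_ampersands {
    my ($xml) = @_;

    # The negative lookahead skips '&' when it is followed by
    #   name;      (named entity such as &amp; or &lt;)
    #   #123;      (decimal character reference)
    #   #x1F;      (hexadecimal character reference)
    # Every other '&' is rewritten as '&amp;'.
    $xml =~ s/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;

    return $xml;
}
```

To use it as a filter, read the document, pass the text through escape_bare_ampersands, and hand the result to the parser. Two limits of the sketch: text such as AT&T; looks like an entity reference and is left untouched, and an & inside a CDATA section is escaped even though it is legal there.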

Once again, thanks for the insight.

----
Coyote


In reply to Re: Dealing with Malformed XML by Coyote
in thread Dealing with Malformed XML by Coyote
