in reply to Dealing with Malformed XML

Do something like this before passing your text to the XML parser?

s/&(\W|$)/&amp;$1/g; s/<([^\/\w]|$)/&lt;$1/g; s/(^|\W)>/$1&gt;/g;
But I think Re: Maximum parsing depth with XML::Parser? probably does a better job of this, and it implies that the greater-thans aren't a problem.
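
For anyone who wants to try the substitution idea, here is a minimal sketch of a variation on it, not a tested fix for every broken document. It uses lookaheads instead of captures, so doubled characters such as "&&" are both handled, and it leaves ">" alone because a bare ">" is legal in XML character data (except in "]]>"). The sub name escape_stray_markup is made up for this example, and the rule for what counts as a "real" entity or tag is an assumption. A "<" followed directly by a letter or digit (as in "x<5") still looks like a tag to it.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): escape characters
# that look like stray markup before the text reaches the XML parser.
sub escape_stray_markup {
    my ($text) = @_;

    # Escape '&' unless it starts something shaped like an entity or a
    # character reference, e.g. &amp; &#38; &#x26;
    $text =~ s/&(?!#?\w+;)/&amp;/g;

    # Escape '<' unless the next character could begin a tag, a closing
    # tag, a comment/CDATA/DOCTYPE, or a processing instruction.
    # Braces are used as delimiters so '/' needs no escaping.
    $text =~ s{<(?![/!?\w])}{&lt;}g;

    return $text;
}

print escape_stray_markup('<p>Fish & chips < $5</p>'), "\n";
# prints: <p>Fish &amp; chips &lt; $5</p>
```

Text that is already escaped passes through unchanged, so running it twice should be harmless.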

I recall a module like this for HTML. It would find common mistakes (like unquoted attributes) and fix them. Something like that would be even more useful as a module for XML, since the spec requires parsers to reject input that is not well-formed.

        - tye (but my friends call me "Tye")

Re: (tye)Re: Dealing with Malformed XML
by Coyote (Deacon) on Jan 09, 2001 at 23:15 UTC
    Thanks for the pointers. I don't think the <![CDATA[ ... ]]> solution detailed in Re: Maximum parsing depth with XML::Parser? is the right approach for this task. The &, <, and > characters should be escaped as entities in this instance.

    ---- Coyote (aka: Rich Anderson)