(tye)Re: Dealing with Malformed XML

How about doing something like this before passing your text to the XML parser?

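# Escape & unless it already starts an entity or character reference
s/&(?![\w.:-]+;|#[0-9]+;|#x[0-9A-Fa-f]+;)/&amp;/g;
# Escape < unless it starts a tag, end tag, <!...> construct or <?...?>
s/<(?![\/\w!?])/&lt;/g;
# Leave > alone: it is legal in character data (except in "]]>"), and a
# regex cannot reliably tell a stray > from the one that closes a tag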
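Wrapped up as a reusable subroutine, the same pre-parse cleanup might look like the sketch below. The name `fix_stray_markup` is hypothetical, and the sketch assumes the only damage is stray `&` and `<` characters in text content. A bare `>` is left alone because XML permits it in character data.

```perl
use strict;
use warnings;

# Hypothetical helper: escape stray & and < in otherwise well-formed XML
# so that a strict parser has a chance of accepting it.
sub fix_stray_markup {
    my ($xml) = @_;

    # A & that does not begin a named entity (&amp;), a decimal character
    # reference (&#38;) or a hex character reference (&#x26;) becomes &amp;.
    $xml =~ s/&(?![\w.:-]+;|#[0-9]+;|#x[0-9A-Fa-f]+;)/&amp;/g;

    # A < that is not followed by a name character, '/', '!' or '?' cannot
    # begin a tag, end tag, comment/CDATA/DOCTYPE or processing instruction,
    # so it becomes &lt;.
    $xml =~ s/<(?![\/\w!?])/&lt;/g;

    return $xml;
}

# prints: <p>AT&amp;T says 1 &lt; 2 &amp; 3 &#62; 2</p>
print fix_stray_markup('<p>AT&T says 1 < 2 &amp; 3 &#62; 2</p>'), "\n";
```

This is a heuristic, not a parser: a `<` immediately followed by a letter (as in `a<b`) still looks like the start of a tag and is not escaped.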

But I think the reply Re: Maximum parsing depth with XML::Parser? probably does a better job of this, and it implies that the greater-thans aren't a problem.

I recall a module that does this kind of cleanup for HTML. It would find common mistakes (like unquoted attributes) and fix them. Something like that would be even more useful as a module for XML, since the XML spec requires parsers to reject input that is not well-formed.
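The unquoted-attribute fix mentioned above could be approximated along these lines. This is only a sketch: `quote_attributes` and its helpers are hypothetical names, it handles just `name=value` pairs inside start tags, and it does not cope with quoted values containing `>` or with HTML-style boolean attributes such as `<input checked>`.

```perl
use strict;
use warnings;

# Hypothetical helper: return name=value with the value wrapped in
# double quotes, unless it is already quoted.
sub quote_value {
    my ($name, $value) = @_;
    return "$name=$value" if $value =~ /^["']/;    # already quoted
    return qq{$name="$value"};
}

# Hypothetical helper: fix every name=value pair in one start tag's
# attribute string. Quoted values are matched as a unit, so an '='
# inside a quoted value is not mistaken for another attribute.
sub quote_attr_string {
    my ($attrs) = @_;
    $attrs =~ s{([\w:.-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)}
               {quote_value($1, $2)}ge;
    return $attrs;
}

# Hypothetical entry point: only the inside of start tags (< followed by
# a name) is rewritten, so text content such as "x=5" is left alone.
sub quote_attributes {
    my ($text) = @_;
    $text =~ s{(<[\w:.-]+)([^<>]*?)(/?>)}{
        my ($open, $attrs, $close) = ($1, $2, $3);
        $open . quote_attr_string($attrs) . $close;
    }ge;
    return $text;
}

# prints: <img src="pic.png" alt='hi' />
print quote_attributes(q{<img src=pic.png alt='hi' />}), "\n";
```

Limiting the rewrite to start tags keeps the regex from touching ordinary text, at the cost of trusting that every `<name` really is a tag.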

- tye (but my friends call me "Tye")

Re: (tye)Re: Dealing with Malformed XML by Coyote (Deacon) on Jan 09, 2001 at 23:15 UTC
Thanks for the pointers. I don't think the `<![CDATA[ ... ]]>` solution detailed in Re: Maximum parsing depth with XML::Parser? is the right approach for this task. The &, <, and > characters should be entities in this instance.
---- Coyote (aka: Rich Anderson)