in reply to regex on XML

This is a major problem. Normally I would suggest using an XML parser, which cleanly separates the tags from the "content", and then running HTML::Entities on the content only.

But any decent XML parser (such as XML::Parser) will choke on this "bad" XML. Strictly speaking, it is not XML at all: a literal < or & in character data makes the document not well-formed, and a conforming parser must treat that as a fatal error.

Therefore I suggest catching these 'forbidden' characters before they ever enter your XML. Can't you run the encode_entities function of HTML::Entities on the incoming data before it is turned into XML?
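A minimal sketch of that idea, assuming the raw values are available as plain Perl strings before the XML is assembled (the sample data and the xml_escape and to_xml helpers are hypothetical, just for illustration). The character set passed to encode_entities is deliberately restricted to XML's special characters: with its default set, encode_entities also turns non-ASCII characters into HTML named entities such as &eacute;, which XML does not predefine and which an XML parser would reject.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical raw records, standing in for whatever data arrives
# before it is turned into XML.
my @incoming = (
    'Fish & Chips',
    'a < b > c',
    'He said "hi"',
);

# Escape only the characters that are special in XML markup.
# The explicit character class keeps encode_entities from turning
# high-bit characters into HTML-only named entities like &eacute;.
sub xml_escape {
    my ($text) = @_;
    return encode_entities($text, q{<>&"});
}

# Build the XML only after each piece of content has been escaped,
# so the tags we add ourselves are the only markup in the output.
sub to_xml {
    my (@items) = @_;
    my $xml = "<items>\n";
    $xml .= '  <item>' . xml_escape($_) . "</item>\n" for @items;
    $xml .= "</items>\n";
    return $xml;
}

print to_xml(@incoming);
```

If you do want non-ASCII characters escaped as well, HTML::Entities also provides encode_entities_numeric, which always emits numeric character references; unlike HTML named entities, those are valid in XML.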

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law