in reply to Forcing XML to validate

What does the XML that isn't parsing look like? That isn't a very helpful error, and it could be caused by all kinds of different things. The right limited fix depends on what the error actually is.

For example, it could be a < that wasn't encoded as &lt; where it should have been. The parser thinks it starts an element, but then finds a space and ordinary text after it instead of an element name.
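
If a literal < in text content turns out to be the problem, a limited fix could be sketched roughly like this. It is only a heuristic: it assumes a stray < is never followed by a character that could begin a tag, and the sub name escape_stray_lt is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: escape any '<' that is NOT followed by a character
# that could begin a tag (ASCII name start, '/', '!' or '?').
# Heuristic only: it also rewrites such a '<' inside CDATA or comments.
sub escape_stray_lt {
    my ($xml) = @_;
    $xml =~ s/<(?![A-Za-z_:\/!?])/&lt;/g;
    return $xml;
}

print escape_stray_lt('<a>if x < 3 then</a>'), "\n";   # <a>if x &lt; 3 then</a>
```

Something like "a <b" in running text would still slip through, so check the result against the real data before trusting it.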

Or it could be a non-ASCII character in a name. XML 1.1 allows a wider range of characters in names than the original XML 1.0 rules did, but XML 1.1 is uncommon. Stripping out the high-bit characters would help if you don't mind losing all non-ASCII characters.
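
A minimal sketch of stripping the high-bit characters, assuming you really are happy to lose every non-ASCII character (the sub name strip_non_ascii is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: delete everything outside the 7-bit ASCII range.
# Works on raw bytes and on decoded characters alike; accented letters are lost.
sub strip_non_ascii {
    my ($text) = @_;
    $text =~ s/[^\x00-\x7F]//g;
    return $text;
}

print strip_non_ascii("<caf\xE9>ok</caf\xE9>"), "\n";   # <caf>ok</caf>
```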

Your regex is wrong, although I am assuming PerlMonks is mangling the content and inserting a weird character. s/\W// removes non-word characters (just the first one, unless you add /g). Word characters are alphanumerics plus '_', basically what you can use in identifiers. Applied globally, that regex will remove all markup and whitespace. Not good.
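
To see what \W does to markup, here is a small demonstration on a made-up element:

```perl
use strict;
use warnings;

my $xml = '<item id="1">some text</item>';   # made-up sample input

(my $first_only = $xml) =~ s/\W//;    # no /g: only the leading '<' is removed
(my $all_gone   = $xml) =~ s/\W//g;   # /g: every non-word character is removed

print "$first_only\n";   # item id="1">some text</item>
print "$all_gone\n";     # itemid1sometextitem
```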