in reply to Forcing XML to validate
For example, it could be a < that wasn't encoded where it should have been. The parser thinks it starts an element, but it has a space and normal text around it.
Or it could have been a non-ASCII character in a name. XML 1.1 allows a wider range of these than XML 1.0 does, but XML 1.1 is uncommon. Stripping out the high-bit characters would help if you don't mind losing all non-ASCII characters.
Your regex is wrong, although I am assuming PerlMonks is mangling the content and putting in a weird character. s/\W// removes non-word characters (every one of them when used with /g). Word characters are alphanumerics plus '_', basically what you can use in identifiers. That regex will remove all markup and whitespace. Not good.
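To make the difference concrete, here is a minimal sketch comparing the two substitutions. The sample string and variable names are made up for illustration, and the character class [^\x00-\x7F] is only my guess at what "strip the high-bit characters" was meant to be, not the original poster's actual regex.

```perl
use strict;
use warnings;

# Hypothetical sample input (not the original poster's data).
# \x{263A} is a non-ASCII symbol standing in for the offending character.
my $xml = qq{<note id="1">tea \x{263A} &amp; cake</note>};

# \W matches anything that is not a word character, so this deletes
# angle brackets, quotes, spaces, and punctuation along with the symbol.
(my $stripped_nonword = $xml) =~ s/\W//g;

# This deletes only characters outside the 7-bit ASCII range,
# leaving the markup and whitespace intact.
(my $stripped_high = $xml) =~ s/[^\x00-\x7F]//g;

print "$stripped_nonword\n";
print "$stripped_high\n";
```

The first line of output is one run of letters and digits with no tags left. The second is still markup, minus the one non-ASCII character. Note that this throws away every non-ASCII character, so it only makes sense if you really don't need them.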