in reply to Invalid XML characters
Are you sure that's what you want to do? I would think you would rather replace the entities (that's what they are called) with their Unicode or Latin-1 characters.
That can be done by adding a DTD to the file that contains the proper declarations for the entities. The entity declaration file you want is probably one of those referenced by the XHTML spec.
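To make the idea concrete, here is a minimal sketch in Python (chosen only for illustration; the same approach works with any parser that reads the internal DTD subset). The helper name `add_internal_dtd` and the small `ENTITIES` table are assumptions made up for this example; the declarations follow the `<!ENTITY name "&#NNN;">` form used by the XHTML entity sets, and a real file would need the full set or a reference to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of entity declarations, written the way the XHTML
# Latin-1 entity set writes them (name -> numeric character reference).
ENTITIES = {"eacute": "&#233;", "nbsp": "&#160;", "copy": "&#169;"}


def add_internal_dtd(xml_text, root_name, entities):
    """Return xml_text with a DOCTYPE declaring `entities` placed before the root."""
    decls = "".join(
        f'<!ENTITY {name} "{value}">\n' for name, value in entities.items()
    )
    doctype = f"<!DOCTYPE {root_name} [\n{decls}]>\n"
    # A DOCTYPE must come after the XML declaration, if there is one.
    if xml_text.startswith("<?xml"):
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + doctype + xml_text[end:]
    return doctype + xml_text


raw = "<doc>caf&eacute; &copy; 2009</doc>"

# Without the declarations the parser rejects &eacute; as undefined;
# with them it expands the entities while parsing.
root = ET.fromstring(add_internal_dtd(raw, "doc", ENTITIES))
print(root.text)
```

The parser does the substitution itself, so no pattern matching on the text is involved. This sketch assumes the input is otherwise well-formed XML.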
This way you don't have to rely on brittle regexps to do the job: the parser will do it for you. If your input is really HTML, which follows SGML syntax, then entities can be trickier to match than you might think. For example, the final semicolon is optional in certain cases. Post a follow-up if you need to know how to add a proper DTD to your XML (with an extract of the beginning of the file you have).
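As a small illustration of the optional-semicolon problem, the sketch below (Python, for illustration only) compares a naive pattern with an HTML-aware decoder from the standard library. The sample string and the pattern are made up for this example.

```python
import html
import re

# Hypothetical HTML-ish input: "&eacute" has no trailing semicolon.
sample = "caf&eacute au lait &amp; more"

# A naive pattern that insists on the semicolon misses the first entity.
naive_matches = re.findall(r"&(\w+);", sample)

# html.unescape follows HTML rules and accepts the legacy form without ";".
decoded = html.unescape(sample)

print(naive_matches)
print(decoded)
```

The naive pattern only finds `amp`, while the HTML-aware decoder resolves both references. A hand-written regexp would have to encode the same rules to match what real HTML allows.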
Re^2: Invalid XML characters
by Anonymous Monk on Mar 23, 2009 at 10:23 UTC | |
by mirod (Canon) on Mar 23, 2009 at 11:03 UTC |