in reply to Re: Re: XML file won't parse properly
in thread XML file won't parse properly

OK, sorry, we must have cross-posted, because this wasn't listed when I initially replied.

It does indeed appear that you may have a malformed XML file.

I should point out that those are probably not ASCII characters, unless the document specifically says so in its XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>. AFAIK, when no encoding is declared the parser assumes UTF-8...
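To check what a file actually declares, something like the following might do. It is only a sketch in Python (not the parser under discussion), the helper name declared_encoding is made up for illustration, and it assumes an ASCII-compatible encoding, so a UTF-16 file would need a byte-order-mark check first.

```python
import re

# Matches an XML declaration and captures the optional encoding pseudo-attribute.
_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)

def declared_encoding(head: bytes) -> str:
    """Return the encoding named in the XML declaration, or 'UTF-8' if none is given.

    Hypothetical helper: `head` is the first few hundred bytes of the file.
    """
    if head.startswith(b'\xef\xbb\xbf'):  # skip a UTF-8 byte order mark
        head = head[3:]
    m = _DECL.match(head)
    if m and m.group(1):
        return m.group(1).decode('ascii')
    return 'UTF-8'  # the default when nothing is declared
```

If this reports UTF-8 but the bytes in question are single high-bit bytes, the file was most likely written in some 8-bit encoding without saying so.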

I would most certainly check with the source of your data, since it's possible the file is corrupt... Also, if this is common, they should be having problems with whoever else they're sending these files to.

With regard to pre-filtering, you want to be VERY careful with this. Isolate ONLY those characters that are causing the parser to barf, then 1) try escaping them, 2) try commenting them out, and only if that doesn't work 3) try replacing them with whitespace.
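As a rough illustration of the "isolate, then replace with whitespace" route, here is a minimal Python sketch. It assumes the offenders are characters that XML 1.0 forbids outright (control characters other than tab, LF and CR, plus U+FFFE and U+FFFF) and that the data has already been decoded to text; find_illegal, blank_illegal and the sample string are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Characters outside the XML 1.0 "Char" production: C0 controls other than
# tab, LF and CR, plus U+FFFE and U+FFFF. Nothing else is touched.
_ILLEGAL = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]')

def find_illegal(text):
    """List (offset, code point) for each character XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in _ILLEGAL.finditer(text)]

def blank_illegal(text):
    """Replace each forbidden character with one space, leaving all else as-is."""
    return _ILLEGAL.sub(' ', text)

# Hypothetical input: a form feed and a 0x01 byte inside a CDATA section.
sample = '<doc><![CDATA[price\x0c list\x01]]></doc>'

for offset, code in find_illegal(sample):
    print(f'offset {offset}: U+{code:04X}')

# The unfiltered sample is rejected by the parser; the filtered one parses.
root = ET.fromstring(blank_illegal(sample))
```

Listing the offsets first makes it possible to compare them with the position the parser reports before changing anything. Replacing one character with exactly one space keeps every later offset stable, which helps if more errors turn up further into the file.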

But since this is seemingly a question of malformedness, none of those approaches is sure to work...



Wait! This isn't a Parachute, this is a Backpack!

Re: Re: Re: Re: XML file won't parse properly
by brpsss (Sexton) on Apr 12, 2001 at 21:48 UTC

    gregor, I think it's not a well-formedness problem in the sense of start tags not matching end tags or anything...

    In the code that I posted below, I set the error context, and I get the place where the error is supposed to occur, and this is inside the CDATA section, or in the embedded text...

    I also don't think the file is corrupt, because I extract it from a zip, and the unzip gives no errors. Unfortunately, the header only says <xml version="1.0">, with no indication of what encoding it is...

    The only option that I see is to do a search and replace to strip out the control characters before I parse. I really don't see any other choice, but I'm willing to listen to anyone who tells me this is too drastic :)

    Thanks