in reply to Re^2: XML Parser not well-formed
in thread XML Parser not well-formed

I have actually got around the problem by processing the file manually before loading it up with XML::Parser, and removing the dodgy characters.

That's one way to do it. You could probably figure out which encoding they are in and replace them with the proper UTF-8 characters. My guess is that some of the text, like the DESCRIPTION, is entered through either a web form or a word processor, so it should be possible to find out what encoding is used.
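
If the encoding does turn out to be known, the replacement can be done with the core Encode module rather than by stripping bytes. Here is a minimal sketch, assuming the text is entirely Windows-1252 (cp1252), which is only a guess based on the word processor theory. The helper name cp1252_to_utf8 is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the input bytes are Windows-1252 (cp1252). Change the
# encoding name if the real source encoding is something else.
# cp1252_to_utf8 is a hypothetical helper, not part of any module.
sub cp1252_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode('cp1252', $bytes);   # raw bytes -> Perl characters
    return encode('UTF-8', $chars);         # characters -> UTF-8 bytes
}

# "It" + byte 0x92 + "s": 0x92 is the curly apostrophe in cp1252.
my $fixed = cp1252_to_utf8("It\x92s");

# Show the resulting bytes in hex.
print join(' ', map { sprintf '%02X', ord($_) } split //, $fixed), "\n";
```

This only makes sense on text that is wholly in the assumed encoding. Running it over data that is already valid UTF-8 would double-encode the non-ASCII characters.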

Re^4: XML Parser not well-formed
by ktingle (Sexton) on Nov 02, 2004 at 20:01 UTC
    That character is 0x92. UTF-8 encodes only code points up to 0x7F as a single byte. If the document represents that character with just one byte, then it's not UTF-8 and the XML instance is broken. In UTF-8, the code point 0x92 takes 2 bytes.
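
    The byte counts are easy to check with the core Encode module. This is a small sketch, and utf8_hex is a made-up helper name. It also prints U+2019, on the assumption that the 0x92 byte came from Windows-1252, where it stands for the right single quotation mark; that character needs three bytes in UTF-8.

    ```perl
    use strict;
    use warnings;
    use Encode qw(encode);

    # Hypothetical helper: return the UTF-8 bytes of one code point
    # as a space-separated hex string.
    sub utf8_hex {
        my ($code_point) = @_;
        my $bytes = encode('UTF-8', chr($code_point));
        return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
    }

    print utf8_hex(0x7F),   "\n";   # last single-byte code point
    print utf8_hex(0x92),   "\n";   # two bytes
    print utf8_hex(0x2019), "\n";   # what cp1252 byte 0x92 means, three bytes
    ```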

    Whenever I get confused about UTF-8 I use this reference:

    http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8