Sorry to reply with an RTFM, but this is what the FM reads (emphasis added):

Char (Expat, String)
This event is generated when non-markup is recognized. The non-markup sequence of characters is in String. A single non-markup sequence of characters may generate multiple calls to this handler. Whatever the encoding of the string in the original document, this is given to the handler in UTF-8.

Note that AFAIK all XML parsers behave like this, to allow you to parse documents even if they contain chunks of text that are bigger than the available memory.

Also, the XML::Parser review mentions this, and gives you a way to get all the data.
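
For illustration, here is a minimal sketch of the usual way to cope with multiple Char calls: append every chunk to a buffer and only use the buffer once the End event arrives. The element name title, the variables $buffer and @titles, and the sample document are made up for this example; the handler signatures are the ones documented for XML::Parser.

```perl
use strict;
use warnings;
use XML::Parser;

my $buffer = '';   # collects text across successive Char calls
my @titles;        # hypothetical result list for this example

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element) = @_;
            # Reset the buffer when the element of interest opens.
            $buffer = '' if $element eq 'title';
        },
        Char => sub {
            my ($expat, $string) = @_;
            # May be called several times for one run of text: append, never assign.
            $buffer .= $string;
        },
        End => sub {
            my ($expat, $element) = @_;
            # Only at the End event is the text known to be complete.
            push @titles, $buffer if $element eq 'title';
        },
    },
);

# Sample input (an assumption for this sketch).
$parser->parse('<doc><title>Tom &amp; Jerry</title></doc>');

print "$_\n" for @titles;   # prints "Tom & Jerry"
```

Note that this simple version also collects text from any elements nested inside title; if that matters, keep one buffer per open element instead.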

Update: the Perl XML FAQ also mentions this.

Re^4: XML::Parser problems
by Hena (Friar) on Jul 01, 2005 at 08:21 UTC
    No need to say sorry. If it is RTFM then it is RTFM, as I do seem to have missed something :). In fact, by now I was already combining the strings myself as a workaround, which is the approach mentioned in that review link.

    I guess this gets chalked up to "we live and learn".