in reply to Am I hitting a Perl XML parser module internal bug when dealing with large amounts of data?

The documentation for XML::Parser says for the Char handler:

This event is generated when non-markup is recognized. The non-markup sequence of characters is in String. A single non-markup sequence of characters may generate multiple calls to this handler. Whatever the encoding of the string in the original document, this is given to the handler in UTF-8.

... which sounds like the thing you're experiencing.

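Most likely, you want to accumulate the text in your Char handler and flush it in both your Start and End handlers, so that each run of text is processed as a whole rather than in whatever pieces the parser happens to deliver.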
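Something along these lines is a minimal sketch of that approach, not a drop-in fix for your script. The names $buffer, @chunks and flush_text are made up for illustration, and the inline sample document stands in for your real input:

```perl
use strict;
use warnings;
use XML::Parser;

my $buffer = '';   # text accumulated since the last flush
my @chunks;        # complete text runs, one entry per flush

# Hypothetical helper: hand over the buffered text as one piece, then reset.
sub flush_text {
    push @chunks, $buffer if length $buffer;
    $buffer = '';
}

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { flush_text() },
        End   => sub { flush_text() },
        # Char may be called several times for one run of text, so only append here.
        Char  => sub { my ($expat, $string) = @_; $buffer .= $string },
    },
);

# Sample input (an assumption for this sketch); use parsefile() for a real file.
$parser->parse('<doc><item>first value</item><item>second &amp; third</item></doc>');

print "$_\n" for @chunks;
```

In real code you would replace the push in flush_text with whatever processing you currently do directly in the Char handler. The point is only that the work happens once per text run, however many Char calls it took to deliver it.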


Re^2: Am I hitting a Perl XML parser module internal bug when dealing with large amounts of data?
by feiiiiiiiiiii (Acolyte) on Sep 22, 2014 at 16:48 UTC
    Got it. Thanks!