in reply to Possible HTML::TokeParser::Simple Bug

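Sure, it's processing the string in 512-byte chunks, so a run of text that straddles a chunk boundary can come back as more than one text token. See this paragraph in the HTML::Parser documentation: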
$p->unbroken_text( $bool )
By default, blocks of text are given to the text handler as soon as possible (but the parser makes sure to always break text at the boundary between whitespace and non-whitespace so single words and entities always can be decoded safely). This might create breaks that make it hard to do transformations on the text. When this attribute is enabled, blocks of text are always reported in one piece. This will delay the text event until the following (non-text) event has been recognized by the parser.
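Here is a minimal sketch of turning that attribute on, using plain HTML::Parser rather than HTML::TokeParser::Simple. The two parse() calls are just my way of simulating a chunk boundary that falls in the middle of a text run; the @texts array and the sample markup are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser;

my @texts;    # collects the decoded text passed to each text event

my $p = HTML::Parser->new(
    api_version => 3,
    # 'dtext' hands the handler the text with entities decoded
    text_h      => [ sub { push @texts, shift }, 'dtext' ],
);

# Ask the parser to hold text back until the next non-text event,
# so each run of text is reported in one piece.
$p->unbroken_text(1);

# Feed the document in two pieces, splitting inside the text run.
$p->parse('<p>Hello, ');
$p->parse('world</p>');
$p->eof;

print scalar(@texts), " text event(s): ", join(' | ', @texts), "\n";
```

If you comment out the unbroken_text(1) line you should see the text arrive in more than one event, which is the behaviour described in the original question.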