Yes, I had a fix like this, but it broke completely when I tested it with ISO-8859-1 extended characters (such as 'é'): the 1024 figure was not right. So I added a few layers of complex calculations, cursed a lot, and wrote more tests directly on XML::Parser, until I realized that the solution was a lot simpler: for CDATA sections, the string passed to the character handler is in the original encoding (I had been staring at such strings for over an hour when it hit me!). So there was no need for any of this: within a CDATA section I can just use the usual string... et voilà!

Thanks for looking into it though.


In reply to Re^3: Dangerous XML::Twig (or XML::Parser?) bug. Long text is read incorrectly! by mirod
in thread Dangerous XML::Twig (or XML::Parser?) bug. Long text is read incorrectly! by Jenda
