Your fix caused the text after newlines to get lost, so I tried to fix the fix.

In my first attempt I replaced the code on lines 1731-1733 with:

    $t->{twig_chunk_number} = 0 if !defined($elt->{cdata});
    $elt->{cdata} .= $t->{twig_stored_spaces} . $string
        unless ( $t->{twig_keep_encoding}
              && defined($elt->{cdata})
              && length($elt->{cdata}) > 1024
              && ++$t->{twig_chunk_number} == 1 );   # fixes a bug in XML::Parser for long CDATA

This helped somewhat: the script in the root node came through fine and the text was complete. But as soon as I copied the long line a second time within the tag, the original problem reappeared: the last several characters of the second line appeared twice.

I'll try something more, but I think the place this should be fixed is XML::Parser, not XML::Twig.

UPDATE: Looks like this works. You'll probably want to tweak it to fit better into the module:

    $elt->{cdata} .= $t->{twig_stored_spaces} . $string
        unless $t->{twig_skip_next_chunk};   # fixes a bug in XML::Parser for long CDATA
    if ( $t->{twig_keep_encoding} && defined($string) && length($string) > 1024 ) {
        $t->{twig_skip_next_chunk} = 1;
    }
    else {
        $t->{twig_skip_next_chunk} = 0;
    }

Looks to me like we need to remember whether the last chunk was larger than 1024.
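
To show what that flag does outside the module, here is a minimal standalone sketch. It is not XML::Twig code: append_chunk, the plain hashes standing in for $t and $elt, and the sample chunks are all made up for illustration, and I'm assuming the chunk that arrives right after an over-long one is the duplicated tail. Only the three twig_* key names come from the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Twig: appends one chunk of text
# to $elt->{cdata} and remembers whether that chunk was longer than 1024.
sub append_chunk {
    my ( $t, $elt, $string ) = @_;

    # Drop this chunk if the previous one was over-long.
    $elt->{cdata} .= $t->{twig_stored_spaces} . $string
        unless $t->{twig_skip_next_chunk};

    # Remember the size of the chunk just seen for the next call.
    $t->{twig_skip_next_chunk} =
        ( $t->{twig_keep_encoding} && defined($string) && length($string) > 1024 )
        ? 1
        : 0;

    return $elt->{cdata};
}

# Stand-ins for the parser state and the current element (assumptions).
my $t = {
    twig_keep_encoding   => 1,
    twig_stored_spaces   => '',
    twig_skip_next_chunk => 0,
};
my $elt = { cdata => '' };

append_chunk( $t, $elt, 'x' x 2000 );    # kept, and sets the skip flag
append_chunk( $t, $elt, 'DUPLICATE' );   # dropped, and clears the skip flag
append_chunk( $t, $elt, 'next' );        # kept

print length( $elt->{cdata} ), "\n";
```

Note the obvious weakness: if the chunk following a long one is real text rather than a repeat, it gets thrown away too, which is one more reason I'd rather see this fixed in XML::Parser.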

All tests that ran on my computer passed.

Jenda
XML sucks. Badly. SOAP on the other hand is the most powerful vacuum pump ever invented.


In reply to Re^2: Dangerous XML::Twig (or XML::Parser?) bug. Long text is read incorrectly! by Jenda
in thread Dangerous XML::Twig (or XML::Parser?) bug. Long text is read incorrectly! by Jenda
