First, a quick sanity check: why does your XML data contain U+0097 (a.k.a. &#151;)? According to the code chart, this is a non-displayable control character whose name/function is listed as "END OF GUARDED AREA". Is that what you intend/expect it to be?
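A quick way to confirm what an XML parser makes of that reference is to parse a one-element document and inspect the result. This is a minimal Python sketch, not part of the original thread: it assumes Python's standard `xml.etree.ElementTree` and `unicodedata` modules, and reuses the element name "test1" from the test script only for illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Parse a minimal document containing the numeric character reference.
# The XML parser resolves &#151; (decimal 151 == hex 0x97) to code point U+0097.
text = ET.fromstring("<test1>&#151;</test1>").text

print(hex(ord(text)))              # 0x97
print(unicodedata.category(text))  # Cc: a control character, not a printable glyph
print(text.isprintable())          # False
```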

(If you were expecting a displayable character, then either you have the wrong code point in your data, or else you're saying/pretending the data is Unicode when in fact it is not. BTW, I notice that 0x97 is used in the MS "CP125*" code pages for "em dash", which is "officially" supposed to map to U+2014, which in turn should yield a 3-byte UTF-8 sequence: E2 80 94.)
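To make the contrast concrete, here is a small Python sketch (an illustration, not the original poster's code) that decodes byte 0x97 as Windows-1252 (cp1252, one of the "CP125*" pages) and compares the UTF-8 bytes of the resulting em dash with those of the control character U+0097. In Perl, the `Encode` module's `decode("cp1252", ...)` performs the equivalent conversion.

```python
# Byte 0x97 read as Windows-1252 is an em dash, U+2014.
em_dash = bytes([0x97]).decode("cp1252")
print(f"U+{ord(em_dash):04X}", em_dash.encode("utf-8").hex(" "))  # U+2014 e2 80 94

# The same value taken directly as a Unicode code point is the C1 control U+0097.
control = chr(0x97)
print(f"U+{ord(control):04X}", control.encode("utf-8").hex(" "))  # U+0097 c2 97
```

If the output files contain C2 97 where an em dash was intended, the data was most likely treated as Latin-1/Unicode code points instead of being decoded from cp1252 first.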

I tried the test script that you posted in a reply above, and it seemed to write a U+0097 character in UTF-8 encoding (i.e. as the two-byte sequence C2 97) for both the "test1" and "test2" elements, in all of its outputs: the "print_out.xml" file, the "out.xml" file, and STDOUT. (Of course, I had to use a hex dump to actually "see" the character in every case, since it is not displayable.) Does that run contrary to your own findings?

(I'm running Perl 5.8.1 on Darwin; 5.8.5 shouldn't be any different...)
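If you'd rather not eyeball a full hex dump, a short script can report where the relevant byte sequences occur. This is a hypothetical Python sketch, not something from the thread: the script name `scan.py` and the file `stdout.txt` (STDOUT redirected to a file) are assumptions, while "print_out.xml" and "out.xml" are the output files named above.

```python
import sys
from pathlib import Path

# Byte sequences worth looking for in the output files.
PATTERNS = {
    "U+0097 (C2 97)": b"\xc2\x97",
    "U+2014 (E2 80 94)": b"\xe2\x80\x94",
}

def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return every byte offset at which `needle` occurs in `data`."""
    offsets, start = [], 0
    while (pos := data.find(needle, start)) != -1:
        offsets.append(pos)
        start = pos + 1
    return offsets

def report(path: Path) -> None:
    """Print each match with a few surrounding bytes, like a hex-dump excerpt."""
    data = path.read_bytes()
    for label, needle in PATTERNS.items():
        for off in find_all(data, needle):
            window = data[max(0, off - 4): off + len(needle) + 4]
            print(f"{path}: {label} at offset {off}: {window.hex(' ')}")

if __name__ == "__main__":
    # Hypothetical usage: python scan.py print_out.xml out.xml stdout.txt
    for name in sys.argv[1:]:
        report(Path(name))
```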


In reply to Re: problem with XML::Writer, unicode and Perl 5.6.0 upgrade by graff
in thread problem with XML::Writer, unicode and Perl 5.6.0 upgrade by santellij