avih has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm having a problem with XML::DOM output when the XML includes accented Latin characters. UTF-8 is used throughout my system, and foreign languages (Greek, Chinese, Russian, etc.) are displayed correctly. Below is a one-liner I created so I can send some code for you guys to test; however, in the one-liner it works just fine. Inside my program it messes up the Unicode characters.

use XML::DOM;
my $doc = XML::DOM::Document->new;
my $decl = $doc->createXMLDecl("1.0", "utf-8");
$doc->setXMLDecl($decl);
my $tag = $doc->createElement("word");
$tag->addText("IssuéTést");
$doc->appendChild($tag);
print $doc->toString();

I get the following correct XML:

<?xml version="1.0" encoding="utf-8" ?> <word>Issu&#233;T&#233;st</word>

However, inside my program I get the value Issu㩔st instead. I need some ideas on how to debug it. Thanks, Avihai.

Replies are listed 'Best First'.
Re: XML::DOM panctuated latin char utf8 (demo?)
by ikegami (Patriarch) on Oct 31, 2011 at 08:11 UTC
    Your code doesn't demonstrate the problem. (It doesn't even run.) It would help if it did.
      I created some code to run; however, in that code, which I run as a one-liner, everything works fine. Inside the big program I'm working on, something gets messed up. I need some tips on how to debug this. Thanks, Avihai.
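One possible starting point for debugging, offered as a guess rather than a diagnosis: the one-liner and the big program may be handing addText the same text in different forms, once as raw UTF-8 bytes and once as a decoded character string. The thread does not confirm this. The sketch below uses only core Perl (Encode and utf8::is_utf8); describe_string is a hypothetical helper name, and the sample string is written with byte escapes so the result does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of XML::DOM): report how Perl is
# currently holding a string, so two code paths can be compared.
sub describe_string {
    my ($s) = @_;
    return sprintf "utf8_flag=%s length=%d codes=%s",
        (utf8::is_utf8($s) ? "on" : "off"),
        length($s),
        join(" ", map { sprintf "%02X", ord } split //, $s);
}

# "IssuéTést" as raw UTF-8 bytes: each é is the two bytes C3 A9.
my $bytes = "Issu\xC3\xA9T\xC3\xA9st";

# The same text decoded into characters: each é is one code point, E9.
my $chars = decode("UTF-8", $bytes);

print "bytes: ", describe_string($bytes), "\n";
print "chars: ", describe_string($chars), "\n";
```

Calling describe_string on the value just before addText in both the one-liner and the big program would show whether they really pass the same thing. If one reports 11 units and the other 9, the difference is introduced upstream of XML::DOM (for example by a decoding input layer or a database driver), and that would be the place to look next.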