in reply to Re^2: UTF in Perl
in thread UTF in Perl

my code(snippet) looks like this
use XML::SMART;

Where did you get XML::SMART from? I can't find that on CPAN, only XML::Smart.

The XML seems to be valid with utf8 encoding.

Funny, when I run your script it prints the ä in Latin-1, which is a bug in the module (IMHO).

This works for me (i.e. it produces a valid UTF-8 XML file) when the source file is stored in UTF-8:

use XML::Smart;
use Encode;

my $XML = XML::Smart->new(q`<?xml version="1.0" encoding="UTF-8" ?>
<MSR-ISSUE xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="my.xsd">
</MSR-ISSUE>`, 'XML::Smart::Parser');

my $test_in_incoming_xml = "Auch wenn man es nach Jahren guter Beschäftigung kaum verstehen kann";
utf8::upgrade($test_in_incoming_xml);
$XML->{'MSR-ISSUE'}{'SHORT-NAME'}->content(0,$test_in_incoming_xml);

my $xmlfile = "foo.xml";
$XML->save($xmlfile, nometagen => 1, forceutf8 => 1);

I'm pretty sure that the utf8::upgrade line is fundamentally wrong, and only compensates for an XML::Smart bug.
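To illustrate why utf8::upgrade is not a substitute for decoding, here is a minimal sketch. It assumes the text arrives as raw UTF-8 bytes; the sample string and variable names are made up for illustration. utf8::upgrade only changes Perl's internal storage of the string, while Encode's decode actually turns the bytes into characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: "ä" arrives as the two UTF-8 bytes 0xC3 0xA4 (e.g. from a file).
my $bytes = "Besch\xC3\xA4ftigung";

# utf8::upgrade only switches the internal storage format. The string still
# holds the same code points, so 0xC3 and 0xA4 remain two separate characters.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

# decode interprets the bytes as UTF-8 and yields a single character U+00E4.
my $decoded = decode('UTF-8', $bytes);

printf "upgraded: length %d, equal to input: %s\n",
    length $upgraded, ($upgraded eq $bytes ? "yes" : "no");
printf "decoded:  length %d, contains U+00E4: %s\n",
    length $decoded, ($decoded =~ /\x{E4}/ ? "yes" : "no");
```

The upgraded string compares equal to the input, so nothing has been decoded; only the decoded string has the shorter, character-based length.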

Re^4: UTF in Perl
by KarthikK (Sexton) on Oct 10, 2008 at 14:56 UTC
    Thanks Moritz!
    XML::Smart and XML::SMART both work! No idea how! This is the same module you referred to.
    I am completely lost here :-(
    Basically this is my situation:

    1. I get an XML file in utf8 format. It will have all sorts of special characters, but utf8 encoded.
    2. I have to take these values and convert them to windows-1256 or windows-1252 (MS Windows Latin 1) so that the users see the text properly.
    3. I have to export the same data back as XML from the database in utf8 format.
    Currently I use UTF8Simple, which is buggy :-(
    Somehow "use encoding" doesn't seem to work either!
      XML::Smart and XML::SMART both work!

      ... but only as long as you are on a case-insensitive file system. As soon as that changes -> BOOM. So please use the correct spelling.
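As a rough sketch of why the wrong spelling can still load: `use` turns the module name into a relative file path and searches @INC for it, so on a case-insensitive file system a lookup for XML/SMART.pm can end up finding XML/Smart.pm. The helper name module_to_path below is hypothetical and only mimics that name-to-path step.

```perl
use strict;
use warnings;

# Hypothetical helper: mimic how `use Module::Name` derives the relative
# file name that Perl then looks for in each @INC directory.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;   # XML::Smart -> XML/Smart
    return "$path.pm";
}

print module_to_path('XML::SMART'), "\n";   # XML/SMART.pm
print module_to_path('XML::Smart'), "\n";   # XML/Smart.pm
```

The two paths differ only in case, which a case-sensitive file system treats as two different files.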

      I am completely lost here :-(

      It's not an easy topic, mostly because many modules are buggy. But I can't do more than provide you a working example.

      I'll try to give you some general advice though, most of which is already in the article I linked to above.

      • Forget about UTF8Simple. Now.
      • Use a not-so-buggy XML module. XML::LibXML and XML::Twig both have been recommended here multiple times, and I've used both (on very small projects) with success.
      • Your non-buggy XML module will decode all strings on reading, and encode them on writing. So as long as you only deal with decoded text strings, you're done. So make sure that everything that comes from the outside into your program is also decoded. Maybe IO layers (also described in this article) might help you with this.
      • Use encode (or IO layers) to present your data to the user.
      • Use Devel::Peek to debug your code.
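The decode-on-input, encode-on-output advice could look roughly like this for the three steps described earlier in the thread. This is only a sketch using the core Encode module: the sample string and variable names are assumptions, and the target encoding cp1252 is picked purely for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Step 1: raw UTF-8 bytes, as they might be read from the incoming XML
# (hypothetical sample; "ä" is the byte pair 0xC3 0xA4).
my $utf8_bytes = "Besch\xC3\xA4ftigung";

# Decode once at the boundary; from here on work with a text string.
my $text = decode('UTF-8', $utf8_bytes);

# Step 2: encode for users who expect Windows-1252 ("ä" becomes byte 0xE4).
my $cp1252_bytes = encode('cp1252', $text);

# Step 3: encode back to UTF-8 for the XML export.
my $export_bytes = encode('UTF-8', $text);

# Show the resulting bytes as hex for inspection.
printf "cp1252: %s\n", join ' ', map { sprintf '%02X', ord } split //, $cp1252_bytes;
printf "utf-8:  %s\n", join ' ', map { sprintf '%02X', ord } split //, $export_bytes;
```

With IO layers the same conversions can be attached to a file handle instead, for example by opening the input with the ':encoding(UTF-8)' layer, so that reads return already decoded text.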

      From the example text I conclude that you speak German. If that's the case I can recommend the #perlde channel on irc.perl.org, that's easier if you have more questions and don't exactly know how to ask them.

        thanks a lot for your reply!
        I cannot use any other module due to a few factors:
        • I use ClearQuest Perl and I am unable to install the DLLs that LibXML needs to work
        • This code works in ClearQuest Perl v5.6.1 but not in ClearQuest Perl 5.8.1
        • The effort to change the code to another parser is now too much

        I know that XML::Smart is buggy. I even tried to contact the author, but in vain :-(
        I will try all the possible things you mentioned. If nothing works I have no option but to change the XML module.
        Also, I work in Germany now and deal with data in the German language, but I don't speak German and hence I cannot ask in #perlde :-(
        Once again thanks a lot for your help. will post back if i have any questions.

        Cheers,
        Karthik