PerlMonks  

utf8 && XML::Simple

by zakzebrowski (Curate)
on Feb 01, 2005 at 17:07 UTC ( [id://426964]=perlquestion )

zakzebrowski has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,
Can anyone explain why this fails on perl 5.8.3? I tried various utf8 tricks, but I can't get it to work...
Thanks.
Zak
Update: johnnywang++ and borisz++. Write the file out as utf8, but read it back in by passing the file name to XML::Simple's XMLin!
use XML::Simple qw(:strict);
use Encode;
# use open 'utf8'; # Can't get this to work - should open all files as utf8...
use Data::Dumper;

my $val;
$val->{utfchar} = "\x{10a0}";
my $xml = XMLout($val,KeyAttr=>{item=>'name'});
open (OUT,">out.xml");
print OUT $xml;
close OUT;

# Yes, I could use a different slurp function...
my $readin="";
open (IN,"<out.xml");
while (<IN>){
    $readin = $readin . $_;
}
close IN;

my $result = XMLin($readin,KeyAttr=>{item=>'name'},ForceArray=>1);
if ($result->{utfchar} eq "\x{10a0}"){
    print "Wohoo!\n";
}
else {
    print "Doh!\n";
}


----
Zak - the office

Replies are listed 'Best First'.
Re: utf8 && XML::Simple
by johnnywang (Priest) on Feb 01, 2005 at 18:11 UTC
    Just want to point out (this is not what you're asking) that XMLin can also take a file name as first argument. So instead of:
    # Yes, I could use a different slurp function...
    my $readin="";
    open (IN,"<out.xml");
    while (<IN>){
        $readin = $readin . $_;
    }
    close IN;
    my $result = XMLin($readin,KeyAttr=>{item=>'name'},ForceArray=>1);
    you can just say:
    my $result = XMLin("out.xml",KeyAttr=>{item=>'name'},ForceArray=>1);
      ++ ++ !! This technique works. It looks like XML::Simple will *automatically* read a file in as utf8. So you must explicitly write the file as utf8, and then just pass the file name to XMLin to read it back... Thanks! Zak


      ----
      Zak - the office
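The round trip described above can be sketched as follows. This is a minimal sketch, not the poster's actual code: it assumes XML::Simple is installed, and the temporary directory and the `:encoding(UTF-8)` output layer are illustrative choices. The file is written explicitly as UTF-8 and XMLin is given the file name, so the XML parser does the decoding.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::Simple qw(:strict);

# Scratch location for the demo (an assumption, the thread uses out.xml in cwd).
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'out.xml');

my $val = { utfchar => "\x{10a0}" };
my $xml = XMLout($val, KeyAttr => { item => 'name' });

# Write the XML explicitly as UTF-8.
open my $out, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";
print {$out} $xml;
close $out or die "Cannot close $file: $!";

# Hand XMLin the file name rather than a slurped string.
my $result = XMLin($file, KeyAttr => { item => 'name' }, ForceArray => 1);

print $result->{utfchar} eq "\x{10a0}" ? "Wohoo!\n" : "Doh!\n";
```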
      I have the same problem as zakzebrowski. When I use his example I still do not get the characters right:
      #!/usr/bin/perl
      use XML::Simple;
      use Data::Dumper;
      use Encode;

      my $content = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
      $content .= "<tag>\x{c3}\x{bb}</tag>\n";
      print "input:\n$content\n";

      my $xml = new XML::Simple;
      my $data = $xml->XMLin($content, KeepRoot => 1);
      encode_utf8($data->{'tag'});
      print "data: ".$data->{'tag'}."\n";
      print Dumper $data;
      returns:
      input:
      <?xml version="1.0" encoding="UTF-8" ?>
      <tag>û</tag>

      data:
      $VAR1 = {
                'tag' => "\x{fb}"
              };

      My real-life code parses an XML file with XML::Simple and stores the data in a MySQL database. The database shows the same encoding problems as above.

      I have been looking at this sample code for days now with no idea how to proceed ... Any help is appreciated!

        The code appears correct: the bytes C3 BB are the UTF-8 encoding of U+00FB (Latin Small Letter U With Circumflex), and "\x{fb}" is exactly what the dump shows. So the next steps would be to check how the data gets stored in MySQL, how you retrieve the data and how you then display the data.
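To make the byte/character distinction concrete, here is a small sketch using only the core Encode module (no XML::Simple involved). It shows the two octets C3 BB decoding to the single character U+00FB. It also shows that encode_utf8 returns the encoded octets and leaves its argument unchanged, so calling it without using the return value, as the sample above does, has no effect on what gets printed.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

my $bytes = "\x{c3}\x{bb}";        # two octets: UTF-8 for U+00FB
my $chars = decode_utf8($bytes);   # one character

printf "%d character(s): U+%04X\n", length($chars), ord($chars);

# encode_utf8 returns a new string; $chars itself is not modified.
my $octets = encode_utf8($chars);
printf "%d octet(s): %s\n", length($octets), unpack('H*', $octets);
printf "still %d character(s) in the original\n", length($chars);
```

When printing such a string to a terminal or file that expects UTF-8, either print the result of encode_utf8 or put a UTF-8 layer on the handle.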

Re: utf8 && XML::Simple
by borisz (Canon) on Feb 01, 2005 at 17:14 UTC
    What about use open ':utf8';?
    Wohoo! for me.
    Boris
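A minimal sketch of what the pragma does, using only core modules and leaving XML::Simple out of the picture. Note the leading colon in ':utf8', which the commented-out line in the question lacks. The temporary directory and file name are illustrative assumptions. With the pragma in scope, every open() in the same lexical scope gets a UTF-8 layer for both reading and writing.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use open ':utf8';    # default layer for open() calls in this lexical scope

# Hypothetical scratch file for the demo.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'demo.txt');

open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "\x{10a0}\n";    # written as UTF-8 octets
close $out or die "Cannot close $file: $!";

open my $in, '<', $file or die "Cannot read $file: $!";
my $line = <$in>;             # decoded back into characters
close $in;
chomp $line;

print $line eq "\x{10a0}" ? "Wohoo!\n" : "Doh!\n";
```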
      Thanks for replying. In my (much more complicated real life) file, I get "Cannot decode string with wide characters at /usr/../Encode.pm line 184."


      ----
      Zak - the office
        Most likely your input data is already in utf8 and you are trying to convert it to utf8 a second time. For example:
        use Encode;
        my $str = "hi";                # hi is a normal string.
        $str .= chr(0x1234);           # str is now a utf8 string
        Encode::decode_utf8($str, 1);  # here you get the error.
        Probably you swapped encode and decode, or the decode call is not needed in your case.
        Boris
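The failure and the correct direction of conversion can be sketched with the core Encode module alone. The exact wording of the error depends on the Encode version, so the sketch only checks that decoding a string that already contains wide characters dies. It then shows that such a string can be encoded to octets and decoded back without trouble.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

my $str = "hi" . chr(0x1234);    # already a character string

# Decoding characters (instead of octets) is the mistake; trap the error.
my $decoded = eval { decode_utf8($str, 1) };
my $error   = $@;
print "decode failed: $error" if $error;

# The right direction: characters -> octets -> characters.
my $octets = encode_utf8($str);
my $round  = decode_utf8($octets);

print "round trip ok\n" if $round eq $str;
```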

Node Type: perlquestion [id://426964]
Approved by bart
Front-paged by kutsu