PerlMonks  

Yet Another UTF8 Issue

by stefan k (Curate)
on Mar 18, 2004 at 08:47 UTC (id://337640)

stefan k has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I had been working on a RedHat 9.0 system for quite a while when I figured out that removing the UTF-8 settings from /etc/sysconfig/i18n cures many of my itches.

Since then, I suddenly get this warning:

Wide character in print at ./perlskript line 43.
The script uses XML::XPath to parse an XML file (which contains no umlauts or other non-ASCII characters), uses the information stored there to fill out a form defined in a second (template) file, and prints the result. The template file contains one umlaut in ISO-8859-1 encoding. Without this umlaut there is no problem.

Declaring the encoding explicitly in the XML file changed nothing, either.

When I just read in the file and print it again, I get no warning. Even if I set up the parser, parse the XML file, and then print the output, nothing happens. Only when I apply the information from the XML file to the template file do I get the wide-character warning. I also tried setting the variable EncodeUtf8AsEntity to 1, to no avail.
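The warning itself does not need XML::XPath to show up. A minimal sketch (the string is hypothetical; the thread does not establish which character in the real output is above U+00FF): perl emits "Wide character in print" whenever a string containing a character above U+00FF goes to a filehandle that has no encoding layer, and then writes the string's internal UTF-8 bytes anyway.

```perl
use strict;
use warnings;

# Hypothetical character string standing in for the filled-in template.
# U+20AC (EURO SIGN) cannot be represented in a single byte.
my $filled_in = "Price: 5 \x{20AC}";

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# An in-memory filehandle with no :encoding layer, like a plain STDOUT.
open my $out, '>', \my $buffer or die "Cannot open in-memory handle: $!";
print {$out} $filled_in, "\n";    # triggers "Wide character in print"
close $out;

# After warning, perl has written the string's internal UTF-8 bytes.
print STDERR "warned: $_" for @warnings;
```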

I'm using perl 5.8.0 and XML::XPath version 1.13.

Any hints, suggestions?

Regards... Stefan
you begin bashing the string with a +42 regexp of confusion

Re: Yet Another UTF8 Issue
by eserte (Deacon) on Mar 18, 2004 at 09:17 UTC
    First, try upgrading to a newer perl (e.g. 5.8.3). There are known issues with UTF-8 handling in perl 5.8.0, especially on RedHat systems.
      Well, I thought I had got rid of all the UTF-8 stuff by deleting it from /etc/sysconfig/i18n. I can easily live with plain ISO-8859-1 on my system.

      Regards... Stefan

Re: Yet Another UTF8 Issue
by bart (Canon) on Mar 18, 2004 at 13:39 UTC
    This second file contains one umlaut in ISO-8859-1 encoding. Without this umlaut there is no problem.
    Is the encoding declared in the "<?xml ... ?>" header? Without it, any XML parser must assume that the document is encoded in UTF-8. The complete, proper declaration for your case would be:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    See this for an explanation of this issue in beginners' terms.
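    In plain Perl, the core Encode module can show why the declaration matters: the same byte decodes to different things depending on which encoding the decoder assumes. This is only a sketch; the sample word is made up, and Encode::decode stands in for what an XML parser does internally.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The umlaut as it sits in an ISO-8859-1 file: the single byte 0xE4.
# The word itself is a hypothetical example.
my $octets = "M\xE4rz";

# Decoding with the correct encoding gives 4 characters; the second is U+00E4.
my $as_latin1 = decode('iso-8859-1', $octets);

# Decoding the same bytes as UTF-8 (the assumption without a declaration)
# fails: 0xE4 followed by 'r' is not a valid UTF-8 sequence.
my $copy = $octets;    # decode() may modify its input when CHECK is set
my $as_utf8 = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
my $utf8_error = $@;

printf "as ISO-8859-1: %d chars, second is U+%04X\n",
    length $as_latin1, ord substr($as_latin1, 1, 1);
print "as UTF-8: ", (defined $as_utf8 ? "decoded\n" : "failed: $utf8_error");
```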
      Yes, as I mentioned in my question, I tried that, and it did nothing to make the warning go away.

      Regards... Stefan

Re: Yet Another UTF8 Issue
by Arrowhead (Monk) on Mar 18, 2004 at 17:40 UTC

    If you're using XML, you will have to deal with Unicode.

    Pretending you don't will only make things harder.

    All XML parsing modules for Perl will give your Perl code UTF-8 encoded data.

    Use the latest perl 5.8.x you can find, read perluniintro.pod and go from there.

    In your case, all you have to do is specify the encoding you want your output in, using the 3-argument form of open(), or binmode() if you're printing to an already opened filehandle.
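    A minimal sketch of both options, using in-memory filehandles so the resulting bytes can be inspected. The sample text is hypothetical; for real output you would apply the same layer to STDOUT or to the file you open.

```perl
use strict;
use warnings;

# Hypothetical character data ("Groesse" with o-umlaut and sharp s),
# as it might look after the template has been filled in.
my $text = "Gr\x{F6}\x{DF}e";

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# 1) Three-argument open with an output encoding layer.
open my $utf8_out, '>:encoding(UTF-8)', \my $utf8_bytes
    or die "Cannot open: $!";
print {$utf8_out} $text;
close $utf8_out;

# 2) binmode on a handle that is already open, as you would do with
#    STDOUT:  binmode STDOUT, ':encoding(iso-8859-1)';
open my $latin1_out, '>', \my $latin1_bytes or die "Cannot open: $!";
binmode $latin1_out, ':encoding(iso-8859-1)' or die "binmode failed: $!";
print {$latin1_out} $text;
close $latin1_out;

printf "UTF-8: %d bytes, ISO-8859-1: %d bytes, warnings: %d\n",
    length $utf8_bytes, length $latin1_bytes, scalar @warnings;
```

    Which layer to choose depends on what the consumer of the output expects; for someone who wants to stay with ISO-8859-1, :encoding(iso-8859-1) would be the natural pick.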

      Thanks :-)

Approved by valdez