Yet Another UTF8 Issue

stefan k has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I've been working on a RedHat 9.0 system for quite a while, until I figured out that removing the UTF-8 settings in /etc/sysconfig/i18n cures many of my itches.

After that I suddenly get an error:

Wide character in print at ./perlskript line 43.

That script uses XML::XPath, parses an XML-file (which does not contain any umlauts or funny stuff), uses the information stored there to fill out a form defined in a second (template-)file and prints the output. This second file contains one umlaut in ISO-8859-1 encoding. Without this umlaut there is no problem.

Obviously giving the encoding explicitly in the XML file changed nothing.

When I just read in the file and print it again, I don't get any error. Even if I set up the parser, parse the XML file and then print the output, nothing happens. Only when I apply the information from the XML file to the template file do I get the wide-character warning. I also tried setting the variable EncodeUtf8AsEntity to 1, to no avail.

I'm using perl 5.8.0, XPath Version 1.13.

Any hints, suggestions?

Regards... Stefan

you begin bashing the string with a +42 regexp of confusion



Re: Yet Another UTF8 Issue
by eserte (Deacon) on Mar 18, 2004 at 09:17 UTC

First, try updating to a newer perl (e.g. 5.8.3). There are known issues with UTF-8 handling in perl 5.8.0, especially in conjunction with RedHat systems.


Re: Re: Yet Another UTF8 Issue
by stefan k (Curate) on Mar 18, 2004 at 10:43 UTC

all

/etc/sysconfig/i18n

Regards... Stefan


Re: Yet Another UTF8 Issue
by bart (Canon) on Mar 18, 2004 at 13:39 UTC

This second file contains one umlaut in ISO-8859-1 encoding. Without this umlaut there is no problem.

<?XML ... ?>

<?xml version="1.0" encoding="ISO-8859-1"?>

this


Re: Re: Yet Another UTF8 Issue
by stefan k (Curate) on Mar 18, 2004 at 14:05 UTC

Regards... Stefan


Re: Yet Another UTF8 Issue
by Arrowhead (Monk) on Mar 18, 2004 at 17:40 UTC

If you're using XML, you will have to deal with Unicode.

Pretending you don't will only make things harder.

All XML parsing modules for Perl will give your Perl code UTF-8 encoded data.

Use the latest perl 5.8.x you can find, read perluniintro.pod and go from there.

In your case, the only thing you'll have to do is indicate the encoding that you want your output in, using the 3-argument form of open(), or binmode() if you're printing to an already opened filehandle.
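
As a rough sketch of both options (not the original script): the string $from_xml below is a hypothetical stand-in for whatever the XML parser returns, and the in-memory buffer only replaces a real output file so that the example has no side effects. Pick the encoding name your consumers actually expect.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a value returned by an XML parser: a Perl
# character string containing U+00E4 (a-umlaut) and U+20AC (euro sign).
my $from_xml = "K\x{e4}se \x{20ac}";

# Option 1: the filehandle is already open (here STDOUT).
# Declare the output encoding once; Perl then encodes characters on print,
# and the "Wide character in print" warning is not triggered.
binmode(STDOUT, ':encoding(UTF-8)');
print "$from_xml\n";

# Option 2: 3-argument open() with an encoding layer.
# The target is an in-memory buffer in this sketch; a file name in place
# of \$buffer works the same way.
my $buffer = '';
open(my $out, '>:encoding(ISO-8859-1)', \$buffer) or die "open failed: $!";
print {$out} "K\x{e4}se";
close($out) or die "close failed: $!";

# $buffer now holds the ISO-8859-1 bytes: "K", 0xE4, "s", "e".
printf "%d bytes, umlaut byte 0x%02x\n",
    length($buffer), ord(substr($buffer, 1, 1));
```

The same layer syntax works for reading, e.g. '<:encoding(ISO-8859-1)' when opening a template file that is stored in ISO-8859-1, so that its umlaut arrives in Perl as a character rather than as a raw byte.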

