I edited your sample data file to put in the real Cyrillic characters that you cited, edited the script to get the shebang line right for my machine, and ran it. I saw the problem that you were describing.

The problem was with the "unpack": I changed the "C*" to "U*", and the data came out fine. Commenting out the whole "pack(... unpack(...))" line works too (at least on my box: Mac OS X with Perl 5.8.6).
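
To illustrate why the template letter matters, here is a minimal sketch. It does not reproduce the original script (which isn't shown here); the sample string and variable names are my own assumptions. It uses explicit UTF-8 octets for the "C*" case so that the result does not depend on the Perl version.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Two Cyrillic characters as a Perl character string (hypothetical sample data).
my $chars  = "\x{43f}\x{440}";
# The same text as raw UTF-8 octets: D0 BF D1 80.
my $octets = encode('UTF-8', $chars);

# 'U*' reads code points and packs them back: a clean round trip.
my $round_trip = pack('U*', unpack('U*', $chars));

# 'C*' yields one number per octet; packing those numbers as code points
# gives four characters in the Latin-1 range instead of two Cyrillic ones.
my $mangled = pack('U*', unpack('C*', $octets));

printf "round trip: %d chars, same as original: %s\n",
    length($round_trip), ($round_trip eq $chars ? 'yes' : 'no');
printf "mangled:    %d chars, first code point U+%04X\n",
    length($mangled), ord($mangled);
```

If the mangled string is then written through a utf8 layer, each of those four characters gets encoded again, which is the usual "double encoding" garbage.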

I know, that seems odd, esp. since the "is_utf8" check reports 0 ("not flagged as utf8") when the pack/unpack line is commented out, and yet the output is definitely valid utf8 Cyrillic. (update: I should also confirm that it reports 1 when using pack('U*',unpack('U*',...));)

(Major mystery of the day: Encode::is_utf8 reports false on a string that comes back from XML::XPath, and yet printing it to STDOUT without first doing binmode STDOUT, ":utf8" causes a "Wide character in print" warning. Set binmode on STDOUT and the warning goes away. This implies that perl somehow "knows" that it really is a utf8 string, and that Encode::is_utf8 is lying or mistaken. So you are a victim of misinformation from a function that, I should point out, is described under the heading "Messing with Perl's Internals" in the docs for Encode. Ugh.)
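
The layer-versus-warning behavior can be reproduced in isolation. This is only a sketch: it uses in-memory filehandles in place of STDOUT and a made-up two-character string, and it does not explain the is_utf8 result.

```perl
use strict;
use warnings;

my $text = "\x{43f}\x{440}";   # hypothetical sample: two Cyrillic characters

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Handle without an encoding layer: printing wide characters warns.
open(my $raw, '>', \my $raw_buf) or die "in-memory open: $!\n";
print {$raw} $text;
close($raw);
my $warned_without_layer = scalar @warnings;

# Same print through a :utf8 layer: no warning.
@warnings = ();
open(my $enc, '>:utf8', \my $enc_buf) or die "in-memory open: $!\n";
print {$enc} $text;
close($enc);
my $warned_with_layer = scalar @warnings;

print "warnings without layer: $warned_without_layer\n";
print "warnings with :utf8:    $warned_with_layer\n";
print "bytes written via :utf8: ", length($enc_buf), "\n";
```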

BTW, the better way to open a file for utf8 output is like this:

open( OUT, ">:utf8", $filename ) or die "$filename: $!\n";
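
For a fuller picture, here is a small sketch of the same idea with a lexical filehandle, writing a line and reading it back. The temp file and the sample text are assumptions of mine, chosen only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway file for the demo; removed automatically at exit.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close($tmp_fh);

my $text = "\x{43f}\x{440}\n";   # hypothetical sample line

# Write through a :utf8 layer so no binmode call is needed afterwards.
open(my $out, '>:utf8', $filename) or die "$filename: $!\n";
print {$out} $text;
close($out) or die "$filename: $!\n";

# Read back through the matching layer to get characters, not octets.
open(my $in, '<:utf8', $filename) or die "$filename: $!\n";
my $read_back = <$in>;
close($in);

print "round trip ok\n" if $read_back eq $text;
```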

Re^4: Using a variable with UTF8 content coming from XPATH findvalue
by inguanzo (Acolyte) on Sep 28, 2007 at 06:13 UTC
    Hi,
    I had forgotten to test this on other operating systems. You are right: the script works fine without any special UTF-8 handling. I just tried it on:

    WORKS!:::::::::::::::::::::::::::::::::
    Windows XP Perl v5.8.8

    WORKS!:::::::::::::::::::::::::::::::::
    Linux box I have at home: Red Hat, kernel version 2.4, Perl 5.8.0

    FAIL!:::::::::::::::::::::::::::::::::
    Linux box at work: SUSE, kernel version 2.6, Perl 5.8.0

    Thanks for the help.
    Inguanzo
      Beware of 5.8.0 in general, and especially on Red Hat. It's nice (lucky?) that it works in this case, but I recommend you upgrade that machine soon if it's going to play any sort of important role in your development or usage of Unicode-relevant scripts.
        Thanks for the advice. I'll request the upgrade on that system; if that's not possible, I'll run my own distribution from my home directory. You rule, my friend. Thanks for your help. Regards!
        Inguanzo