in reply to Re^2: Using a variable with UTF8 content coming from XPATH findvalue
in thread Using a variable with UTF8 content coming from XPATH findvalue
The problem was with the "unpack": I changed the "C*" to "U*", and the data came out fine. If I just comment out that whole "pack(... unpack(...))" line, that works too (at least on my box: Mac OS X with Perl 5.8.6).
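To make the difference concrete, here is a minimal sketch comparing the two round trips. The sample string is my own stand-in (the Cyrillic word "мир"), not the data from the original script, and the exact values that "C*" produces depend on the perl version, so the code only checks whether the text survives.

```perl
use strict;
use warnings;

# Hypothetical sample data: the Cyrillic word "мир" as a Perl character string.
my $text = "\x{43C}\x{438}\x{440}";

# "U*" unpacks to Unicode code points and packs them back into characters.
my $via_u = pack( 'U*', unpack( 'U*', $text ) );

# "C*" works in octets (0..255), so the Cyrillic characters cannot survive.
my $via_c = do {
    no warnings;    # newer perls warn that characters were wrapped
    pack( 'C*', unpack( 'C*', $text ) );
};

print "U* round trip preserved the text\n" if $via_u eq $text;
print "C* round trip changed the text\n"  if $via_c ne $text;
```

Both messages should print: the "U*" version hands back the same three characters, while the "C*" version returns something else.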
I know, that seems odd, especially since the "is_utf8" check reports 0 ("not flagged as utf8") when the pack/unpack line is commented out, and yet the output is definitely valid utf8 Cyrillic. (Update: I should also confirm that it reports 1 when using pack('U*',unpack('U*',...)).)
(Major mystery of the day: Encode::is_utf8 reports false on a string that comes back from XML::Path, and yet printing it to STDOUT without first doing binmode STDOUT,":utf8" causes a "Wide character in print" warning. Do the binmode setting on STDOUT and the warning goes away. This implies that perl somehow "knows" that it really is a utf8 string, and Encode::is_utf8 seems to be lying or mistaken. So you are a victim of misinformation from a function that, I should point out, is described under the heading "Messing with Perl's Internals" in the docs for Encode. Ugh.)
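For comparison, this sketch shows how the flag usually behaves for undecoded bytes versus a decoded character string, and where the binmode call goes. It does not explain the odd result described above; the input bytes are a made-up sample, not what the XML module returned.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Hypothetical input: the UTF-8 encoded bytes for "мир", not yet decoded.
my $bytes = "\xD0\xBC\xD0\xB8\xD1\x80";

# Decoding turns the six bytes into three characters.
my $chars = decode( 'UTF-8', $bytes );

printf "bytes: length %d, flagged %d\n", length $bytes, is_utf8($bytes) ? 1 : 0;
printf "chars: length %d, flagged %d\n", length $chars, is_utf8($chars) ? 1 : 0;

# Without this layer, printing $chars triggers "Wide character in print".
binmode STDOUT, ':utf8';
print "$chars\n";
```

The byte string should report length 6 and flag 0, and the decoded string length 3 and flag 1.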
BTW, the better way to open a file for utf8 output is like this:
open( OUT, ">:utf8", $filename ) or die "$filename: $!\n";
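Here is a small self-contained sketch of the same idea, writing a string through a ":utf8" layer and reading it back. The temporary file and the sample text are stand-ins of mine, and I've used a lexical filehandle instead of OUT, which is just a style choice.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed automatically when the script exits.
my ( undef, $filename ) = tempfile( UNLINK => 1 );

# Hypothetical sample data: the Cyrillic word "мир".
my $text = "\x{43C}\x{438}\x{440}";

# The layer in the open mode does the encoding, so no separate binmode call is needed.
open( my $out, '>:utf8', $filename ) or die "$filename: $!\n";
print {$out} "$text\n";
close($out) or die "$filename: $!\n";

# Read the line back through the matching input layer.
open( my $in, '<:utf8', $filename ) or die "$filename: $!\n";
my $line = <$in>;
close($in);
chomp $line;

print "round trip ok\n" if $line eq $text;
```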