in reply to Re^2: Using a variable with UTF8 content coming from XPATH findvalue
in thread Using a variable with UTF8 content coming from XPATH findvalue

I'll look at your code in more detail soon (though I worry that it seems to use some sort of local library that I might not be able to get from CPAN...).

In the meantime (just for grins, as we say ;), you might try running your sample XML file through this tool that I posted a while back: tlu -- TransLiterate Unicode

If the file really does contain any utf8 Russian character(s), a command line like this will tell you the exact unicode code point(s) and character name(s):

tlu -o uf test.xml | grep CYRILLIC

If there are Russian characters in your file but they aren't really utf8-encoded (trust me, it happens!), then "tlu" will either report errors or spit out "FFFD ... REPLACEMENT CHARACTER". That is most likely the source of all your trouble: you would need to convert the data from ... (whatever encoding it really is) into true and valid utf8.
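
If you don't have "tlu" handy, a rough equivalent of the "is it really utf8?" check can be sketched with iconv. This is only an illustration under stated assumptions: it assumes iconv is installed, the helper name is_valid_utf8 and the sample file names are made up for this example, and CP1251 is just a guess at what the "real" encoding might be for Russian text (it could equally be KOI8-R or something else).

```shell
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Sample data, created here only so the sketch is self-contained.
tmp=$(mktemp -d)
printf '<a>\320\224\320\260</a>\n' > "$tmp/good.xml"   # U+0414 U+0430 as UTF-8 bytes
printf '<a>\304\340</a>\n' > "$tmp/bad.xml"            # same two letters as CP1251 bytes

is_valid_utf8 "$tmp/good.xml" && echo "good.xml: valid utf8"
is_valid_utf8 "$tmp/bad.xml"  || echo "bad.xml: NOT valid utf8"

# Assumption: the data is really CP1251. Convert it into true utf8.
iconv -f CP1251 -t UTF-8 "$tmp/bad.xml" > "$tmp/fixed.xml"
is_valid_utf8 "$tmp/fixed.xml" && echo "fixed.xml: valid utf8"
```

Unlike "tlu", this only says whether the bytes are well-formed utf8; it does not report code points or character names, and converting from the wrong source encoding will quietly produce valid utf8 with the wrong letters in it.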

Re^4: Using a variable with UTF8 content coming from XPATH findvalue
by inguanzo (Acolyte) on Sep 27, 2007 at 22:40 UTC
    Hi,
    I'm not using any local library; all of the XML modules come from CPAN. As for the encoding, it may not be the problem, since the other value (the one that is not replaced) has the right representation: http://www.losinguanzo.com/utf8/test_o.xml
    I printed the value twice to be sure the problem was before the XML substitution.
    Running your script (thanks! It's really cool):
    bash-3.2$ perl tlu.pl -o uf russian_test.xml | grep CYRILLIC
    0414 Д CYRILLIC CAPITAL LETTER DE
    0430 а CYRILLIC SMALL LETTER A
    bash-3.2$
    Thanks in advance for all your help,
    Mauricio Inguanzo