Thanks for the suggestion.
I created a small test spreadsheet with two entries:
Fundación
ФОРСУНОК
The Encoding method returns 1 (8-bit ASCII or single-byte UTF-16) for the Spanish text and 2 (UTF-16BE) for the Russian text.
I also modified the TextFmt routine in FmtDefault.pm to print the value of the parameter $sCode. It was undef for the Spanish text and UTF16-BE for the Russian text. So the routine returns the Spanish text unchanged, since $sCode is undef, but decodes the Russian text as UTF-16BE, and the result comes out mangled.
    sub TextFmt($$;$) {
        my($oThis, $sTxt, $sCode) = @_;
        if((! defined($sCode)) || ($sCode eq '_native_')) {
            print STDERR "$sTxt/sCode "
                . (defined($sCode) ? "is _native_" : "undefined")
                . " - returning text\n";
            return $sTxt;
        };
        # Handle utf8 strings in newer perls.
        if ($] >= 5.008) {
            require Encode;
            print STDERR "$sTxt/$sCode; returning text with UTF-16BE encoding\n";
            return Encode::decode("UTF-16BE", $sTxt);
        }
        print STDERR "$sTxt/$sCode; formatting with pack/unpack\n";
        return pack('U*', unpack('n*', $sTxt));
        #return pack('C*', unpack('n*', $sTxt));
    }
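For anyone who wants to reproduce the two branches outside the module, here is a minimal standalone sketch. The helper name decode_cell and the hand-built byte strings are my own assumptions for illustration; they are not part of Spreadsheet::ParseExcel, and real cell data may differ. It only shows what Encode::decode does with UTF-16BE bytes and what happens to text that has no code set.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper that mirrors the branch logic of TextFmt above.
sub decode_cell {
    my ($raw, $code) = @_;
    # No code (or '_native_'): the bytes are returned unchanged.
    return $raw if !defined $code || $code eq '_native_';
    # Otherwise treat the bytes as big-endian UTF-16.
    return decode('UTF-16BE', $raw);
}

# "Fundación" as single-byte data (assumed Latin-1 for this sketch).
my $spanish_raw = "Fundaci\xF3n";

# "ФОРСУНОК" as UTF-16BE bytes, built here purely for illustration.
my $russian_raw = encode('UTF-16BE',
    "\x{424}\x{41E}\x{420}\x{421}\x{423}\x{41D}\x{41E}\x{41A}");

my $spanish = decode_cell($spanish_raw, undef);
my $russian = decode_cell($russian_raw, 'UTF-16BE');

# Decoded character strings need an output encoding layer when printed.
binmode(STDOUT, ':encoding(UTF-8)');
print "Spanish: $spanish\n";
print "Russian: $russian (", length($russian), " characters from ",
      length($russian_raw), " bytes)\n";
```

If the Russian text looks correct here but is mangled in the real script, one thing to check is whether the output handle has an encoding layer. That is only a guess on my part; nothing in the debug output above confirms it.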
In reply to Re^2: Handling variety of languages/Unicode characters with Spreadsheet::ParseExcel
by richb
in thread Handling variety of languages/Unicode characters with Spreadsheet::ParseExcel
by richb