Handling variety of languages/Unicode characters with Spreadsheet::ParseExcel
Thanks for the suggestion.
I created a small test spreadsheet with two entries:
Fundación
ФОРСУНОК
The Encoding method returns 1 (8bit ASCII or single byte UTF-16) for the Spanish text and 2 (UTF-16BE) for the Russian text.
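To make the two codes concrete, here is a minimal sketch of turning a cell's raw bytes into a Perl character string based on that code. The helper name `decode_cell_text` is hypothetical, not part of Spreadsheet::ParseExcel. It also assumes that code 1 means one byte per character with code points below 256, which is how I read "8bit ASCII or single byte UTF-16", so ISO-8859-1 decoding is used for it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Spreadsheet::ParseExcel): map the numeric
# code from the cell's encoding method to a decoding step.
sub decode_cell_text {
    my ($raw, $encoding) = @_;
    # Code 2: two bytes per character, big-endian UTF-16.
    return decode('UTF-16BE', $raw) if $encoding == 2;
    # Code 1: assumed one byte per character, code points 0-255,
    # which is exactly what ISO-8859-1 decoding yields.
    return decode('iso-8859-1', $raw) if $encoding == 1;
    die "unhandled encoding code: $encoding\n";
}

# Usage: simulated raw bytes for the two test entries.
my $spanish_raw = "Fundaci\xF3n";    # single-byte form of "Fundación"
my $russian_raw = encode('UTF-16BE',
    "\x{424}\x{41E}\x{420}\x{421}\x{423}\x{41D}\x{41E}\x{41A}");    # "ФОРСУНОК"

# Decoded strings are characters, so give STDOUT an output encoding.
binmode(STDOUT, ':encoding(UTF-8)');
print decode_cell_text($spanish_raw, 1), "\n";
print decode_cell_text($russian_raw, 2), "\n";
```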
I also modified the TextFmt routine in FmtDefault.pm to print the value of its $sCode parameter. It was undef for the Spanish text and UTF16-BE for the Russian text. So the routine simply returns the Spanish text unchanged because $sCode is undef, but decodes the Russian text as UTF-16BE, and the Russian text ends up mangled.
```perl
sub TextFmt($$;$) {
    my($oThis, $sTxt, $sCode) = @_;
    if((! defined($sCode)) || ($sCode eq '_native_')) {
        print STDERR "$sTxt/sCode " . (defined($sCode) ? "is _native_" : "undefined") . " - returning text\n";
        return $sTxt;
    };
    # Handle utf8 strings in newer perls.
    if ($] >= 5.008) {
        require Encode;
        print STDERR "$sTxt/$sCode; returning text with UTF-16BE encoding\n";
        return Encode::decode("UTF-16BE", $sTxt);
    }
    print STDERR "$sTxt/$sCode; formatting with pack/unpack\n";
    return pack('U*', unpack('n*', $sTxt));
    #return pack('C*', unpack('n*', $sTxt));
}
```
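For comparison, the two decoding paths in TextFmt can be checked outside Spreadsheet::ParseExcel. This standalone sketch (the sub names `via_encode` and `via_pack` are mine, for illustration only) builds UTF-16BE bytes for the Russian entry, decodes them both ways, and prints the results through an explicit UTF-8 layer on STDERR. Note that the debug prints in the routine above show $sTxt before it is decoded, so for the Russian cell they print raw UTF-16BE bytes rather than readable text.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Path used on perl >= 5.008: Encode's UTF-16BE decoder.
sub via_encode { return decode('UTF-16BE', $_[0]) }

# Fallback path: unpack 'n*' reads big-endian 16-bit units and pack 'U*'
# turns each unit into a character. This matches Encode only for BMP text,
# since surrogate pairs are not combined.
sub via_pack { return pack('U*', unpack('n*', $_[0])) }

my $chars = "\x{424}\x{41E}\x{420}\x{421}\x{423}\x{41D}\x{41E}\x{41A}";    # ФОРСУНОК
my $bytes = encode('UTF-16BE', $chars);    # the kind of input TextFmt gets as $sTxt

# Raw bytes: 16 of them, every other one 0x04, not readable text.
printf STDERR "raw: %d bytes\n", length $bytes;

# Decoded strings contain characters above 255; without an output layer,
# perl warns "Wide character in print" when warnings are enabled.
binmode(STDERR, ':encoding(UTF-8)');
print STDERR via_encode($bytes), "\n";
print STDERR via_pack($bytes), "\n";
```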