in reply to WriteExcel unicode question


The Spreadsheet::WriteExcel write() method will handle Unicode strings if they have the utf8 flag set.

This is generally the case if you read in a file and specify the file's encoding when you open it. See, for example, the unicode_*.pl examples in the Spreadsheet::WriteExcel distribution, in particular unicode_8859_7.pl for Greek text.
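A minimal sketch of that point, using only core Perl (no Excel modules). The sample bytes (Greek "αβγ" in ISO-8859-7) and the temporary file are illustrative assumptions. The same bytes are read once without and once with an encoding layer, and only the second read produces a string with the utf8 flag set:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative data: Greek alpha, beta, gamma as ISO-8859-7 bytes (0xE1 0xE2 0xE3).
my $bytes = "\xE1\xE2\xE3";

# Write the raw bytes to a temporary file (removed at program exit).
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} $bytes;
close $tmp_fh or die "close: $!";

# Read back WITHOUT declaring the encoding: the result is just bytes.
open my $raw_fh, '<:raw', $filename or die "open: $!";
my $raw = <$raw_fh>;
close $raw_fh;

# Read back WITH the encoding declared: Perl decodes the bytes into
# characters, and the resulting string has the utf8 flag set.
open my $enc_fh, '<:encoding(iso-8859-7)', $filename or die "open: $!";
my $decoded = <$enc_fh>;
close $enc_fh;

printf "raw:     utf8 flag %s, length %d\n", (utf8::is_utf8($raw) ? 'on' : 'off'), length $raw;
printf "decoded: utf8 flag %s, first char U+%04X\n", (utf8::is_utf8($decoded) ? 'on' : 'off'), ord $decoded;
```

Only the second string (`$decoded`) is the kind that write() treats as Unicode.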

So, I'd guess that the strings you are having problems with don't have the utf8 flag set. You can either fix this using the Encode module or, if you let us know how you are reading these strings, we might be able to suggest a better, or at least alternative, solution.
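With the Encode module, the direction matters. A minimal sketch, assuming the data arrives as ISO-8859-7 bytes (the sample bytes are illustrative): `decode()` turns bytes into a Perl character string with the utf8 flag on, while `encode()` goes the other way and returns bytes with the flag off, so it does not produce the kind of string write() treats as Unicode.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Illustrative input: ISO-8859-7 bytes for Greek alpha, beta, gamma.
my $bytes = "\xE1\xE2\xE3";

# decode(): bytes in a named encoding -> Perl character string (utf8 flag on).
my $chars = decode('iso-8859-7', $bytes);

# encode(): Perl character string -> bytes in a named encoding (utf8 flag off).
my $back = encode('iso-8859-7', $chars);

printf "decoded: flag %s, first char U+%04X\n", (utf8::is_utf8($chars) ? 'on' : 'off'), ord $chars;
printf "encoded: flag %s, first byte 0x%02X\n", (utf8::is_utf8($back) ? 'on' : 'off'), ord $back;
```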

--
John.

Re^2: WriteExcel unicode question
by demis (Novice) on Oct 15, 2009 at 09:27 UTC
    Thanks, but I'm not sure that does it. The unicode_8859_7.pl script shows how to open a plain text file. In my case, the file containing the problematic cells is itself an Excel file. I should add that the text displays fine in the Devel::ptkdb debugger I use. I use
    $cellvalue2 = encode("iso-8859-7", $cellvalue2);
    It makes NO difference. I even tried
    if ($Cell->{Code} eq 'ucs2') { $cellvalue2 = Encode::decode('UCS-2BE', $cellvalue2); }
    which again gave junk (though more Chinese-looking junk). I read these strings with:

    use Spreadsheet::ParseExcel;    # version 0.32 from ActiveState

    my $oExcel = Spreadsheet::ParseExcel->new();
    my $oBook  = $oExcel->Parse($myfile);

    for (my $iSheet = 0; $iSheet < $oBook->{SheetCount}; $iSheet++) {
        my $oWkS = $oBook->{Worksheet}[$iSheet];
        next unless defined $oWkS->{MaxRow} && defined $oWkS->{MaxCol};

        for (my $iR = $oWkS->{MinRow}; $iR <= $oWkS->{MaxRow}; $iR++) {
            for (my $iC = $oWkS->{MinCol}; $iC <= $oWkS->{MaxCol}; $iC++) {
                my $cellvalue2 = '';
                my $oWkC = $oWkS->{Cells}[$iR][$iC];
                if ($oWkC) {
                    $cellvalue2 = $oWkC->Value;
                    # Fall back to the raw value when the formatted one is 'GENERAL' (and it IS GENERAL)
                    $cellvalue2 = $oWkC->{Val} if $cellvalue2 eq 'GENERAL';
                }
                # ...
            }
        }
    }
    As I said above, it makes no difference if I post-process $cellvalue2 with encode("iso-8859-7", $cellvalue2);
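    To see what a value such as $cellvalue2 actually holds, it can help to print its utf8 flag and code points. A minimal sketch; `describe_string` is a hypothetical helper, not part of Spreadsheet::ParseExcel or Encode. Raw bytes show up as U+0080..U+00FF with the flag off, while properly decoded Greek shows up in the U+0370..U+03FF range with the flag on:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report whether a string carries the utf8 flag
# and list its code points in hex.
sub describe_string {
    my ($s) = @_;
    my $flag   = utf8::is_utf8($s) ? 'on' : 'off';
    my $points = join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
    return "utf8 flag $flag: $points";
}

print describe_string("\xE1\xE2"), "\n";          # raw ISO-8859-7 bytes
print describe_string("\x{3B1}\x{3B2}"), "\n";    # decoded Greek alpha, beta
```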

      Okay, the part we were missing is that you were reading the data with Spreadsheet::ParseExcel.

      You should be able to read from and write to an Excel file without intervention. Here is a working example that reads the Greek file generated by the unicode_8859_7.pl example mentioned above:

      #!/usr/bin/perl

      use strict;
      use warnings;
      use Spreadsheet::WriteExcel;
      use Spreadsheet::ParseExcel;

      my $parser      = Spreadsheet::ParseExcel->new();
      my $in_workbook = $parser->Parse('unicode_8859_7.xls');

      my $out_workbook  = Spreadsheet::WriteExcel->new('newfile.xls');
      my $out_worksheet = $out_workbook->add_worksheet();

      for my $in_worksheet ( $in_workbook->worksheets() ) {
          my ( $row_min, $row_max ) = $in_worksheet->row_range();
          my ( $col_min, $col_max ) = $in_worksheet->col_range();

          for my $row ( $row_min .. $row_max ) {
              for my $col ( $col_min .. $col_max ) {
                  my $cell = $in_worksheet->get_cell( $row, $col );
                  next unless $cell;
                  $out_worksheet->write( $row, $col, $cell->value() );
              }
          }
      }
      The version of Spreadsheet::ParseExcel that you are using, 0.32, is quite old. Try upgrading to the latest, 0.55.

      If that doesn't work, you could try specifying an alternative formatter for the parser, such as Spreadsheet::ParseExcel::FmtUnicode or Spreadsheet::ParseExcel::FmtJapan (despite its name, FmtJapan also handles general Unicode via Encode):

      ...
      use Spreadsheet::ParseExcel;
      use Spreadsheet::ParseExcel::FmtJapan;

      my $parser      = Spreadsheet::ParseExcel->new();
      my $formatter   = Spreadsheet::ParseExcel::FmtJapan->new();
      my $in_workbook = $parser->Parse('unicode_8859_7.xls', $formatter);
      ...
      --
      John.

      $cellvalue2=Encode::decode ('UCS-2BE',$cellvalue2)

      Not sure whether it will solve your problem, but I would have tried 'UCS-2LE' instead (I have yet to see a big-endian Windows...)
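A minimal sketch of why byte order matters here, using only the core Encode module. The sample bytes are illustrative: Greek small alpha (U+03B1) stored as little-endian UCS-2 is the byte pair 0xB1 0x03. Decoding those bytes as big-endian swaps them into U+B103, a Hangul syllable, which is the kind of East Asian-looking output described above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative input: U+03B1 (Greek alpha) as UCS-2 little-endian, low byte first.
my $le_bytes = "\xB1\x03";

my $as_le = decode('UCS-2LE', $le_bytes);    # matching byte order
my $as_be = decode('UCS-2BE', $le_bytes);    # mismatched byte order

printf "as LE: U+%04X\n", ord $as_le;
printf "as BE: U+%04X\n", ord $as_be;
```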

        almut, it does not. I get the same Chinese-looking characters as with BE.