
Thanks, but I'm not sure that does it. The unicode_8859_7.pl script shows how to open an ASCII text file, whereas in my case the file I get the problematic cells from is itself an Excel file. I should add that the text displays fine in Devel::ptkdb, the debugger I use. When I use
$cellvalue2 = encode("iso-8859-7", $cellvalue2);
it makes NO difference. I even tried
if ($Cell->{Code} eq 'ucs2') { $cellvalue2 = Encode::decode('UCS-2BE', $cellvalue2); }
which again gave junk (though more Chinese-looking junk). I read these strings like this:
use Spreadsheet::ParseExcel;   # version 0.32 from ActiveState
my $oexcel = Spreadsheet::ParseExcel->new;
my $oBook  = $oexcel->Parse($myfile);
for (my $iSheet = 0; $iSheet < $oBook->{SheetCount}; $iSheet++) {
    my $oWkS = $oBook->{Worksheet}[$iSheet];
    for (my $iR = $oWkS->{MinRow};
         defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow}; $iR++) {
        for (my $iC = $oWkS->{MinCol};
             defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol}; $iC++) {
            my $cellvalue2 = '';
            my $oWkC = $oWkS->{Cells}[$iR][$iC];
            if ($oWkC) {
                $cellvalue2 = $oWkC->Value;
                $cellvalue2 = $oWkC->{Val} if $cellvalue2 eq 'GENERAL';   # and it IS 'GENERAL'
            }
            # ...
        }
    }
}
As I said above, post-processing $cellvalue2 with encode("iso-8859-7", $cellvalue2) makes no difference.
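To make the direction of the two Encode calls above concrete, here is a minimal sketch using only the core Encode module. decode() interprets bytes in a named encoding and produces a Perl character string (the input side); encode() turns a character string back into bytes (the output side). The three sample bytes are a hypothetical stand-in for Greek text, not data from the poster's file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# decode(): bytes in a named encoding -> Perl character string (input side)
# encode(): Perl character string -> bytes in a named encoding (output side)

# Hypothetical sample: Greek "alpha beta gamma" as raw ISO-8859-7 bytes.
my $raw_bytes = "\xE1\xE2\xE3";

# Reading: turn the bytes into characters first.
my $chars = decode('iso-8859-7', $raw_bytes);
printf "decoded %d characters, first is U+%04X\n", length($chars), ord($chars);

# Writing: turn the characters back into bytes only at the output boundary.
my $out_bytes = encode('iso-8859-7', $chars);
print "round trip ok\n" if $out_bytes eq $raw_bytes;
```

Whether ParseExcel 0.32 hands back raw bytes or already-decoded characters for these cells is not established here; the sketch only shows that encode() is an output-side step and cannot repair text that was mis-read on input.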

Re^3: WriteExcel unicode question
by jmcnamara (Monsignor) on Oct 15, 2009 at 10:50 UTC

    Okay, the part we were missing is that you were reading the data with Spreadsheet::ParseExcel.

    You should be able to read from and write to an Excel file without any manual encoding or decoding. Here is a working example that uses the Greek file generated by the example script above:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use Spreadsheet::WriteExcel;
    use Spreadsheet::ParseExcel;

    my $parser        = Spreadsheet::ParseExcel->new();
    my $in_workbook   = $parser->Parse('unicode_8859_7.xls');
    my $out_workbook  = Spreadsheet::WriteExcel->new('newfile.xls');
    my $out_worksheet = $out_workbook->add_worksheet();

    for my $in_worksheet ( $in_workbook->worksheets() ) {

        my ( $row_min, $row_max ) = $in_worksheet->row_range();
        my ( $col_min, $col_max ) = $in_worksheet->col_range();

        for my $row ( $row_min .. $row_max ) {
            for my $col ( $col_min .. $col_max ) {

                my $cell = $in_worksheet->get_cell( $row, $col );
                next unless $cell;

                $out_worksheet->write( $row, $col, $cell->value() );
            }
        }
    }
    The version of Spreadsheet::ParseExcel that you are using, 0.32, is quite old. Try upgrading to the latest, 0.55.

    If that doesn't work, you could try passing an alternative formatter to the parser, such as Spreadsheet::ParseExcel::FmtUnicode or Spreadsheet::ParseExcel::FmtJapan (despite its name, FmtJapan also handles general Unicode via Encode):

    ...
    use Spreadsheet::ParseExcel;
    use Spreadsheet::ParseExcel::FmtJapan;

    my $parser      = Spreadsheet::ParseExcel->new();
    my $formatter   = Spreadsheet::ParseExcel::FmtJapan->new();
    my $in_workbook = $parser->Parse('unicode_8859_7.xls', $formatter);
    ...
    --
    John.

Re^3: WriteExcel unicode question
by almut (Canon) on Oct 15, 2009 at 10:36 UTC
    $cellvalue2=Encode::decode ('UCS-2BE',$cellvalue2)

    Not sure whether it will solve your problem, but I would have tried 'UCS-2LE' (I have yet to see a big-endian Windows...)
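The effect of choosing the wrong byte order can be seen in a small sketch using the core Encode module (the Greek sample string is hypothetical). It encodes the same text as UCS-2LE and UCS-2BE, then decodes the little-endian bytes with both orders:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample text: Greek "alpha beta gamma" (U+03B1 U+03B2 U+03B3).
my $text = "\x{3B1}\x{3B2}\x{3B3}";

# Serialise it both ways to see the byte order.
my $le = encode('UCS-2LE', $text);   # low byte first:  b1 03 b2 03 b3 03
my $be = encode('UCS-2BE', $text);   # high byte first: 03 b1 03 b2 03 b3
print 'LE bytes: ', unpack('H*', $le), "\n";
print 'BE bytes: ', unpack('H*', $be), "\n";

# Decoding with the matching byte order recovers the text...
my $ok = decode('UCS-2LE', $le);

# ...while the wrong order swaps every byte pair: U+03B1 becomes U+B103,
# a Hangul syllable rather than a Greek letter.
my $bad = decode('UCS-2BE', $le);
printf "wrong order gives: %s\n",
    join(' ', map { sprintf('U+%04X', ord $_) } split //, $bad);
```

Swapping the bytes moves Greek code points U+03xx to U+xx03, which lands among CJK ideographs or Hangul syllables; that fits the "Chinese-looking" symptom. However, it would not explain why both byte orders fail, which suggests looking at the raw bytes directly.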

      almut, it does not. I get the same Chinese-looking characters as with BE.

        To debug the issue further, I would Dump $cellvalue2 using Devel::Peek in order to see what the encoding actually looks like at the byte level (see the PV output). Then compare that result against the individual encodings that would make sense in your context, taking into account that some mis-decoding or double encoding may already have happened somewhere else in the processing chain...
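Here is a sketch of that debugging approach. The helper name inspect_value and the list of candidate encodings are hypothetical choices and are not part of any module. The helper dumps the scalar with Devel::Peek, reports whether perl's UTF8 flag is set, and then tries each candidate decoding strictly (Encode::FB_CROAK), so that invalid byte sequences are reported instead of being silently replaced:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Devel::Peek qw(Dump);
use Encode qw(decode);

# Hypothetical helper (not part of any module): show what a scalar really
# holds, then try a few decodings that are plausible for Greek .xls data.
sub inspect_value {
    my ($value) = @_;

    # Dump() writes the scalar's internals (FLAGS, PV bytes, UTF8 marker)
    # to STDERR; the PV line shows the actual bytes.
    Dump($value);

    # UTF8 flag on usually means the value was already decoded once;
    # decoding it a second time is a common source of junk.
    printf "utf8 flag: %s\n", utf8::is_utf8($value) ? 'on' : 'off';

    my %results;
    for my $enc (qw(UCS-2LE UCS-2BE UTF-8 iso-8859-7 cp1253)) {
        # FB_CROAK makes decode() die on invalid input instead of
        # substituting; it may also consume its argument, so pass a copy.
        my $copy  = $value;
        my $chars = eval { decode($enc, $copy, Encode::FB_CROAK()) };
        $results{$enc} = $chars;    # undef when the bytes are invalid
        printf "%-10s %s\n", $enc,
            defined $chars
            ? join(' ', map { sprintf('U+%04X', ord $_) } split //, $chars)
            : 'invalid for this encoding';
    }
    return \%results;
}

# Hypothetical input: Greek "alpha beta" as little-endian UCS-2 bytes.
my $r = inspect_value("\xB1\x03\xB2\x03");
```

In practice you would call inspect_value($cellvalue2) inside the cell loop and pick the candidate that gives U+0391..U+03C9 (Greek) code points. If none of them does, the value was probably altered earlier in the chain.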