in reply to Re^5: Slovenian characters problem
in thread Slovenian characters problem

The first call $SheetOut->write(0,0,"Č芚Žž") produces the correct writeout in Excel xls table.

In my script I open a simple Excel table and read its line where the same "Č芚Žž" characters are written, then I write them into Excel xls table:

push(@Test,{Col1=>$Sheet->Cells(1,1)->{'Value'}}) $SheetOut->write(1,0,encode("cp1250",$Test[0]->{Col1}));

Add 1: As a result the script writes the following characters into output table: "Čč" become "بو", "Šš" and "Žž" become four outlined squares. The characters are therefore completely different, I should perform REPLACE manually in Excel (which is out of question: doing that writing script would have no sense). They become question marks only in case if I skip the use encoding "cp1250".

Add 2: In debugger I see @Test[0]->{Col1} which contains the correct characters "Č芚Žž". Data is read from a simple xls Excel table. The next step is writing into another xls Excel table (using Spreadsheet::WriteExcel. The script does nothing to @Test before writing it into table.

I was curious what happens if I add the following four lines:

1 $String="Č芚Žž"; 2 $Test2[0]->{Col1}="Č芚Žž"; ... 3 $SheetOut->write(2,0,$String); 4 $SheetOut->write(3,0,$Test2[0]->{Col1});

The result of all additional lines from 1 to 4 in the output Excel table are correct written characters!

So it seems to me that the only problem occurs by reading Slovenian characters from Excel table (via Win32::OLE). Later I'll check the contents of @Test using Data::Dumper (btw thanks for the tip!) and describe the results.

Replies are listed 'Best First'.
Re^7: Slovenian characters problem
by graff (Chancellor) on Aug 18, 2010 at 08:04 UTC
    So it seems to me that the only problem occurs by reading Slovenian characters from Excel table (via Win32::OLE).

    Since you are using Spreadsheet::WriteExcel and this is working okay on literal strings, you might want to try using Spreadsheet::ParseExcel to read your input Excel files (instead of Win32::OLE). I don't know how different the two approaches are, since I've never used the OLE module, but I've used ParseExcel, and it seems to work fine (on unix and linux, at least). I've had good experiences with it when the excel data included unicode characters, and maybe that's related to what is happening in your case, but I'm not sure.

    Later I'll check the contents of @Test using Data::Dumper

    That is certainly the next thing to try -- the sooner, the better. (BTW, I'm glad you're not doing anonymous posts anymore. Welcome to the Monastery!)

      I've added the following line into script:

      use Data::Dumper;

      After I read the xls table contents here is the second one:

      print "\@Test values are: ", Dumper(\@Test);

      I think I'll have to take your advice and give up on Win32::OLE since Data::Dumper reports this:

      Test values are: $VAR1 = [ { 'Col1' => '\x{00c8}\x{00e8}\x{008a}\x{009a}\x{008e}\x{009e +}' } ]; "\x{00e8}" does not map to cp1250 at C:\batch\Pisarna\Clouseau\Testira +nje.pl line 31. "\x{008a}" does not map to cp1250 at C:\batch\Pisarna\Clouseau\Testira +nje.pl line 31. "\x{009a}" does not map to cp1250 at C:\batch\Pisarna\Clouseau\Testira +nje.pl line 31. "\x{008e}" does not map to cp1250 at C:\batch\Pisarna\Clouseau\Testira +nje.pl line 31. "\x{009e}" does not map to cp1250 at C:\batch\Pisarna\Clouseau\Testira +nje.pl line 31.

      I've started with Win32::OLE and then found that I need to add Spreadsheet::WriteExcel as well because it controls Excel colors and other attributes. I hoped I'll use Tk module - but there are about 150 columns of (wide string) data so Tk is not an option (it's very useful for less data though).

      I will deinitely skip Win32::OLE and check how successfull am I with Spreadsheet::ParseExcel. I'll give notice when I make changes - in case some other poor fellow hits the same wall.

      Thanks for your help!

      P.S.: I forgot to log in earlier...

        As I promised - this is the result of what I've just found out.

        I tried the simplest Spreadsheet::Read to read Excel xls file table:

        @ref = ReadData ("FileToRead.xls");

        I write first line with first three columns into @Test:

        push(@Test,{ A1=>$ref[0]->[1]->{A1}, A2=>$ref[0]->[1]->{A2}, A3=>$ref[0]->[1]->{A3} });

        After I write it into another Excel xls table the Slovenian characters are correct. Hallelujah!

        I'll print the code, back it up and only then I'll smooth the code. Thanks again for your help!