in reply to Re^2: Slovenian characters problem
in thread Slovenian characters problem

When you post perl code at the Monastery, put <code> at the beginning, and put </code> at the end; this will "do the right thing" to make your code look right in the post, without having to do anything else special to the text of the perl code itself. This is explained at Markup in the Monastery (and this link is provided on the page where you submit your post).

Regarding your code, this first use of "decode()" does nothing, because there is nothing in the array yet:

... my @Test; my $Test; @Test=encode("cp1250",@Test); ...
So you must have misunderstood what I meant. Anyway, since your script includes use encoding "cp1250"; I gather that you wrote your script with a text editor that saves the file in that encoding. That should be fine, but it means that the quoted strings with accented characters are being treated internally in perl as utf8 strings (because that's what use encoding is supposed to do -- read the output of perldoc encoding).

So if you want these characters to be stored in the Excel file as cp1250 characters, I think you need to do your "write" calls like this:

use Encode; ... $SheetOut->write(0,0, encode( "cp1250", "&#268;&#269;" ); $SheetOut->write(0,1, encode( "cp1250", "Šš" ); $SheetOut->write(0,2, encode( "cp1250", "Žž" ); ...
What happens in that case is: (1) your text editor saves the script as a cp1250-encoded text file, (2) when perl.exe reads the script to execute it, it sees use encoding "cp1250" and converts the special characters to its normal internal utf8 encoding (so that "character semantics" will work in the normal way), (3) then when those "write() functions are called, the Encode::encode function turns the utf8 strings back into cp1250 for storage in the Excel file.

At least, I think that's what should happen. Give it a try.

(update: the snippet I posted above is showing numeric character entities for some of the characters -- that was not intentional, but I'm not going to try to fix it -- you know which characters are supposed to be there.)

Replies are listed 'Best First'.
Re^4: Slovenian characters problem
by Anonymous Monk on Aug 16, 2010 at 06:56 UTC

    Thanks for the "code" suggestion.

    I forgot to delete the contents of line 16 in the script before posting it. I apologize; without replacing [0] with 1 it wouldn't even write lines read from array.

    My text editor is set to "Central European (cp1250)": when I open the script for example in Notepad++, MS Word and MS Excel the Slovenian characters are written correct.

    I tried what you suggested:

    $SheetOut->write(0,0,"&#268;&#269;ŠšŽž"); $SheetOut->write(1,0,encode("cp1250",$Test[0]->{Col1}));

    In Xls file the first line is written correct, the second one not.

    Why are characters defined as string (line 1) correct and those read from array (line 2) not correct?

    Are there two or more different possible # characters in#!/usr/bin/perl and by Murphy I use the wrong one or something? Just kidding. :/

      Based on your description, it seems to make sense that the first $SheetOut->write() works as intended, because your script has the literal string stored in the encoding that works for your usage of Excel.

      Regarding the second call to $SheetOut->write(), there are two issues you need to consider (and describe for us, if you still need us to help):

      1. In what way is the result not correct? What do you actually get in place of the data you wanted to get? Is it partially bad, or completely bad? Do you see one or more question marks? Do you see nonsense characters?

      2. What sort of data is actually stored in that element of the "$Test" AoH, which you want to put into your spreadsheet? Where does it come from? What other steps in your code have had an effect on that item of data before you pass it to $SheetOut->write() ?

      For the second issue, it might suffice just to print out the contents of $Test[0]->{Col1} in some explicit way -- probably the best bet would be Data::Dumper...

      use Data::Dumper 'Dumper'; ... print Dumper( $Test[0] ); # display contents of anon.hash, including +{Col1} ...
      Based on what that shows, it might be fairly simple to figure out the cause of the trouble.

        The first call $SheetOut->write(0,0,"&#268;&#269;ŠšŽž") produces the correct writeout in Excel xls table.

        In my script I open a simple Excel table and read its line where the same "Č芚Žž" characters are written, then I write them into Excel xls table:

        push(@Test,{Col1=>$Sheet->Cells(1,1)->{'Value'}}) $SheetOut->write(1,0,encode("cp1250",$Test[0]->{Col1}));

        Add 1: As a result the script writes the following characters into output table: "Čč" become "بو", "Šš" and "Žž" become four outlined squares. The characters are therefore completely different, I should perform REPLACE manually in Excel (which is out of question: doing that writing script would have no sense). They become question marks only in case if I skip the use encoding "cp1250".

        Add 2: In debugger I see @Test[0]->{Col1} which contains the correct characters "Č芚Žž". Data is read from a simple xls Excel table. The next step is writing into another xls Excel table (using Spreadsheet::WriteExcel. The script does nothing to @Test before writing it into table.

        I was curious what happens if I add the following four lines:

        1 $String="&#268;&#269;ŠšŽž"; 2 $Test2[0]->{Col1}="&#268;&#269;ŠšŽž"; ... 3 $SheetOut->write(2,0,$String); 4 $SheetOut->write(3,0,$Test2[0]->{Col1});

        The result of all additional lines from 1 to 4 in the output Excel table are correct written characters!

        So it seems to me that the only problem occurs by reading Slovenian characters from Excel table (via Win32::OLE). Later I'll check the contents of @Test using Data::Dumper (btw thanks for the tip!) and describe the results.