in reply to Re: Slovenian characters problem
in thread Slovenian characters problem

Thanks for your hint; unfortunately my attempt to apply it wasn't successful.

I apologize; I'm new to this editor, so the code formatting was "lost".

#!/usr/bin/perl
use strict;
use Pod::Usage;
use Getopt::Std;
use Win32::OLE;
use Spreadsheet::WriteExcel;
use encoding "cp1250";
use Encode;

my $Excel;
my $Book;
my $Sheet;
my @Test;
my $Test;

@Test=encode("cp1250",@Test);

#Excel file FileToRead.xls contains the following Slovenian characters:
# Line 1 Column 1: Čč (capital and small letter C with caron)
# Line 1 Column 2: Šš (capital and small letter S with caron)
# Line 1 Column 3: Žž (capital and small letter Z with caron)

$Excel=Win32::OLE->GetActiveObject('Excel.Application') || Win32::OLE->new('Excel.Application', 'Quit');
$Book=$Excel->Workbooks->Open('c:\batch\Pisarna\Clouseau\FileToRead.xls');
$Sheet=$Book->Worksheets(1);

# characters read into @Test are correct
push(@Test,{
Col1=>$Sheet->Cells(1,1)->{'Value'},
Col2=>$Sheet->Cells(1,2)->{'Value'},
Col3=>$Sheet->Cells(1,3)->{'Value'},
});

$Book->Close;
$Excel->Quit;

my $BookOut=Spreadsheet::WriteExcel->new('c:\batch\Pisarna\Clouseau\FileWritten.xls');
my $SheetOut=$BookOut->add_worksheet('test');

# writing Slovenian characters directly
$SheetOut->write(0,0,"Čč");
$SheetOut->write(0,1,"Šš");
$SheetOut->write(0,2,"Žž");

# read from xls file through @Test
$SheetOut->write(1,0,$Test[0]->{Col1});
$SheetOut->write(1,1,$Test[0]->{Col2});
$SheetOut->write(1,2,$Test[0]->{Col3});

# the strings written directly above end up correct in FileWritten.xls:
# Line 1 Column 1: Čč (capital and small letter C with caron)
# Line 1 Column 2: Šš (capital and small letter S with caron)
# Line 1 Column 3: Žž (capital and small letter Z with caron)

# strings read into @Test are written into FileWritten.xls incorrectly:
# Line 2 Column 1: capital and small letter C with grave (can't be written)
# Line 2 Column 2: small outlined square (can't be written)
# Line 2 Column 3: small outlined square (can't be written)

__END__

In short, Slovenian characters written directly are correct, but those read from the xls file and written through @Test aren't. Something weird happens to them on their way from the array through Spreadsheet::WriteExcel.

(I work on Windows XP Professional Version 2002 and use Perl 5.8.) Thanks for your hint.
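One possible reading of the symptoms (a hypothesis, not something established in this thread): if Win32::OLE hands back the cell values as cp1250 bytes and those bytes are later taken as Latin-1 code points, the C pair would turn into È è (U+00C8 U+00E8) and the S and Z pairs into C1 control codes (U+008A, U+009A, U+008E, U+009E), which fonts typically draw as empty boxes. The sketch below only compares the two interpretations of the same bytes; the hash and helper names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# cp1250 byte values for the three letter pairs in FileToRead.xls
my %cp1250_bytes = (
    'C caron' => "\xC8\xE8",   # capital and small C with caron
    'S caron' => "\x8A\x9A",   # capital and small S with caron
    'Z caron' => "\x8E\x9E",   # capital and small Z with caron
);

# Illustrative helper: list the code points of a character string as U+XXXX
sub as_codepoints {
    my ($text) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $text;
}

# Interpret the same bytes once as cp1250 and once as Latin-1
for my $name (sort keys %cp1250_bytes) {
    my $bytes = $cp1250_bytes{$name};
    printf "%s: as cp1250 -> %s, as Latin-1 -> %s\n", $name,
        as_codepoints(decode('cp1250', $bytes)),
        as_codepoints(decode('iso-8859-1', $bytes));
}
```

If the output in the spreadsheet matches the Latin-1 column, that would point at a byte string reaching Spreadsheet::WriteExcel rather than a character string.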

Re^3: Slovenian characters problem
by graff (Chancellor) on Aug 14, 2010 at 05:15 UTC
    When you post perl code at the Monastery, put <code> at the beginning, and put </code> at the end; this will "do the right thing" to make your code look right in the post, without having to do anything else special to the text of the perl code itself. This is explained at Markup in the Monastery (and this link is provided on the page where you submit your post).

    Regarding your code, this first use of "encode()" does nothing, because there is nothing in the array yet:

    ...
    my @Test;
    my $Test;
    @Test=encode("cp1250",@Test);
    ...

    So you must have misunderstood what I meant. Anyway, since your script includes use encoding "cp1250"; I gather that you wrote your script with a text editor that saves the file in that encoding. That should be fine, but it means that the quoted strings with accented characters are being treated internally in perl as utf8 strings (because that's what use encoding is supposed to do -- read the output of perldoc encoding).

    So if you want these characters to be stored in the Excel file as cp1250 characters, I think you need to do your "write" calls like this:

    use Encode;
    ...
    $SheetOut->write(0,0, encode( "cp1250", "&#268;&#269;" ));
    $SheetOut->write(0,1, encode( "cp1250", "Šš" ));
    $SheetOut->write(0,2, encode( "cp1250", "Žž" ));
    ...

    What happens in that case is: (1) your text editor saves the script as a cp1250-encoded text file, (2) when perl.exe reads the script to execute it, it sees use encoding "cp1250" and converts the special characters to its normal internal utf8 encoding (so that "character semantics" will work in the normal way), (3) then, when those write() functions are called, the Encode::encode function turns the utf8 strings back into cp1250 for storage in the Excel file.

    At least, I think that's what should happen. Give it a try.

    (update: the snippet I posted above is showing numeric character entities for some of the characters -- that was not intentional, but I'm not going to try to fix it -- you know which characters are supposed to be there.)
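Step (3) can be checked in isolation with a minimal sketch. Here the strings are spelled with \x{...} escapes, so the check does not depend on the editor's encoding or on use encoding; the helper name cp1250_hex is made up for illustration. It shows the cp1250 bytes that Encode::encode produces for each letter pair.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Čč, Šš, Žž as Perl character strings, written with code point escapes
my @literals = ("\x{10C}\x{10D}", "\x{160}\x{161}", "\x{17D}\x{17E}");

# Hypothetical helper: the bytes encode() produces for cp1250, shown as hex
sub cp1250_hex {
    my ($text) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, encode('cp1250', $text);
}

print cp1250_hex($_), "\n" for @literals;   # C8 E8, then 8A 9A, then 8E 9E
```

These byte values are the standard cp1250 positions for the six letters, so encoded data that shows different bytes would mean the input was not the character string you expected.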

      Thanks for the "code" suggestion.

      I forgot to delete the contents of line 16 in the script before posting it. I apologize; without replacing [0] with 1 it wouldn't even write the lines read from the array.

      My text editor is set to "Central European (cp1250)": when I open the script in, for example, Notepad++, MS Word or MS Excel, the Slovenian characters display correctly.

      I tried what you suggested:

      $SheetOut->write(0,0,"&#268;&#269;ŠšŽž");
      $SheetOut->write(1,0,encode("cp1250",$Test[0]->{Col1}));

      In the .xls file the first line is written correctly; the second one is not.

      Why are characters defined as string (line 1) correct and those read from array (line 2) not correct?

      Are there two or more different possible # characters in #!/usr/bin/perl and by Murphy I use the wrong one or something? Just kidding. :/
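A sketch of why calling encode("cp1250", ...) on the value read from the sheet might not help, under the assumption (not confirmed in the thread) that the value is already a cp1250 byte string. Encoding it again makes Perl read 0xC8 and 0xE8 as the code points U+00C8 and U+00E8 (È, è), which cp1250 has no slots for, so the bytes change. Decoding first and then encoding restores them. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the cell value arrives as cp1250 bytes for "Čč"
my $bytes = "\xC8\xE8";

# encode() on bytes: 0xC8/0xE8 are treated as U+00C8/U+00E8 (È, è), which
# cp1250 cannot represent, so the output no longer matches the input
my $double = encode('cp1250', $bytes);

# decode() first yields the real characters U+010C U+010D; encoding those
# gives back the original cp1250 bytes
my $chars  = decode('cp1250', $bytes);
my $single = encode('cp1250', $chars);

printf "encode only:     %s\n", $double eq $bytes ? 'same bytes' : 'different bytes';
printf "decode + encode: %s\n", $single eq $bytes ? 'same bytes' : 'different bytes';
```

Which of the two paths applies depends on what Win32::OLE actually returned, which is exactly what inspecting $Test[0] would show.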

        Based on your description, it seems to make sense that the first $SheetOut->write() works as intended, because your script has the literal string stored in the encoding that works for your usage of Excel.

        Regarding the second call to $SheetOut->write(), there are two issues you need to consider (and describe for us, if you still need us to help):

        1. In what way is the result not correct? What do you actually get in place of the data you wanted to get? Is it partially bad, or completely bad? Do you see one or more question marks? Do you see nonsense characters?

        2. What sort of data is actually stored in that element of the "$Test" AoH, which you want to put into your spreadsheet? Where does it come from? What other steps in your code have had an effect on that item of data before you pass it to $SheetOut->write() ?

        For the second issue, it might suffice just to print out the contents of $Test[0]->{Col1} in some explicit way -- probably the best bet would be Data::Dumper...

        use Data::Dumper 'Dumper';
        ...
        print Dumper( $Test[0] );  # display contents of anon. hash, including {Col1}
        ...

        Based on what that shows, it might be fairly simple to figure out the cause of the trouble.
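Building on the Data::Dumper suggestion, a minimal sketch that also reports whether a value is a byte string or a Perl character string. Setting $Data::Dumper::Useqq makes non-ASCII content print as escape sequences instead of raw bytes. The describe helper and the two stand-in values are hypothetical; in the real script you would pass $Test[0]->{Col1} instead.

```perl
use strict;
use warnings;
use Data::Dumper 'Dumper';
use Encode qw(decode is_utf8);

$Data::Dumper::Useqq = 1;   # print non-ASCII as escape sequences, not raw bytes

# Hypothetical helper: byte string or character string, plus its code points
sub describe {
    my ($value) = @_;
    return sprintf '%s, %d chars: %s',
        (is_utf8($value) ? 'character string' : 'byte string'),
        length $value,
        join ' ', map { sprintf 'U+%04X', ord } split //, $value;
}

# Stand-ins for a cell value (assumption: the real one comes from Win32::OLE)
my %row = (
    as_bytes => "\xC8\xE8",                    # cp1250 bytes for "Čč"
    as_chars => decode('cp1250', "\xC8\xE8"),  # the same text, decoded
);

print Dumper(\%row);
print "$_: ", describe($row{$_}), "\n" for sort keys %row;
```

A value reported as a byte string with U+00C8 U+00E8 would fit the cp1250-bytes case; a character string with U+010C U+010D is already proper text.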