in reply to Slovenian characters problem

It sounds like most of the tools on your system are assuming a single-byte encoding (cp1250) for Slovenian text. If your perl script is somehow turning your spreadsheet data to utf8 strings, you would need to make sure it gets converted back to cp1250 so that the data will be coherent in the other tools that use (and display) the data.

You haven't shown any code, so I'm not sure what to recommend, but somewhere you're likely going to want to do something like this:

use Encode; ... # assuming $data is a scalar containing a utf8 string: $data = encode( "cp1250", $data ); # now it contains a cp1250 string # (utf8 flag is off, and characters are single-byte)
(updated to show that you have to "use Encode" in order to apply the "decode()" function)

... or maybe something like this:

open( OUT, ">:encoding(cp1250)", "output.txt" ) or die "output.txt: $! +\n"; # assuming $data contains a utf8 string: print OUT $data; # the string gets converted to cp1250 before being written to the file # (no need to "use Encode" in this case)

Replies are listed 'Best First'.
Re^2: Slovenian characters problem
by JackVanRamars (Novice) on Aug 13, 2010 at 05:18 UTC
    Thanks for your hint, unfortunatelly my interpretation of it wasn't succesful.

    I apologize, I'm new to this editor so the code was "lost".

    #!/usr/bin/perl
    use strict;
    use Pod::Usage;
    use Getopt::Std;
    use Win32::OLE;
    use Spreadsheet::WriteExcel;
    use encoding "cp1250";
    use Encode;

    my $Excel;
    my $Book;
    my $Sheet;
    my @Test;
    my $Test;

    @Test=encode("cp1250",@Test);

    #Excel file FileToRead.xls contains the following Slovenian characters:
    # Line 1 Column 1: Čč (capital and small letter C with caron)
    # Line 1 Column 2: Šš (capital and small letter S with caron)
    # Line 1 Column 3: Žž (capital and small letter Z with caron)

    $Excel=Win32::OLE->GetActiveObject('Excel.Application') || Win32::OLE->new('Excel.Application', 'Quit');
    $Book=$Excel->Workbooks->Open('c:\batch\Pisarna\Clouseau\FileToRead.xls');
    $Sheet=$Book->Worksheets(1);

    # characters read into @Test are correct push(@Test,{
    Col1=>$Sheet->Cells(1,1)->{'Value'},
    Col2=>$Sheet->Cells(1,2)->{'Value'},
    Col3=>$Sheet->Cells(1,3)->{'Value'},
    });

    $Book->Close;
    $Excel->Close;

    my $BookOut=Spreadsheet::WriteExcel->new('c:\batch\Pisarna\Clouseau\FileWritten.xls');
    my $SheetOut=$BookOut->add_worksheet('test');

    # writing Slovenian characters directly
    $SheetOut->write(0,0,"Čč");
    $SheetOut->write(0,1,"Šš");
    $SheetOut->write(0,2,"Žž");

    # read from xls file through @Test
    $SheetOut->write(1,0,"$Test1->{Col1}");
    $SheetOut->write(1,1,$Test1->{Col2});
    $SheetOut->write(1,2,$Test1->{Col3});

    # strings defined in above lines 41, 42 and 43 are written into FileWritten.xls correct:
    # Line 1 Column 1: Čč (capital and small letter C with caron)
    # Line 1 Column 2: Šš (capital and small letter S with caron)
    # Line 1 Column 3: Žž (capital and small letter Z with caron)

    # strings read into @Test are written into FileWritten.xls incorrect
    # Line 2 Column 1: capital and small letter C with grave (can't be written)
    # Line 2 Column 2: small outlined square (can't be written)
    # Line 2 Column 3: small outlined square (can't be written)

    _END_

    In short, Slovenian characters written directly are correct but those read from xls file and written through @Test aren't correct. Something weird happens to them on their way from array through Spreadsheet::WriteExcel.

    (I work in Windows XP Professional version 2002, I use Perl 5.8). Thanks for your hint.
      When you post perl code at the Monastery, put <code> at the beginning, and put </code> at the end; this will "do the right thing" to make your code look right in the post, without having to do anything else special to the text of the perl code itself. This is explained at Markup in the Monastery (and this link is provided on the page where you submit your post).

      Regarding your code, this first use of "decode()" does nothing, because there is nothing in the array yet:

      ... my @Test; my $Test; @Test=encode("cp1250",@Test); ...
      So you must have misunderstood what I meant. Anyway, since your script includes use encoding "cp1250"; I gather that you wrote your script with a text editor that saves the file in that encoding. That should be fine, but it means that the quoted strings with accented characters are being treated internally in perl as utf8 strings (because that's what use encoding is supposed to do -- read the output of perldoc encoding).

      So if you want these characters to be stored in the Excel file as cp1250 characters, I think you need to do your "write" calls like this:

      use Encode; ... $SheetOut->write(0,0, encode( "cp1250", "&#268;&#269;" ); $SheetOut->write(0,1, encode( "cp1250", "Šš" ); $SheetOut->write(0,2, encode( "cp1250", "Žž" ); ...
      What happens in that case is: (1) your text editor saves the script as a cp1250-encoded text file, (2) when perl.exe reads the script to execute it, it sees use encoding "cp1250" and converts the special characters to its normal internal utf8 encoding (so that "character semantics" will work in the normal way), (3) then when those "write() functions are called, the Encode::encode function turns the utf8 strings back into cp1250 for storage in the Excel file.

      At least, I think that's what should happen. Give it a try.

      (update: the snippet I posted above is showing numeric character entities for some of the characters -- that was not intentional, but I'm not going to try to fix it -- you know which characters are supposed to be there.)

        Thanks for the "code" suggestion.

        I forgot to delete the contents of line 16 in the script before posting it. I apologize; without replacing [0] with 1 it wouldn't even write lines read from array.

        My text editor is set to "Central European (cp1250)": when I open the script for example in Notepad++, MS Word and MS Excel the Slovenian characters are written correct.

        I tried what you suggested:

        $SheetOut->write(0,0,"&#268;&#269;ŠšŽž"); $SheetOut->write(1,0,encode("cp1250",$Test[0]->{Col1}));

        In Xls file the first line is written correct, the second one not.

        Why are characters defined as string (line 1) correct and those read from array (line 2) not correct?

        Are there two or more different possible # characters in#!/usr/bin/perl and by Murphy I use the wrong one or something? Just kidding. :/