Hi everyone, I have two related questions on the handling of UTF-8 characters.

1: How do you write a script that can take user input via a variable and write it to a UTF-8 text file? Here's what I have (the commented lines are my subsequent attempts to correct the issue, which failed).

#!/usr/bin/perl
#use utf8;
print "Text? ";
chomp ($note = <STDIN>);
print "\nText: ${note}";
#open(TEST, ">>:encoding(UTF-8)", "test.txt") or die "Can't open UTF-8 encoded file: $!";
open(TEST, ">>", "test.txt") or die "Can't open file: $!";
print TEST "\nDirectly from the script: áéíóö&#337;úü&#369;\n";
print TEST "\nUser input via variable: $note\n";
close TEST;
<STDIN>;

Note: I'm getting HTML character codes instead of the ő and ű letters in this post in the code... Character encoding strikes again.

As you can see, it takes user input and writes it to a file, along with some accented characters that are hardcoded into the script. The script itself is saved in UTF-8 to allow the use of all accented characters. It works fine on Ubuntu but fails on XP for me. On XP, the characters are printed correctly in the command line window by the print "\nText: ${note}"; line but they are corrupted in the file. The hardcoded stuff is fine, but if I type in the same accented letters when the script runs, they are mis-encoded.

By the way, the larger script this is a part of also reads accented characters from a UTF-8 file and writes them to another file, and that works fine on both Ubuntu and XP. So, essentially, I only have trouble with non-ASCII characters when they come from user input, are stored in a variable, and are written to a file from there. Any ideas?
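
A minimal sketch of one thing that might be worth trying for question 1, assuming the XP console hands the script bytes in a legacy Windows code page rather than UTF-8: decode the input explicitly, then write it through an :encoding(UTF-8) layer. The helper names decode_console_input and append_utf8 are made up for illustration, and cp1250 is only a guess at the code page (running chcp in the console window shows the active one). The input is simulated with hardcoded bytes so the sketch runs without typing anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn raw console bytes into a Perl character string.
# $codepage is an assumption; it has to match what the console really uses.
sub decode_console_input {
    my ($bytes, $codepage) = @_;
    return decode($codepage, $bytes);
}

# Hypothetical helper: append one line of text to a file, encoded as UTF-8.
sub append_utf8 {
    my ($path, $text) = @_;
    open(my $out, '>>:encoding(UTF-8)', $path)
        or die "Can't open $path: $!";
    print {$out} "$text\n";
    close $out or die "Can't close $path: $!";
}

# Simulated input: the two bytes cp1250 uses for o and u with double acute.
# In the real script this would be the chomped line read from STDIN.
my $raw  = "\xF5\xFB";
my $note = decode_console_input($raw, 'cp1250');

append_utf8('test.txt', "User input via variable: $note");
```

With an encoding layer on the output handle, the hardcoded accented string in a UTF-8-saved script would presumably also need use utf8; switched on, otherwise its bytes get encoded a second time.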

2: I'm trying to get Spreadsheet::WriteExcel to work on UTF-8 files, and it's not looking very good. Here's my code for writing all lines of a file into Column A of a new spreadsheet:

#!/usr/bin/perl
use warnings;
use Spreadsheet::WriteExcel;

# Create a new Excel workbook
my $workbook = Spreadsheet::WriteExcel->new('perl.xls');

# Add a worksheet
$worksheet = $workbook->add_worksheet;

# write file to column A
open (IN, "column1.txt");
$count = 0;
while (<IN>) {
    $count ++;
    chomp ($_);
    $worksheet->write("A$count", $_);
}
close IN;
<STDIN>;

I've been trying to read up on whether and how Spreadsheet::WriteExcel can handle UTF-8 characters, but I found no clear info. (Spreadsheet::WriteExcel: http://search.cpan.org/~jmcnamara/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm ; its info on Unicode: http://search.cpan.org/~jmcnamara/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm#UNICODE_IN_EXCEL - this seems to say what I'm trying should work - I have Perl 5.10)

This code does what I want it to on both Ubuntu and XP (the xls is created with the right content), but accented characters are corrupted on both OSes.
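
A sketch of what I suspect is the missing piece for question 2, assuming the module wants decoded character strings rather than raw bytes (that is how I read the UNICODE IN EXCEL section, but I may be wrong): open the input file with an :encoding(UTF-8) layer so each line is decoded before it is passed on. The helper name read_utf8_lines is made up for illustration, and the spreadsheet call is left as a comment so the sketch runs without the module installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a UTF-8 file and return its lines as decoded
# character strings, with the trailing newlines removed.
sub read_utf8_lines {
    my ($path) = @_;
    open(my $in, '<:encoding(UTF-8)', $path)
        or die "Can't open $path: $!";
    my @lines = <$in>;
    close $in;
    chomp @lines;
    return @lines;
}

# Each decoded line would then be written to column A as before, e.g.
#   my $row = 0;
#   for my $line (read_utf8_lines('column1.txt')) {
#       $row++;
#       $worksheet->write("A$row", $line);
#   }
```

The only change from my loop above is where the decoding happens: the file handle does it, so the strings reaching write() are characters, not bytes.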

Thanks for any help!


In reply to UTF-8 issues with Perl in general and with Spreadsheet::WriteExcel by elef
