HelenCr has asked for the wisdom of the Perl Monks concerning the following question:

Hi wizards, I seek your wisdom. I am running Active Perl 5.14 on Windows 7.
I am trying to write a program that reads in a conversion table, then works through a file and replaces certain patterns with other patterns, all of it in Unicode (UTF-8). Here is the beginning of the program:
#!/usr/local/bin/perl
# Load a conversion table from CONVTABLE to %ConvTable.
# Then find matches in a file and convert them.
use strict;
use warnings;
use Encode;
use 5.014;
use utf8;
use autodie;
use warnings qw< FATAL utf8 >;
use open qw< :std :utf8 >;
use charnames qw< :full >;
use feature qw< unicode_strings >;

my ($i,$j,$InputFile, $OutputFile,$word,$from,$to,$linetoprint);
my (@line, @lineout);
my %ConvTable; # Conversion hash

print 'Conversion table: opening file: E:\My Documents\Perl\Conversion table.txt'."\n";
my $sta= open (CONVTABLE, "<:encoding(utf8)", 'E:\My Documents\Perl\Conversion table.txt');
binmode STDOUT, ':utf8'; # output should be in UTF-8

# Load conversion hash
while (<CONVTABLE>) {
    chomp;
    print "$_\n";
    # etc ...
    # etc ...
It turns out that at this point, it says:
wide character in print at (eval 155)[E:/Active Perl/lib/Perl5DB.pl:640] line 2, <CONVTABLE> line 1, etc...
Why is that? I thought I had followed all the necessary prescriptions for handling Unicode correctly, decoding on input and encoding to UTF-8 on output.
And how can I fix it?
TIA
Helen
Note: I may cross-post on StackOverflow

Replies are listed 'Best First'.
Re: print UTF-8 problem
by ikegami (Patriarch) on Feb 15, 2012 at 20:08 UTC
      I will return in a couple of hours
Re: print UTF-8 problem
by Anonymous Monk on Feb 15, 2012 at 19:25 UTC

    It turns out that at this point, it says:

    perl5db.pl is not your program :) What does your program say when you run it without the debugger?
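A minimal sketch of why the handle matters here. "Wide character in print" is raised per filehandle: the same string warns on a handle with no encoding layer and prints quietly on one that has `:encoding(UTF-8)`. That the debugger writes through a handle of its own, one the script never configured, is an assumption and is not confirmed in this thread. The temporary files and variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hebrew letters, all above U+00FF, so they count as "wide" characters.
my $text = "\x{5D7}\x{5D9}\x{5D9}\x{5DD}";

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Handle 1: no encoding layer (stand-in for a handle the script never set up).
my ($raw_fh, $raw_name) = tempfile(UNLINK => 1);
binmode $raw_fh, ':raw';
print {$raw_fh} "$text\n";
close $raw_fh;
my $raw_warned = scalar @warnings;

# Handle 2: the same string, but the handle has an explicit encoding layer.
my ($enc_fh, $enc_name) = tempfile(UNLINK => 1);
binmode $enc_fh, ':encoding(UTF-8)';
print {$enc_fh} "$text\n";
close $enc_fh;
my $enc_warned = @warnings - $raw_warned;

print "warnings on handle without a layer: $raw_warned\n";
print "warnings on handle with a layer:    $enc_warned\n";
```

If this reasoning holds, the layers set with `use open` and `binmode STDOUT` in the script apply to the script's own handles, which is why the message names perl5db.pl rather than Conv.pl.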

      Running the program without the debugger, it says:
      Name "main::INPUT" used only once: possible typo at Conv.pl line 58.
      Name "main::CONVTABLE" used only once: possible typo at Conv.pl line 26.
      Name "main::OUTPUT" used only once: possible typo at Conv.pl line 71.
      Conversion table: opening file: E:\My Documents\Perl\Conversion table.txt
      ∩╗┐England, Germany
      he, she
      the, HOMHOM
      <╫ö╫ó╫¿╫│11> <╫ö╫ó╫£╫Ö╫ò╫ƒ>
      <╫ù ╫Ö╫│ ╫¥> <╫ù╫Ö╫Ö╫¥>
      <╫¿╫│╫æ╫│╫¬> <╫¿╫Ö╫æ╫Ö╫¬>
      <ΓÇó╫ÿ╫¿╫É╫¢╫¿╫ÿ><╫Ö╫⌐╫¿╫É╫¢╫¿╫ÿ>
      It prints gibberish to the Windows console (aka "DOS box"), instead of the right UTF-8 characters.
        There aren't 71 lines in what you posted. Please run the code you actually posted.

        It prints gibberish to the Windows console (aka "DOS box"), instead of the right UTF-8 characters.

        What makes you think your console understands UTF-8? Type chcp at the prompt, prepend "cp" to the number, and use that as the encoding.
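The advice above can be sketched as follows. This assumes `chcp` prints a line containing the code page number (for example `Active code page: 437`) and that the resulting name, such as `cp437`, is one that Encode recognizes. The helper name `encoding_from_chcp` and the UTF-8 fallback on non-Windows systems are illustrative choices, not something from this thread.

```perl
use strict;
use warnings;

# Turn the text printed by `chcp` into an encoding name such as "cp437".
# Returns undef (in scalar context) if no code page number is found.
sub encoding_from_chcp {
    my ($chcp_output) = @_;
    return unless defined $chcp_output;
    my ($number) = $chcp_output =~ /(\d+)/;
    return defined $number ? "cp$number" : undef;
}

# Ask the console for its code page on Windows; elsewhere assume UTF-8.
my $console_encoding = 'UTF-8';
if ($^O eq 'MSWin32') {
    my $detected = encoding_from_chcp(scalar `chcp`);
    $console_encoding = $detected if defined $detected;
}

# Encode output for what the console expects instead of forcing UTF-8 on it.
binmode STDOUT, ":encoding($console_encoding)";
print "Console encoding in use: $console_encoding\n";
```

Matching the console's code page stops the byte-level garbling, but it cannot display characters the code page lacks. cp437, for instance, has no Hebrew letters, so a table containing Hebrew would also need a console code page that includes them.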