HelenCr has asked for the wisdom of the Perl Monks concerning the following question:
Dear highly esteemed PerlMonks
Update: how do I make the PerlMonks web site show the foreign fonts, instead of the HEX?
I am working on a project which deals with data in foreign languages. My Perl scripts were running fine.
I then wanted to use Tie::File, since this is a neat concept (and saves time and coding).
It seems that Tie::File is failing under Unicode/UTF-8 (unless I am missing something).
Here is a program which depicts the problem: (The data is a mix of English, Greek and Hebrew).
use strict;
use warnings;
use 5.014;
use Win32::Console;
use autodie;
use warnings qw< FATAL utf8 >;
use Carp;
use Carp::Always;
use utf8;
use feature qw< unicode_strings >;
use charnames qw< :full >;
use Tie::File;

my ($i);
my ($FileName);
my (@Tied);

binmode STDOUT, ':unix:utf8';
binmode STDERR, ':unix:utf8';
binmode $DB::OUT, ':unix:utf8' if $DB::OUT;    # for the debugger
Win32::Console::OutputCP(65001);               # Set the console code page to UTF8

$FileName = 'E:\\My Documents\\Technical\\Perl\\Eclipse workspace\\FIBI OCR\\Work\\'
          . 'Tie File test res.txt';
tie @Tied, 'Tie::File', $FileName, recsep => "\x0D\x0A", discipline => ':encoding(utf8)'
    or confess 'tie @Tied failed';

$i = 0;
while (<DATA>) {
    chomp;
    $Tied[$i] = $_;
    ++$i;
}    # end while (<DATA>)

$i = 0;
foreach (@Tied) {
    say "$i $Tied[$i]";
    ++$i;
}    # end foreach (@Tied)

untie @Tied;    # untie the tied array, not the file name

__DATA__
τι κάνετε;
πάρτε το ή αφήστε το
שלום חברים
abc לא כןכן efg
מתי ולאן This is it
מעכשיו לעכשיו
Σήμερα είναι Τρίτη
Θέλω να φάω
τι κάνετε;
שורה מס' 5
This produces a huge cascade of warnings; here are some:
utf8 "\xCE" does not map to Unicode at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 917
    Tie::File::_read_record('Tie::File=HASH(0x24cb72c)') called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 175
    Tie::File::_fetch('Tie::File=HASH(0x24cb72c)', 0) called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 210
    Tie::File::STORE('Tie::File=HASH(0x24cb72c)', 0, 'τι κάνετε;') called at tie file test.pl line 31
utf8 "\xCF" does not map to Unicode at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 917
    Tie::File::_read_record('Tie::File=HASH(0x24cb72c)') called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 175
    Tie::File::_fetch('Tie::File=HASH(0x24cb72c)', 0) called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 210
    Tie::File::STORE('Tie::File=HASH(0x24cb72c)', 0, 'τι κάνετε;') called at tie file test.pl line 31
utf8 "\xD7" does not map to Unicode at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 917
    Tie::File::_read_record('Tie::File=HASH(0x24cb72c)') called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 175
    Tie::File::_fetch('Tie::File=HASH(0x24cb72c)', 0) called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 210
    Tie::File::STORE('Tie::File=HASH(0x24cb72c)', 0, 'τι κάνετε;') called at tie file test.pl line 31
utf8 "\xD7" does not map to Unicode at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 917
    Tie::File::_read_record('Tie::File=HASH(0x24cb72c)') called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 175
    Tie::File::_fetch('Tie::File=HASH(0x24cb72c)', 0) called at F:/Win7programs/Dwimperl/perl/lib/Tie/File.pm line 210
    Tie::File::STORE('Tie::File=HASH(0x24cb72c)', 0, 'τι κάνετε;') called at tie file test.pl line 31
Then it prints this on STDOUT:
0 τι κάνετε;
1 πάρτε το ή αφήστε το
2 שלום חברים
3 abc לא כןכן efg
4 מתי ולאן This is it
5 מעכשיו לעכשיו
6 Σήμερα είναι Τρίτη
7 Θέλω να φάω
8 τι κάνετε;
9 שורה מס' 5
10
11
12
13
14 \xA4\xΘέλω\xA8\x
15
16
17
18
19
Note that the first 10 lines (0 through 9) are OK, but lines 10 through 19 came from nowhere!?
In addition, the output file contains corrupted data:
τι κάνϏN͏Ŏՠτή +;στε של חברءbc  +500;ؗܗࠗܗߠeמתול& +#1488;ן This is מעיו לע +99;؎Ďώݎ֏ναι Τρ&# +920;έώގѠφϏŎ٠κτ&# +949;;שרה מס' \xA4\xΘέλω\xA8\x
Something is very wrong here. Either I am missing something, or Tie::File can't cope with Unicode/UTF-8?
I am running Strawberry Perl 5.14 on a Windows 7 system.
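One plausible explanation, offered as an assumption and not as a confirmed diagnosis: Tie::File locates records by byte offset in the file, while an `:encoding(...)` layer hands it decoded characters, so the lengths it measures may stop matching the bytes on disk once multi-byte characters appear. A minimal sketch of a possible workaround follows. It ties the file without any `discipline`, and converts between characters and UTF-8 bytes by hand with `encode` and `decode` from the core Encode module. The scratch file from File::Temp and the variable names are illustrative only, not part of the original script.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical scratch file, used here instead of the real output path.
my ($fh, $file_name) = tempfile(UNLINK => 1);
close $fh;

# No 'discipline' option: Tie::File reads and writes raw bytes only.
tie my @tied, 'Tie::File', $file_name, recsep => "\x0D\x0A"
    or die "tie failed: $!";

my @lines = ('τι κάνετε;', 'שלום חברים', 'abc לא כןכן efg');

# Encode each character string to UTF-8 bytes before storing it.
for my $n (0 .. $#lines) {
    $tied[$n] = encode('UTF-8', $lines[$n]);
}

# Decode the stored bytes back into character strings when reading.
my @round_trip = map { decode('UTF-8', $_) } @tied;

untie @tied;

binmode STDOUT, ':encoding(UTF-8)';
print "$_ $round_trip[$_]\n" for 0 .. $#round_trip;
```

The design choice here is to keep Tie::File working purely on bytes, so that its own bookkeeping and the file contents stay in agreement, and to confine all character handling to the two conversion points.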
Many TIA - Helen
Note: cross-posted on http://stackoverflow.com/questions/13209474/