zentara has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Tk appearing in Chinese font spurred a memory of a question that went unanswered on comp.lang.perl.tk. Someone posted the script below and wanted to know why the Chinese was displayed properly when the strings were hard-coded as variables, but not when they were read from DATA. If any of you Unicode experts know the answer, please enlighten us. There are two subs below for testing; comment out "method1" and enable "method2" to see the difference.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Tk;

    my %codes_hash;
    my $mw = new MainWindow;

    # works fine when hard coded, but not from DATA
    # so method1 is good, method2 isn't working
    &define_codes_method1;
    #&define_codes_method2;

    foreach my $language ( 'English', 'Chinese', 'Japanese', 'Malay' ) {
        $mw->Label( -text => $language, -font => 'arial 12 bold' )->pack;
        foreach my $code ( sort keys %codes_hash ) {
            $mw->Label(
                -text => "$codes_hash{$code}{$language} ($code)",
                -font => 'arial 12',
            )->pack;
        }
        $mw->Label( -text => '-' x 30, -font => 'arial 12' )->pack;
    }

    $mw->geometry('+100+100');
    MainLoop;

    sub define_codes_method1 {
        %codes_hash = (
            100 => {
                English  => "Contactor Repair",
                Chinese  => "\x{4FEE}\x{593E}\x{8173}",
                Japanese => "PB\x{4FEE}\x{7406}",
                Malay    => "Membaiki Contactor",
            },
            120 => {
                English  => "Change Cleaning Disk",
                Chinese  => "\x{66F4}\x{63DB}\x{786C}\x{789F}",
                Japanese => "\x{30AF}\x{30EA}\x{30FC}\x{30CB}\x{30F3}\x{30B0}\x{4EA4}\x{63DB}",
                Malay    => "Menukar Cakera Pencuci",
            },
            130 => {
                English  => "Server/Network Problem",
                Chinese  => "\x{7DB2}\x{8DEF}\x{554F}\x{984C}",
                Japanese => "\x{30CD}\x{30C3}\x{30C8}\x{30EF}\x{30FC}\x{30AF}\x{306E}\x{30C0}\x{30A6}\x{30F3}",
                Malay    => "Masaalah Server/Rangkaian",
            },
            140 => {
                English  => "Waiting on Support",
                Chinese  => "\x{5F85}\x{4FEE}\x{6A5F}\x{4EBA}\x{54E1}",
                Japanese => "\x{30B5}\x{30DD}\x{30FC}\x{30C8}\x{5F85}\x{3061}",
                Malay    => "Menunggu Sokongan Teknikal",
            },
        );
    }

    sub define_codes_method2 {
        my $language;
        while (<DATA>) {
            chomp;
            next if !/\S/;
            if (/^%language=(.*)/) {
                $language = $1;
            }
            else {
                my ( $code, $descr ) = split /,/;
                $codes_hash{$code}{$language} = $descr;
            }
        }
    }

    __DATA__
    %language=English
    100,"Contactor Repair"
    120,"Change Cleaning Disk"
    130,"Server/Network Problem"
    140,"Waiting on Support"
    %language=Chinese
    100,"\x{4FEE}\x{593E}\x{8173}"
    120,"\x{66F4}\x{63DB}\x{786C}\x{789F}"
    130,"\x{7DB2}\x{8DEF}\x{554F}\x{984C}"
    140,"\x{5F85}\x{4FEE}\x{6A5F}\x{4EBA}\x{54E1}"
    %language=Japanese
    100,"PB\x{4FEE}\x{7406}"
    120,"\x{30AF}\x{30EA}\x{30FC}\x{30CB}\x{30F3}\x{30B0}\x{4EA4}\x{63DB}"
    130,"\x{30CD}\x{30C3}\x{30C8}\x{30EF}\x{30FC}\x{30AF}\x{306E}\x{30C0}\x{30A6}\x{30F3}"
    140,"\x{30B5}\x{30DD}\x{30FC}\x{30C8}\x{5F85}\x{3061}"
    %language=Malay
    100,"Membaiki Contactor"
    120,"Menukar Cakera Pencuci"
    130,"Masaalah Server/Rangkaian"
    140,"Menunggu Sokongan Teknikal"

I'm not really a human, but I play one on earth. flash japh

Re: Tk-reading-chinese-out-of-data
by Tanktalus (Canon) on Jul 18, 2005 at 17:44 UTC

    As a first guess, I'd imagine you'd need to eval $descr to get it to remove the surrounding quotes and interpolate the \x's.
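
    A small sketch of why the quotes and escapes survive when the text comes from DATA: Perl processes "\x{...}" while compiling a double-quoted literal, whereas a line read at run time is just the characters that happen to be in the file. The in-memory filehandle below is only a stand-in for DATA, and the variable names are illustrative rather than taken from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In source code, the escape is processed when the string literal is compiled,
# so this variable holds a single character.
my $literal = "\x{4FEE}";

# Read at run time (an in-memory filehandle stands in for DATA here), the same
# text is just a quote, a backslash, an 'x', a brace, and so on.
my $raw_text = '"\x{4FEE}"' . "\n";
open my $fh, '<', \$raw_text or die "Cannot open in-memory handle: $!";
my $from_data = <$fh>;
close $fh;
chomp $from_data;

printf "literal:   %d character(s), code point U+%04X\n",
    length($literal), ord($literal);
printf "from data: %d character(s), starting with '%s'\n",
    length($from_data), substr( $from_data, 0, 1 );
```

    The literal has length 1, while the line read from the handle has length 10, because nothing has interpreted the escape or removed the quotes.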

Re: Tk-reading-chinese-out-of-data
by graff (Chancellor) on Jul 18, 2005 at 20:45 UTC
    The "eval" suggestion in the first reply didn't work for me, but the following did:
        sub define_codes_method2 {
            my $language;
            while (<DATA>) {
                s/\s+$//;    # ("chomp" is too OS-dependent)
                next if !/\S/;
                if (/^%language=(.*)/) {
                    $language = $1;
                }
                else {
                    my ( $code, $descr ) = split /,/;
                    $descr =~ s/\\x\{(\w{4})\}/chr(hex($1))/eg;    # convert hex values to chars
                    $codes_hash{$code}{$language} = $descr;
                }
            }
        }
    Update: actually, the eval approach does work -- for some reason, when DATA has CRLF line-breaks and my unix "chomp" leaves the "\r" behind, nothing works. After fixing it so that the "s/.../chr(hex())/eg" approach would work, I tried the eval approach again, and that worked too:
        my ( $code, $descr ) = split /,/;
        $codes_hash{$code}{$language} = eval $descr;
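
    Since a string eval runs whatever code happens to be in the data, here is a hedged sketch that combines the two fixes above without eval: it trims trailing whitespace (including a stray "\r"), strips the surrounding quotes, and converts the escapes by substitution. The helper name decode_descr and the sample line are hypothetical, and the pattern is widened slightly to accept any number of hex digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn one DATA-style field such as
# "\x{4FEE}\x{593E}" into the characters it describes, without string eval.
sub decode_descr {
    my ($descr) = @_;
    $descr =~ s/\s+\z//;            # drop trailing whitespace, including "\r"
    $descr =~ s/\A"(.*)"\z/$1/s;    # remove the surrounding double quotes, if any
    $descr =~ s/\\x\{([0-9A-Fa-f]+)\}/chr(hex($1))/eg;    # \x{HHHH} -> character
    return $descr;
}

# A sample line as it would arrive from a file with CRLF line endings.
my $line = '100,"PB\x{4FEE}\x{7406}"' . "\r\n";

# Limit the split to two fields so a comma inside the description survives.
my ( $code, $descr ) = split /,/, $line, 2;
my $text = decode_descr($descr);

# Print code points rather than the text itself, to avoid "Wide character"
# warnings on an output handle with no encoding layer.
printf "%s => %d characters: %s\n", $code, length($text),
    join ' ', map { sprintf 'U+%04X', ord($_) } split //, $text;
```

    In the original script, the call would replace the assignment inside the else branch, e.g. $codes_hash{$code}{$language} = decode_descr($descr);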