Marais has asked for the wisdom of the Perl Monks concerning the following question:

I'm using Perl/Tk Text widgets to display, edit and update data in a MySQL database. Generally it works OK, with one exception: when I read a string containing UTF-8 characters out of the database, it isn't displayed properly in the Text widget. I can paste UTF-8 characters into the Text widget, and they do end up in the database correctly.

So, here's how I try to display stuff from the database:

package photos_unused;
require Exporter;
use DBI;
use photos_globals;
use Tk;
use Tk::NoteBook;
use Tk::Pane;
use Tk::JPEG;
use Tk::DragDrop;
use Tk::DropSite;
use Encode qw(encode decode);
use strict;
. . .
# get the photo properties from the data base
eval {
    ($credit, $caption, $extra_text, $out, $original_file_name, $directory) =
        $dbh->selectrow_array(
            "SELECT credit, caption, extra_text, outx, original_file_name, directory " .
            "FROM photos " .
            "WHERE photo_num = $photo_num " .
            "LIMIT 1");
}; #end of eval block;
if ($@) {
    error_message("Database error: $@\n");
} else {
    $credit = decode('UTF-8', $credit);
    $caption = decode('UTF-8', $caption);
    $extra_text = decode('UTF-8', $extra_text);
    $out = decode('UTF-8', $out);
    $tl = $mw->Toplevel();
    $tl->geometry('500x400');
    $tl->title("Photo " . $photo_num . " Properties");
    $tl->Label(-text=>"Original file name: " . $original_file_name . " Directory: " . $directory)->pack();
    $fr1 = $tl->Frame(-pady=>5, -padx=>3);
    $fr1->Label(-text=>"Credit")->pack(-anchor=>'w');
    $txtCredit = $fr1->Text(-height=>5)->pack;
    $txtCredit->insert('end', $credit);
    $fr1->pack;
    $fr2 = $tl->Frame(-pady=>5, -padx=>3);
    $fr2->Label(-text=>"Caption")->pack(-anchor=>'w');
    $txtCaption = $fr2->Text(-height=>5)->pack;
    $txtCaption->insert('end', $caption);
    $fr2->pack;
    $fr3 = $tl->Frame(-pady=>5, -padx=>3);
    $fr3->Label(-text=>"Extra text")->pack(-anchor=>'w');
    $txtExtraText = $fr3->Text(-height=>5)->pack;
    $txtExtraText->insert('end', $extra_text);
    $fr3->pack;
    $fr4 = $tl->Frame(-pady=>5, -padx=>3);
    $btnCancel = $fr4->Button(-text => "Cancel", -command=> sub{$tl->destroy})->pack(-side=>'left', -anchor=>'e');
    $btnOK = $fr4->Button(-text => "UPDATE", -command=> [\&update_photo_properties, $tl, $photo_num])->pack(-side=>'right', -anchor=>'w');
    $fr4->pack(-fill=>'x');
}

I'm using Perl 5.8.8 and Perl/Tk 804.028 on Ubuntu.

Apologies for the large chunk of code, but I wanted to make sure that there was enough to show how I was going about this.

Thanks!
--- Marais

Replies are listed 'Best First'.
Re: Perl/Tk: utf8 in Text widget?
by Marais (Novice) on Sep 24, 2009 at 21:52 UTC
    Here's a tiny complete app which demonstrates the problem:
    #!/usr/bin/perl -w
    use Tk;
    use strict;
    my ($mw, $test_string, $txt);
    $mw = MainWindow->new(-title => "UTF-8 Test", -width=>400, -height=>300);
    $test_string = 'This is a sample Chinese character: &#25105;';
    $txt = $mw->Text(-height=>5)->pack;
    $txt->insert('end', $test_string);
    MainLoop;

    The entity in the code above was inserted by the PerlMonks web site. In my original source code I have a Chinese character. If anyone wants to try out this issue, you'll have to somehow get a Unicode character into the string.
    --- Marais

    Update: Oh boy, life is complicated. I now find that this code, designed to demonstrate the problem, actually works IF you use this:
    $test_string = 'This is a sample Chinese character: ' . chr(25105);
    and it also works with the original Chinese character if I include
    use utf8;

    So clearly I have to give this some more thought. It's looking as if a string retrieved from the database must be different in some important way from a constant string. Gack.
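The difference suspected above can be reproduced without a database. This is a minimal sketch, assuming the driver hands back raw UTF-8 bytes (an assumption, since the DBI connect options are not shown in the thread); the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: this is what a fetch returns when the driver does no decoding,
# i.e. the three UTF-8 bytes that encode U+6211.
my $from_db = "\xE6\x88\x91";

# A character string, like the chr(25105) constant that displayed correctly.
my $constant = chr(25105);

# Before decoding, the two strings differ: three bytes versus one character.
my $same_before = ($from_db eq $constant) ? 1 : 0;

# decode() turns the bytes into the single character a Text widget expects.
my $decoded    = decode('UTF-8', $from_db);
my $same_after = ($decoded eq $constant) ? 1 : 0;

printf "raw length %d, decoded length %d\n", length($from_db), length($decoded);
printf "equal before decode: %d, after decode: %d\n", $same_before, $same_after;
```

If the fetched value behaves like `$from_db` here, a single `decode` before `insert` should be enough; if it already behaves like `$constant`, no decoding is needed at all.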

      Here's another way for that test snippet to work (knowing that 25105 == 0x6211):
      #!/usr/bin/perl -w
      use Tk;
      use strict;
      my ($mw, $test_string, $txt);
      $mw = MainWindow->new(-title => "UTF-8 Test", -width=>400, -height=>300);
      $test_string = "This is a sample Chinese character: \x{6211}";
      $txt = $mw->Text(-height=>5)->pack;
      $txt->insert('end', $test_string);
      MainLoop;
      Works for me, no problem -- I see a Chinese character. In fact, I distinctly remember that whenever I've installed Tk for perl 5.8, the "make test" phase spends a fair bit of time going through all the unicode characters (including the Chinese, Japanese and Korean).

      I don't think Tk can do Arabic (or any other right-to-left writing system), and I don't know how well it can do with Devanagari or similar Indic scripts, but all the left-to-right characters work fine.

      So if this little test snippet isn't working for you, there might be a problem with your particular installation of the Tk bundle (or maybe it's a font issue?)

      If this test snippet works, but you're still having trouble getting stuff from mysql to display correctly (even though you seem to be decoding it as you should, or have "mysql_enable_utf8" set in the DBI connection), then you may want to check other modes of access to the DB, to see whether the actual content in your tables is what you expect.
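One way to check what is actually stored is to look at the raw bytes of a value in hex. The sketch below assumes the column value has been fetched into a scalar without any decoding; `hex_dump` is a hypothetical helper, not part of DBI or Tk. A correctly stored U+6211 should show three bytes, while a value that was encoded twice on the way in shows six.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render each byte (or character) of a string as hex.
sub hex_dump {
    my ($string) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $string);
}

# The UTF-8 bytes you would hope to find in the table for U+6211.
my $stored_once = encode('UTF-8', "\x{6211}");

# What ends up in the table if already-encoded bytes are encoded a second time:
# each byte is treated as a Latin-1 character and encoded again.
my $stored_twice = encode('UTF-8', $stored_once);

print "encoded once:  ", hex_dump($stored_once),  "\n";
print "encoded twice: ", hex_dump($stored_twice), "\n";
```

Comparing a dump of the real column value against these two patterns should indicate whether the table holds clean UTF-8 or double-encoded data.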

Re: Perl/Tk: utf8 in Text widget?
by lamprecht (Friar) on Sep 24, 2009 at 20:14 UTC
    Hi,

    Did you try mysql_enable_utf8?


    Cheers, Christoph
      Thanks for the suggestion. I gave it a try, and got the following error message:
      Tk::Error: Cannot decode string with wide characters at /usr/lib/perl/5.8/Encode.pm line 166.

      A UTF-8 character pasted into one of the Text widgets does end up in the database correctly, so my feeling is that communications between the app and the database are not the problem.
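That error message is consistent with decoding twice. With mysql_enable_utf8 switched on, the driver is expected to return character strings that are already decoded, so the `decode('UTF-8', ...)` calls in the original code would then be applied to characters rather than bytes. The sketch below reproduces the failure without a database; that this is what happened in the application is an assumption.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for a value the driver has already decoded: one character, U+6211.
my $already_decoded = "\x{6211}";

# Decoding it a second time dies, because U+6211 cannot be a single byte.
my $result = eval { decode('UTF-8', $already_decoded) };
my $error  = $@;

print defined $result
    ? "second decode succeeded\n"
    : "second decode died: $error";
```

The usual remedy is to decode in exactly one place: either let the driver do it and drop the explicit `decode` calls, or leave the driver option off and keep them.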

Re: Perl/Tk: utf8 in Text widget?
by Marais (Novice) on Sep 25, 2009 at 18:51 UTC
    OK, I'm in UTF-8 hell here. I thought that I would at least verify that I could read and write UTF-8 strings to my database, but while working on that, I discovered that I can't even get a trivial Perl program to behave in a way I understand.

    Here's my test program:

    #!/usr/bin/perl -w
    use strict;
    use utf8;
    my ( $codepoint, $test_string, );
    binmode STDOUT, ":utf8";
    $codepoint = ord('&#25105;');
    print "Codepoint of character is $codepoint\n";
    $test_string = "Here's a test string with &#25105;\n";
    print "test_string is $test_string\n";
    $test_string_dec = decode('utf8', $test_string);
    (In my original, I had the literal Chinese character 我 where you see the big numeric constant.)

    There are at least two issues here:

    1. The presence of the "use utf8" pragma: I gather from Googling that this is no longer required in current versions of Perl (I'm using 5.8.8.) But, if I leave it out, the codepoint reported by "ord" is 250, instead of 25105. Surely Perl should know that the Chinese character is Unicode?
    2. The "binmode" statement: This interacts with the "use utf8" pragma in the following ways:
      both present: correct codepoint, correct output character, no error message
      pragma present, binmode omitted: correct codepoint, correct output character, "Wide character in print" error message.
      pragma omitted, binmode present: wrong codepoint, wrong output character, no error message
      both omitted: wrong codepoint, output shows correct Chinese char, no error message.

    So, I'm really confused. What does the "use utf8" pragma actually do in Perl 5.8.8? Why do I get the correct character showing on output even when I get the "Wide character in print" message?

    --- Marais
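The four combinations listed above are easier to follow if the two mechanisms are imitated separately. As far as the documentation goes, `use utf8` only tells perl that the source file itself is written in UTF-8, so a literal becomes one character instead of three bytes, and a `:utf8` layer encodes every character of the string on output. The sketch below uses escapes instead of a literal so that it behaves the same however the file is saved, and uses `encode` to stand in for the output layer; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Without "use utf8" the literal is seen as the three bytes of its UTF-8 form.
my $literal_without_pragma = "\xE6\x88\x91";
# With "use utf8" the same literal is seen as one character.
my $literal_with_pragma    = "\x{6211}";

# ord() looks at the first element only: a byte in one case,
# the whole character in the other.
my $ord_without = ord($literal_without_pragma);
my $ord_with    = ord($literal_with_pragma);

# What a :utf8 output layer would emit (imitated here with encode):
my $layer_on_chars = encode('UTF-8', $literal_with_pragma);     # correct bytes
my $layer_on_bytes = encode('UTF-8', $literal_without_pragma);  # every byte encoded again

# With no layer, a byte string passes through unchanged.
my $no_layer_on_bytes = $literal_without_pragma;

printf "ord without pragma: %d, with pragma: %d\n", $ord_without, $ord_with;
printf "layer on characters emits %d bytes, layer on bytes emits %d bytes\n",
    length($layer_on_chars), length($layer_on_bytes);
printf "no layer on bytes matches the correct output: %s\n",
    ($no_layer_on_bytes eq $layer_on_chars ? 'yes' : 'no');
```

The remaining case, pragma present and binmode omitted, is the one that warns: perl is handed a character above 255 with no layer to encode it, so it emits "Wide character in print" and writes its internal UTF-8 form anyway, which is why a UTF-8 terminal still shows the right character.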