ron7 has asked for the wisdom of the Perl Monks concerning the following question:

I have a large application that runs as a command line utility, or as GUI. In both cases, the UI just gathers args for a common module that does all the work. The app must run under all flavours of Windows, Linux, and OS-X. It is distributed as executable binaries for all three platforms using ActiveState app builder (with source, see http://itee.uq.edu.au/~chernich/tagsuite.html).

The problem is that when run from the command line, it will correctly process files with extended chars (non English alphabet if you will) in the name/path. When run as a Tk GUI, it won't! (see this post for my initial worries).

In other words, SOMETHING in Tk changes the runtime environment such that the encoding of filepaths, either entered manually, found by readdir, or selected in Tk::getOpenFile are not recognized as valid paths by any common Perl file operation (-f, -d, open, etc)!!

I can pull some tricks to make the filepaths digestable to Linux and OS-X, and some even more diabolical ones to partially address the Windows situation, but this will NEVER be a fix because the problem is so low-level that using Tk::getOpenFile to select files fails internally when trying to open a directory with extended chars in the name. The list box has shown the directory, but when you double-click to enter it, an error dialog is generated by getOpenFile saying the file does not exist.

I know how to fix this, but I'm not about to start hacking Tk as a past-time. It appears to me that there is a fundamental problem with Tk and extended chars in filenames. I also can't believe I'm the first to encounter this, but I can't find any posts on any forums that mention this problem specifically.

If any Monk can point me at something that helps, or suggest a different forum where Perl Tk is might get better coverage, I'd sure appreciate it.

I've not included a demo program due to the difficulty of including/creating the necessary test data, but if anyone is serious about assisting, I'm happy to create a little tar with example code and data.

  • Comment on Tk causes problems with file paths containing non-Latin-1 chars

Replies are listed 'Best First'.
Re: Tk causes problems with file paths containing non-Latin-1 chars
by lamprecht (Friar) on May 04, 2011 at 09:48 UTC

    This should work on Win32:

    use warnings; use strict; use Tk; if ($^O =~ /Win32/){ my $gof_wrapped = \&Tk::getOpenFile; *Tk::getOpenFile = sub{my $file = shift->$gof_wrapped; return Win32::GetANSIPathName($file)}; } my $mw = tkinit; my $path = $mw->getOpenFile(); open (my $fh, '<', $path) or die "can not open file $path : $!"; while (my $line = <$fh>) { print $line; } close $fh; MainLoop;
      Thanks! That fixes the problem in my test code. Refactoring the app will be a bit more complex, but it looks do-able.

      The changes to the test program so it will run under Linux|OS-X|Win32 are:

      ... use if ($^O =~ m/MSWin32/i || ''), 'Win32'; ... my $is_win32 = $^O =~ m/MSWIN32/i; ... sub test_file { my $file = ucx(shift); ... } sub ucx { my $fn = shift; return $is_win32 ? Win32::GetANSIPathName($fn) : decode('utf8', $fn) +; }

      Out of curiosity, I went looking for the GetANSIPathName method in the ActiveState (win32) lib code. As far as I could see, it's not a Perl function, so I guess it must be all in a dll someplace.

      For anyone who wants to play with this, a zip of the code and data can be downloaded from http://itee.uq.edu.au/~chernich/downloads/tkbug.zip.

Re: Tk causes problems with file paths containing non-Latin-1 chars
by Anonymous Monk on May 03, 2011 at 10:26 UTC
      As a test, I rewrote my gui test program (see later post) to use Tkx. Surprisingly easy, though I can't find a scrolled text gadget which is a worry. The changed subroutines only below:
      ... use Tkx; ... sub init_gui { $win = Tkx::widget->new('.'); $win->g_wm_geometry('+500+200'); $win->g_wm_title('Tkx wrapper'); $win->g_wm_minsize(400, 200); $win->g_wm_protocol('WM_DELETE_WINDOW', sub { exit; }); $win->g_wm_resizable(0, 0); my $f1 = $win->new_frame(); my $f2 = $win->new_frame(); $t1 = $f1->new_text(); $t1->g_pack(-anchor=>'e', -fill=>'both'); my $b1 = $f2->new_button(-text=>'Browse...', -width=>10, -command=>\ +&select); $b1->g_pack(-anchor=>'e', -fill=>'both'); $f1->g_grid(-row=>0, -column=>0, -padx=>8, -pady=>4, -sticky=>'n'); $f2->g_grid(-row=>1, -column=>0, -padx=>8, -pady=>4, -sticky=>'n'); return; } ... sub select { my $filename = Tkx::tk___getOpenFile( -defaultextension => '', -initialdir => $path, -title => 'Select file', ); if ($filename) { test_file($filename); } return; } __END__

      Result? No improvement. The sub "load" still finds all the files which test and open ok in sub "test_file". Files with extended chars in the path opened under Windows using the sub "select" linked to the button give a "Does not exist" on the -d test of sub "test_file". They open fine under Linux (and presumably OS-X too if I wasn't too lazy to pull the Mac out).

      Probably shouldn't be a surprise as I suspect Tk::FBox->getOpenFile and Tkx::tk___getOpenFile are thin wrappers over Tcl which calls the native file dialog which is where I suspect the real encoding problem lies.

      So, Tkx is NO CURE for the dreaded extended chars in the file path problem (and I've decided not to like Tkx ;-). Converting would mean more complexity to manage scroll bars manually and probably other things. Only positive is I'd be freed from AS perl 5.8.x which Tk support ties me to.

      Update: I've found CPAN Tkx::Scrolled which seems to work fine installed with ActiveState cpan (no ppm). Some differences between Tk and Tkx re scrolling programatically, but I'm liking Tkx a bit better.

      Yes, I know about ActiveState's recommendation to use Tkx in place of Tk (which pins me to 5.8.x forever). AS is still maintaining Tk, recently issuing an update to fix a startup problem on OS-X which required that libraries be installed, even for "compiled" executables.

      I sorta ignored their recommendation because I didn't feel like learning another GUI, plus the fact that my app is 12,504 LOC, spread across 19 modules (only 5 of which use Tk, so maybe it won't be that awful). If you have an opinion of the Tkx learning curve, documentation, and Tk->Tkx migration, I'm all ears...

Re: Tk causes problems with file paths containing non-Latin-1 chars
by zentara (Cardinal) on May 03, 2011 at 13:18 UTC
    I ran into this problem awhile back with Tk, and this is the solution I was handed. You need to manually decode all the filenames to ensure the utf8 flag gets set in the filesystem. Don't ask me to explain all the details, of how utf8 flags get set or ignored. :-)
    #this decode utf8 routine is used so filenames with extended # ascii characters (unicode) in filenames, will work properly use Encode; opendir my $dh, $path or warn "Error: $!"; my @files = grep !/^\.\.?$/, readdir $dh; closedir $dh; @files = map { decode( 'utf8', "$path/".$_ ) } sort @files;

    Tk still works, but the active developer died a few years ago. If you want or need the kind of robust filesystem reading which you described, switch to Gtk2. It is very active, and bug fixes come in a matter of weeks.


    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
      Zentara: Your Encode::decode utf8 solution (which I found during experiments) is a fix for Linux and OS-X, but does not work for win32 (testing under XP only, no idea if Vista/W7 are different, but I can't see why they would be). For win32, the filename treatment required is
      @files = map { pack 'UW*', unpack 'C*', $_; } @files;
      The resultant string will now open and respond as expected to -d, -f, etc, but is no longer printable, or regex processable due to the "long char" problem. So I have to keep two copies, of the name: one for processing, one for display.

      All that can be coded up via a simple package that decides:

      • Are we running inside Tk?
      • If so, is the os Windows?

      I actually started doing that and the code got so horrible, I gave up took the coward's way out: added a known bug/limitation for Windows users (actually it was the inherent bug with Tk::FBox->getOpenFile barfing that made me give up)

      [Updated to show who and what I'm replying to as the post is out of place for some reason].

      it appeared that perl/Gtk dropped ATUOLOAD mechanics during GTk binding, and this means that huge amount of perl subroutines is created at startup, which are mostly never used, just pollute symbol table.

      This is a shame, I can not believe people program this way nowadays.
      Perl is fast, convenient, and doing huge amount of needless work at startup is just not the way to go.

      Developers said they feel autoloading is not very stable, and - what they decided? let us, users, pay for their inability to efficiently program?

      This is just not acceptable.

      .... I want to add that probably it is not very correct to emphasize on what have happened on author of perl/Tk.
      Moreover, perl/Tk is supported now and gets new releases.
      Maintaining perl/Tk is difficult given the implementation specific of the module (indeed, I would recommend to switching to any 3 other Tk modules)

      but your current sentence could be easily misread
      - current active maintainer is well and hopefully healthy.

Re: Tk causes problems with file paths containing non-Latin-1 chars
by Anonymous Monk on May 03, 2011 at 22:23 UTC
    On windows this should work
    use strict; use warnings; use Tk; use Win32::Unicode::File; my $mw = tkinit; my $path = $mw->getOpenFile(); my $fh = Win32::Unicode::File->new; if ($fh->open('<', $path) ){ while (my $line = $fh->readline()){ print $line; } }
      You are right, Win32::Unicode allows the problem to be fixed, though the code no longer runs under Linux/OS-X.

      Grep says my codebase has 67 instances of -[fdew] file test operators that would need translating to "file_type" equivalents (not a 1:1 match unfortunately), and 28 of "open|close" that would need changing, not to mention instances where I pass the name of an image file to functions to create/resize displayable images. These may accept file handles instead, but viewed any which way, it's LOT of work I'd rather avoid ;-)

      I must also admit I can't immediately see any way of doing this in a platform independent way that maintains a single codebase.

      The code is my Tk based test case for the problem. The commented out lines in "sub test_file" run fine with Linux/OS-X but fail under win32. Each commented out line is followed by a Win32::Unicode equivalent that "fixes" the problem for win32:

      #!/bin/perl # # Tk demo replicates command line demo to read and checks files with e +xtended chars in the path # use strict; use warnings; use utf8; use Tk; use Cwd; use File::List; use File::Spec; use File::Basename; # These modules replicate those used by the TagSuite environment # but are not actuall used here. use Carp (qw/ croak /); use Encode; use POSIX; use XML::LibXML; use File::Spec; use Debug::Simple; use Tk qw/ Ev /; use Tk::PNG; use Tk::JPEG; use Tk::widgets qw/ NoteBook BrowseEntry Dialog DialogBox FBox Pane/; use Tk::ProgressBar; use Tk::Splashscreen; use GD::Image; use Image::Resize; use Image::ExifTool qw/ :Public /; use MIME::Base64; use DBM::Deep; use LWP::Simple; use HTML::TokeParser; use File::Copy; use File::Path; use Math::BigInt; my $path; my $win; my $t1; MAIN: { init_gui(); $path = File::Spec->catfile(getcwd(), 'data'); load(); MainLoop; } sub init_gui { Tk::CmdLine::SetArguments(qw(-geometry +410+300)); $win = MainWindow->new(-title=>'Tk wrapper', ); $win->protocol('WM_DELETE_WINDOW', sub { exit; }); $win->resizable(0, 0); my $f1 = $win->Frame(); my $f2 = $win->Frame(); $t1 = $f1->Scrolled('Text', -scrollbars=>'oe', -wrap=>'word', -heigh +t=>'26', -width=>100, ); $t1->pack(-anchor=>'e', -fill=>'both'); my $b1 = $f2->Button(-text=>'Browse...', -width=>10, -command=>\&sel +ect); $b1->pack(-anchor=>'e', -fill=>'both'); $f1->grid(-row=>0, -column=>0, -padx=>8, -pady=>4, -sticky=>'n', ); $f2->grid(-row=>1, -column=>0, -padx=>8, -pady=>4, -sticky=>'n', ); return; } sub load { my $lister = File::List->new($path); my @files = @{ $lister->find('') }; foreach my $file (@files) { test_file($file); } return; } sub test_file { my $file = shift; my $txt = ''; use Win32::Unicode::File; my $dh = Win32::Unicode::File->new; my $dir = dirname($file); #if (! -d $dir) { if (! file_type d => $dir) { $t1->insert('end', "ERROR: Base Dir '$dir' does not exist!\n"); return; } #$txt = (-w $dir) ? "Dir '$dir' is Writable by you\n" : "ERROR: dir +'$dir' NOT writable by you\n"; $txt = (! file_type r=>$dir) ? "Dir '$dir' is Writable by you\n" : " +ERROR: dir '$dir' NOT writable by you\n"; $t1->insert('end', $txt); #if (! -f $file) { if (file_type e=>$file) { $t1->insert('end', "ERROR: File '$file' does not exist!\n"); return; } #if (open my $in, '<', $file) { if ($dh->open('<', $file)) { #close $in; $dh->close; $txt = "File '$file' opened Ok\n"; } else { $txt = "ERROR: Unable to open '$file'\n"; } $t1->insert('end', $txt); return; } sub select { my $filename = $win->getOpenFile( -defaultextension => '', -initialdir => $path, -title => 'Select file', ); if ($filename) { test_file($filename); } return; }
      The data tree looks like this (note the accented 'ô' of Zatochi):
      data | +- Pan's Labyrinth (2006) | | | +- mymovies.xml | | | +- BDMV | | | +- index.TAG | +- Zatôichi 01 - The Tale of Zatoichi (1962) | +- mymovies.xml | +- VIDEO_TS | +- VIDEO_TS.TAG
Re: Tk causes problems with file paths containing non-Latin-1 chars
by vkon (Curate) on May 03, 2011 at 17:59 UTC
    I have a module that makes exactly same approach at using GUI!
    It was developed at my work and I am using in a whole bunch in my utilities - I just specify predefined array, which declares parameters, and - all is ready!

    If then my script is fed by --g, it shows GUI for entering any of arguments, with handy explanations on the left, and this is really convenient, because I have supported several types of arguments: boolean becomes checkbox on GUI, choise is either dropdown box or just list to select a vaule, also I have a file/directory selection (either input or output) and just string for any other parameter

    Also, it supports drag-n-drop of files and directories too, so you could drop a file from file manager into the input box.

    I can show more details, if there are any interest.

    Tk module uses is Tcl::Tk, which is really better than perl/Tk, due to many reasons - I switched long ago from per/Tk and never looked back.

      Thanks for the suggestion, but I'm constrained to whatever ActiveState supports as my users are mostly non-technical windows weenies who need exe files they can double-click to launch an app. So I use AS's perl app builder and generate binaries for Linux, win32, and OS-X which works fine except for the extended char in path problem, now fixed for all but win32, which is 95% of my users!! grrr...
        Tcl::Tk is pure-perl, 4 pm files, no dependencies other than Tcl CPAN module, so it should be no problem in packaging it,
        so using even easier than Tkx, because you will have perl/Tk syntax for your programs.