ron7 has asked for the wisdom of the Perl Monks concerning the following question:

I have a module which is called by a command-line shell, or by a Tk GUI shell. In both cases the shells just collect parameters and the module does the work which includes building a list of files to process.

My problem is files or directories with non-ASCII characters in their names. When run in the command-line context, such files are "found". When run within the Tk GUI context, they are not!

I inserted a debug print into the File::List->new processing loop to display the directories being processed, with an else to print those which were neither -d, nor -f. The modified File::List->new is shown below (my mods in column zero):

    sub new
    {
        my $class = shift;
        my $base = shift;
        my $self = {};
        bless $self, $class;

        # store my base for later
        $self->{base} = $base;

        $debug && print "spawned with base [$base]\n";

        # read in contents of current directory
        opendir (BASE, $base);
        my @entries = grep !/^\.\.?\z/, readdir BASE;
        chomp(@entries);
        closedir(BASE);

        for my $entry (@entries)
        {
print "DEBUG: processing '$base/$entry'\n";
            # if entry is a directory, launch a new File::List to explore it
            # and store a reference to the new object in the dirlist hash
            if (-d "$base/$entry")
            {
                $debug && print _trace(),"following directory $base/$entry\n";
                my $newbase = new File::List("$base/$entry");
                $self->{dirlist}{ $entry } = $newbase;
            }
            # if entry is a file, store it's name in the dirlist hash
            elsif ( -f "$base/$entry"){
                $debug && print _trace(),"Found file : $base/$entry\n";
                $self->{dirlist}{ $entry } = 1;
            }
else {
print "DEBUG: '$base/$entry' is not -f nor -d\n";
}
        }
        return $self;
    }

The console output in command-line mode is:

DEBUG: processing '/tmp/zumlaut//Zatôichi 01 - The Tale of Zatoichi (1962)'
DEBUG: processing '/tmp/zumlaut//Zatôichi 01 - The Tale of Zatoichi (1962)/folder.jpg'
DEBUG: processing '/tmp/zumlaut//Zatôichi 01 - The Tale of Zatoichi (1962)/VIDEO_TS'
DEBUG: processing '/tmp/zumlaut//Zatôichi 01 - The Tale of Zatoichi (1962)/VIDEO_TS/fanart.jpg'
DEBUG: processing '/tmp/zumlaut//Zatôichi 01 - The Tale of Zatoichi (1962)/VIDEO_TS/VIDEO_TS.TAG'
DEBUG: processing '/tmp/zumlaut//Zatôichi 01 - The Tale of Zatoichi (1962)/backdrop.jpg'
DEBUG: processing '/tmp/zumlaut//Zatôichi 01 - The Tale of Zatoichi (1962)/mymovies.xml'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/folder.jpg'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/BDMV'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/BDMV/index.TAG'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/BDMV/fanart.jpg'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/backdrop.jpg'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/mymovies.xml'

In Tk GUI mode we get:

DEBUG: processing '/tmp/zumlaut//Zatôichi 01 - The Tale of Zatoichi (1962)'
DEBUG: '/tmp/zumlaut//Zatôichi 01 - The Tale of Zatoichi (1962)' is not -f nor -d
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/folder.jpg'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/BDMV'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/BDMV/index.TAG'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/BDMV/fanart.jpg'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/backdrop.jpg'
DEBUG: processing '/tmp/zumlaut//Pan's Labyrinth (2006)/mymovies.xml'

Note that the path with the utf-8 char in it is not recognized as a directory. Apparently *Something* introduced by the Tk environment is preventing the -d test from returning true.
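
One possible explanation, offered as an assumption rather than something confirmed in this thread: Tk may hand back the base path as a character string (UTF8 flag on), while readdir returns raw octets. Joining the two in "$base/$entry" would make Perl upgrade the octets as if they were Latin-1, and the file tests pass the resulting internal buffer to the OS. A minimal sketch of that effect, with utf8::upgrade standing in for a Tk-supplied string:

```perl
use strict;
use warnings;
use bytes ();    # loads bytes::length() without enabling the bytes pragma

# UTF-8 octets for "Zatôichi", as readdir would return them on a UTF-8 filesystem.
my $entry = "Zat\xC3\xB4ichi";

# Command-line case: the base directory is a plain byte string.
my $base_bytes = "/tmp/zumlaut";

# Assumed Tk case: same text, but stored internally as a character string.
my $base_chars = "/tmp/zumlaut";
utf8::upgrade($base_chars);

my $cli_path = "$base_bytes/$entry";
my $tk_path  = "$base_chars/$entry";

# Perl considers the two paths equal as strings ...
print "eq: ", ( $cli_path eq $tk_path ? "yes" : "no" ), "\n";

# ... but the internal buffers differ, and those are what -d and -f
# hand to the operating system.
printf "internal bytes (cli): %d\n", bytes::length($cli_path);
printf "internal bytes (tk):  %d\n", bytes::length($tk_path);
```

If the two byte counts differ on your system, the Tk path no longer matches the name on disk, which would explain why -d fails only in that context.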

I've tried a few things, but to no avail. Suggestions or helpful references MOST welcome!

Re: File::List, UTF-8, and Tk
by Anonymous Monk on Apr 28, 2011 at 03:20 UTC
Re: File::List, UTF-8, and Tk
by ron7 (Beadle) on Apr 28, 2011 at 04:06 UTC
    Ok, despite monks with vows of silence, I have a patch which "fixes" File::List->new when run under Tk (refer to earlier code snippet):
    use Encode;
    ...
    for my $entry (@entries)
    {
        $entry = decode('utf8', $entry);
    ...

    This works in both cases (command-line and Tk env), although the debug print of the filename is now screwed when displayed on the console (the utf8 char is now unicode, I think), not that that matters:

    DEBUG: processing '/tmp/zumlaut//Zat�ichi 01 - The Tale of Zatoichi (1962)' ...
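
The garbled character is what one would expect when a decoded (character) string is printed to a handle that has no encoding layer: "ô" goes out as a single Latin-1 byte, which a UTF-8 terminal cannot display. A minimal sketch of one way to get readable debug output, assuming a UTF-8 terminal (the variable names here are illustrative, not from File::List):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Name as read from disk on a UTF-8 filesystem (raw octets).
my $octets = "Zat\xC3\xB4ichi";

# What the patch stores in $entry: a character string.
my $chars = decode('utf8', $octets);

# Encode explicitly before printing, so the terminal receives UTF-8 again.
my $display = encode('UTF-8', $chars);
print "DEBUG: processing '$display'\n";
```

An alternative with the same effect is to put an encoding layer on the handle once, with binmode(STDOUT, ':encoding(UTF-8)'), and then print the character string directly.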

    I'm always reluctant to modify distribution modules and this fix feels very klugy. Suggestions please?

      I'm always reluctant to modify distribution modules and this fix feels very klugy. Suggestions please?

      1) subclass
      2) inline
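
A rough sketch of the "inline" option, so the distribution module stays untouched. The function name list_tree is hypothetical; it mimics the nested structure that File::List->new builds (directories map to a nested hash, files map to 1) and applies the same decode as the patch. It assumes a UTF-8 filesystem and a base path that is either plain ASCII or already a character string, so it is unlikely to be sufficient as-is for Win32:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for File::List->new, kept in the application's own code.
sub list_tree {
    my ($base) = @_;

    opendir( my $dh, $base ) or die "Cannot open '$base': $!";
    my @entries = grep { !/^\.\.?\z/ } readdir $dh;
    closedir $dh;

    my %tree;
    for my $entry (@entries) {
        # Same decode as the patch: turn the octets from readdir into characters.
        $entry = decode( 'utf8', $entry );
        my $path = "$base/$entry";

        if ( -d $path ) {
            $tree{$entry} = list_tree($path);    # recurse into subdirectory
        }
        elsif ( -f $path ) {
            $tree{$entry} = 1;                   # plain file
        }
    }
    return \%tree;
}
```

Calling list_tree('/tmp/zumlaut') would return a hash reference keyed by the decoded names. The subclass option would look much the same, with this body placed in an overriding new.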

        Thanks, one of those will be the way to go--should have thought of it myself *<8-). I still have problems related to those paths and use of FBox/chooseDirectory/getOpenFile, but there seems enough material about this problem around that it may be solvable.

        BTW, should have mentioned in the original post that my command-line and Tk apps must run on Linux/Win32/OS X, in any locale. It's a big ask, but after a lot of work they mostly do, except for the file path problem under Tk.