in reply to File::List, UTF-8, and Tk

Ok, despite monks with vows of silence, I have a patch which "fixes" File::List->new when run under Tk (refer to earlier code snippet):
use Encode; ... for my $entry (@entries) { $entry = decode('utf8', $entry); ...

This works in both cases (command line and Tk), although the debug print of the filename is now garbled on the console (I think the UTF-8 bytes have become a single Unicode character, which is then printed without being re-encoded), not that that matters:

DEBUG: processing '/tmp/zumlaut//Zat�ichi 01 - The Tale of Zato +ichi (1962)' ...
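The garbled debug line is consistent with a decoded character string being printed without re-encoding. A minimal sketch of the decode-on-input, encode-on-output round trip follows; the sample name and the choice of UTF-8 for the console are assumptions for illustration, not taken from the original code:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample name: raw UTF-8 bytes, as readdir would return them on Linux.
my $bytes = "Zat\xC3\xB4ichi 01";

# Decode once on input: $chars is now a character string (o-circumflex is U+00F4).
my $chars = decode('UTF-8', $bytes);

# Encode again on output. 'UTF-8' is an assumption about the console encoding.
my $for_console = encode('UTF-8', $chars);
printf "DEBUG: processing '%s'\n", $for_console;
```

Calling binmode(STDOUT, ':encoding(UTF-8)') once at startup would have a similar effect for everything printed to STDOUT, again assuming a UTF-8 console.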

I'm always reluctant to modify distribution modules and this fix feels very klugy. Suggestions please?

Re^2: File::List, UTF-8, and Tk
by Anonymous Monk on Apr 28, 2011 at 04:14 UTC
    I'm always reluctant to modify distribution modules and this fix feels very klugy. Suggestions please?

    1) subclass 2) inline

      Thanks, one of those will be the way to go--should have thought of it myself *<8-). I still have problems related to those paths and use of FBox/chooseDirectory/getOpenFile, but there seems enough material about this problem around that it may be solvable.

      BTW, I should have mentioned in the original post that my command-line and Tk apps must run on Linux/Win32/OS-X, in any locale. It's a big ask, but after a lot of work they mostly do, except for the file path problem under Tk.

        Update: the decode utf8 fix works for Linux and OS-X, but not win32 (surprise, surprise).

        The problem is as before: the file path with extended chars in it is not matched by either of the -f or -d tests. Interestingly, console debug print complains about "wide" chars suggesting to me that the decode of the filepath HAS done its job. Printing the before/after state of the utf8 flag using Encode::is_utf8() seems to confirm this.

        Adding the decode CHECK option causes a runtime error under win32 (utf8 "\xF4" does not map to Unicode at...). Snooping the string shows it contains "C3 B4" (the UTF-8 encoding) for the "small letter o with circumflex" char under Linux, and the single byte "F4" (the same value as its Unicode code point, U+00F4) under win32. Using the "strict" encoding alias "UTF-8" makes no difference (Linux == good; win32 == BAD).
        $entry = decode('utf8', $entry, 1);
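The error above suggests the win32 name is not UTF-8 at all, but a single-byte code page. A hedged sketch of one way to cope: try a strict UTF-8 decode first and fall back to a legacy encoding when it fails. The helper name decode_filename and the cp1252 fallback are assumptions for illustration; the real ANSI code page depends on the Windows locale.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns a character string for a raw filename.
sub decode_filename {
    my ($bytes) = @_;

    # Strict decode: croaks on malformed UTF-8 such as a lone "\xF4".
    # LEAVE_SRC keeps $bytes intact so the fallback can still use it.
    my $chars = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return $chars if defined $chars;

    # Assumed fallback: Western European Windows code page.
    return decode('cp1252', $bytes);
}

# Both spellings of o-circumflex end up as the same character, U+00F4.
my $from_linux = decode_filename("Zat\xC3\xB4ichi");
my $from_win32 = decode_filename("Zat\xF4ichi");
print "same name\n" if $from_linux eq $from_win32;
```

This only guesses at the encoding, so a name that happens to be valid UTF-8 but was meant as cp1252 would be misread.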

        I'm using ActiveState perl 5.8.9 so I can generate binaries for all platforms; Encode is at 2.42 in all cases.

        Followup! The "fix" for win32, if you can call it that, is MONSTROUS. Before each and every use of a file test (-f, -d, -e, etc), and each use of an open, move, copy etc, the following is required:

        $filename = pack 'U0C*', unpack 'C*', $filename;

        This makes things work under win32 and does no great harm under Linux/OS-X, unless the name is going to be displayed, in which case the original form must be used or the extended char will be shown as an undisplayable char.

        There HAS to be a better way!
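One possible shape for a better way, offered as an untested-on-win32 sketch rather than a known fix: keep names as decoded character strings inside the program, and encode them in one place just before they reach a file test or open. The helper name fs_name and the per-platform encodings are assumptions, not something File::List or Tk provides.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: cp1252 on Windows, UTF-8 elsewhere. Real systems may differ.
my $DEFAULT_FS_ENCODING = $^O eq 'MSWin32' ? 'cp1252' : 'UTF-8';

# Hypothetical helper: character string in, filesystem bytes out.
sub fs_name {
    my ($chars, $encoding) = @_;
    $encoding ||= $DEFAULT_FS_ENCODING;
    return encode($encoding, $chars);
}

# Display uses the character string; file tests use the encoded bytes.
my $name   = "Zat\x{F4}ichi";
my $exists = -e fs_name($name);
printf "%s\n", $exists ? 'exists' : 'missing';
```

The point of the design is that the conversion lives in a single function, instead of a pack/unpack line repeated before every -f, -d, -e, open, move and copy.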