ron7 has asked for the wisdom of the Perl Monks concerning the following question:

I have a 10K LOC open source application suite which uses Tk and builds to self-contained "compiled" applications on Windows*, Linux, and OSX using ActiveState Perl Dev Kit 1.9 and ActivePerl 5.8.1 (this cannot be changed, as later AS Perl releases have no Tk). My users generally don't know Perl from a hole in the road, so this compile step is rather mandatory.

It runs fine in English-speaking countries, but is now starting to be used in France and Germany, where extended character set chars in the file paths are causing difficulty (of the "file not found" variety, even though the file was selected from a file browser). It does not just use simple "open" statements; there are a lot of -f, -d, etc., plus XML::Simple and other APIs which take a filename as an argument (I would love to use XML::LibXML instead, but AS 5.8 defeats me again because there is no distro available for OSX i386 and PPC).

For example, a path like "nfo_test/Un monde à nous (2007)" causes a -d to say it does not exist. Converting the name to a single-byte string works for -d, but not for -f (go figure). I'd like to internationalize the app to at least handle 'decorations' in French, German, and other Romance-language countries (let's forget Chinese, Japanese, etc.), though my reading leads me to suspect this may not be possible. It already handles utf8 chars in text strings correctly. It's just file paths causing trouble. Suggestions most welcome. Just in case, the source is at http://itee.uq.edu.au/~chernich/tagsuite.html
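To make the symptom concrete, here is a minimal sketch of one common approach: encode the decoded (character) path into bytes just before handing it to a file test. The helper name `path_bytes` is hypothetical, and the `UTF-8` default is an assumption that suits most Linux setups; it is not confirmed to be what the application or any of its platforms needs.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a character string into the byte string that is
# passed to the operating system. The 'UTF-8' default is an assumption.
sub path_bytes {
    my ($path, $encoding) = @_;
    $encoding = 'UTF-8' unless defined $encoding;
    return encode($encoding, $path);
}

# The example path from the question, with the accented letter written as
# an escape so the source file encoding does not matter.
my $path  = "nfo_test/Un monde \x{e0} nous (2007)";
my $bytes = path_bytes($path);

# Run the file test on the encoded bytes, not on the character string.
my $found = -d $bytes;
print $found ? "directory found\n" : "directory missing\n";
```

The same encoded string would be passed to -f, open, or any module that takes a filename, so that every call sees identical bytes.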

Re: File path internationalization
by Jim (Curate) on Dec 26, 2010 at 07:28 UTC
Re: File path internationalization
by Marshall (Canon) on Dec 26, 2010 at 05:15 UTC
    ActiveState Perl Dev Kit 1.9 and ActivePerl 5.8.1 (this cannot be changed as later AS Perl releases have no Tk).
    I currently use Perl 5.10, and Tk is fine. You have to use ppm to install it: it is available, it just doesn't come as part of the default set of stuff. A simple upgrade is probably not the answer to your language problem, but it is possible if needed for other reasons.

    You might want to try getting the "short path name" for use in opening files, etc.: Win32::GetShortPathName($fullpath). I think there is a much better way to deal with filenames for the languages you list, but I can't find it right now.
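A rough sketch of how that suggestion could be wrapped so the same call works on every platform. The helper name `openable_path` is hypothetical. The sketch assumes that Win32::GetShortPathName only succeeds for paths that already exist, and it falls back to the original path when no short name comes back.

```perl
use strict;
use warnings;

# Hypothetical wrapper: on Windows, ask for the 8.3 short path name;
# everywhere else, return the path untouched.
sub openable_path {
    my ($path) = @_;
    return $path unless $^O eq 'MSWin32';

    # Load the Win32 module only on Windows so other platforms never need it.
    require Win32;
    my $short = Win32::GetShortPathName($path);

    # Fall back to the original path if no short name was returned.
    return (defined $short && length $short) ? $short : $path;
}
```

Keeping the platform check inside one subroutine limits the conditional code to a single place rather than scattering it around every -f, -d, and open.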

      AFAIK, a 5.10 ppd for Tk is not available for OSX (just like XML::LibXML not being available, but that is a separate issue). I could be wrong, but as you say, this is not going to fix my problem. Neither, I suspect, will Win32::Anything, because of the platforms I need to support (leaving aside all the mess that conditional evals for other platforms would introduce, if similar features exist). Thanks for the thought, though.
Re: File path internationalization
by Anonymous Monk on Dec 26, 2010 at 04:49 UTC
      Second thread interesting and relevant as I use File::List and need it to return directories and files with extended chars in the name, which it does not "out of the box". This is sounding like there is no cross platform solution, but I'll keep plugging a while longer. I develop and test under Linux, but test under Win* and OSX. So far no joy under Linux and that should be the easy one.
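For the directory listing side, a minimal sketch using only core opendir/readdir rather than File::List. It keeps the raw bytes of each entry for file operations and a decoded copy for display. The helper name `list_dir` is hypothetical, and the `UTF-8` default is an assumption about the filesystem, not something established in this thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return a list of [display_name, raw_bytes] pairs for
# one directory. The 'UTF-8' default is an assumption about the filesystem.
sub list_dir {
    my ($dir_bytes, $encoding) = @_;
    $encoding = 'UTF-8' unless defined $encoding;

    opendir(my $dh, $dir_bytes) or die "Cannot open directory: $!";
    my @raw = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    my @entries;
    for my $raw (sort @raw) {
        my $copy = $raw;    # decode a copy so the raw bytes stay intact
        push @entries, [ decode($encoding, $copy), $raw ];
    }
    return @entries;
}
```

The display name is what would go into a Tk widget; the raw bytes are what would go back to -f, -d, or open, joined to the directory path.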
        I have had a lot of trouble with Tk and internationalization on Linux and Windows with Perl. On Windows I use Strawberry Perl.
        1. On Unix, use Unicode throughout and text will behave normally, but filenames will still cause problems. Tk uses an ISO-8859 code page (if I remember the number correctly) as its default for filenames, so UTF-8 filenames cause trouble. Tk's file dialog handles UTF-8 filenames incorrectly: it displays them as ???. This was my biggest problem with Tk, and I had to write my own file dialog that supports UTF-8.
        2. All Tk methods work with the Latin code page and cannot open UTF-8 filenames (the Photo method, for example). For these methods I wrote a subroutine that translates the name to a Latin filename, and I pass that Latin filename to Photo.
        3. On Windows, filenames use ONLY the national code page. For example, Cyrillic filenames must be in cp1251 (if the Windows MUI is Cyrillic)...
        4. For filename data I use UTF-8; keep this in mind and encode the data where needed. Also, Windows uses CRLF line endings, while Unix uses only LF.
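The platform rules in the list above could be gathered into one place, roughly as follows. This is only a sketch: the helper name `fs_encoding_name` is hypothetical, the `cp1252` fallback is an assumption for Western European Windows installs, and older Win32 module versions may not provide Win32::GetACP at all, which is why the code checks for it first.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: guess the encoding used for filenames on this platform.
sub fs_encoding_name {
    if ($^O eq 'MSWin32') {
        require Win32;
        # Use the system ANSI code page when the installed Win32 module
        # offers GetACP; otherwise fall back to cp1252 (an assumption).
        if (defined &Win32::GetACP) {
            my $name = 'cp' . Win32::GetACP();
            return $name if find_encoding($name);
        }
        return 'cp1252';
    }
    # Assumption: Unix-like systems store filenames as UTF-8.
    return 'UTF-8';
}

print "Filename encoding guess: ", fs_encoding_name(), "\n";
```

The returned name can be passed to Encode::encode before file operations and to Encode::decode after reading directory entries, so the platform decision is made once.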

        Most of my questions on PerlMonks have been about code pages in Tk and other internationalization issues. You can look at my profile and read all about this topic in my posts...