IB2017 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

As always in my relationship with Perl and Windows, I am having problems with non-Latin characters in file and directory names. In this case I am selecting an Excel file in Tk with getOpenFile. This seems to work fine, as the script demonstrates. On my machine (Windows 10, German locale), I can select Excel files whose names or directories contain non-Latin characters (Chinese and so on), and the selected path is shown correctly in my Tk app. However, I cannot parse the file. It fails with

Tk::Error: Can't call method "read" on an undefined value at C:/Perl/site/lib/Spreadsheet/ line 79.

which means something is wrong with the path. As you can see, I also try to use Win32::LongPath. In some cases shortpathL gives back a short path, which Spreadsheet::ParseXLSX then opens fine. In others it gives back the original path, which Spreadsheet::ParseXLSX cannot open. Note that the script has to run on any desktop machine, with all kinds of locale settings.

use strict;
use warnings;
use Tk;
use Win32::LongPath;
use Spreadsheet::ParseXLSX;

my $mw = Tk::MainWindow->new();
my $path;
my $button = $mw->Button(
    -text    => "Select a file",
    -command => \&show_file_dialog,
)->pack(-side => 'left');
my $label = $mw->Label(
    -text => 'No file yet',
)->pack(-side => 'left');
$mw->MainLoop();

sub show_file_dialog {
    my @ext = (
        ["Excel",     ['xlsx']],
        ["All files", ['*']],
    );
    $path = $mw->getOpenFile(-filetypes => \@ext);
    return unless defined $path;    # the dialog was cancelled

    # use shortpath
    my $ShortPath = shortpathL($path);

    # show the path as read and the path converted with shortpathL inside Tk
    $label->configure(-text => "$path - $ShortPath");

    print "try to open file with shortpathL path\n";
    my $parser   = Spreadsheet::ParseXLSX->new();
    my $workbook = $parser->parse($ShortPath);

    print "try to open file with original path\n";
    # reuse the same parser; redeclaring with "my" would only mask the variables
    $workbook = $parser->parse($path);
}

What can I do to achieve my goal, given that shortpathL only sometimes returns a usable path? (For example, it doesn't seem to work if the file is on a removable device, or in some locations on the hard drive.) Win32::Unicode is not an option, as it is not maintained and fails to compile on any modern Perl. Thank you for your suggestions.

Re: Spreadsheet::ParseXLSX filename non Latin Tk getOpenFile
by swl (Parson) on Nov 21, 2018 at 08:31 UTC

    Given what you've tried, I assume you're familiar with the Unicode Bug. Others can provide better summaries of that than I can.

    More generally, though, do the short paths themselves contain Unicode characters, or something that will be treated differently by Tk, Perl and the OS?

    Another option is to pass Spreadsheet::ParseXLSX::parse a file handle. It checks if the file argument is a file handle and acts accordingly, so if you can open the file using Win32::LongPath::openL then that will avoid the standard open call (assuming that is the cause of the error).

      I think the problem with shortpathL is that the module (quite mysteriously, to me) sometimes provides a short path and sometimes doesn't. If I move the same file (or directory structure) around, the module may start to provide the short path; otherwise it returns the original path. As far as I understand, this must depend on some quite weird Windows internals. When a short path is provided, Spreadsheet::ParseXLSX works smoothly. However, it is unsatisfying that it doesn't work consistently.

      If my path is


      The above code selects the path correctly and displays it fine in the UI (at least with the latest Tk). Passing this path to Spreadsheet::ParseXLSX does NOT work. Shouldn't Perl and Tk resolve this internally (not sure about the layer to the OS)? If a short path is returned

      then it is okay. If it is not returned, I simply get the original path again, which of course doesn't work.

      Can you elaborate on the idea of passing a file handle? If the problem is that in some circumstances I cannot get the short path, I guess I cannot get a file handle to pass either. Am I wrong?

      I also tried to eliminate the Tk getOpenFile and pass the path directly inside the script with:

      use utf8;
      my $workbook = $parser->parse('C:/Users/DE/Desktop/号召力打了/Ршзефф.xlsx');

      This doesn't work either.

        My thinking is that you could modify your code to something like this (untested):

        # assumes $filename is set from Tk
        my $fh;
        Win32::LongPath::openL(\$fh, '<', $filename)
            or die "Unable to open $filename: $^E";
        binmode $fh;    # .xlsx files are zip archives, so read raw bytes
        my $parser   = Spreadsheet::ParseXLSX->new();
        my $workbook = $parser->parse($fh);
        # do stuff

        openL is documented in the Win32::LongPath POD.

        You could also check that the file can be found using testL, from the same module.

        This all comes with the caveat that I have not used these functions, although if it does work then it will help solve one of my own longstanding headaches with a Gtk2 application.
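A minimal sketch of the file-handle suggestion above, wrapped in a hypothetical helper `open_for_parse` (not part of any module). It assumes that getOpenFile hands back a Perl character string, that `Win32::LongPath::testL` and `Win32::LongPath::openL` behave as swl describes, and, for the non-Windows branch (which exists only so the helper can be exercised elsewhere), that the file system expects UTF-8 encoded names. Like swl's snippet, the Windows branch is untested here.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: open $path (a Perl character string) for binary
# reading and return the handle, so it can be passed to
# Spreadsheet::ParseXLSX->parse instead of a file name.
sub open_for_parse {
    my ($path) = @_;
    my $fh;
    if ($^O eq 'MSWin32') {
        # Win32::LongPath goes through the wide-character Windows API,
        # so non-Latin names should not depend on the ANSI code page.
        require Win32::LongPath;
        Win32::LongPath::testL('e', $path)
            or die "File not found: $path\n";
        Win32::LongPath::openL(\$fh, '<', $path)
            or die "Unable to open $path: $^E\n";
    }
    else {
        # Assumption: outside Windows, file names are UTF-8 encoded bytes.
        my $bytes = encode('UTF-8', $path);
        open($fh, '<', $bytes)
            or die "Unable to open $path: $!\n";
    }
    binmode $fh;    # .xlsx is a zip archive, so read raw bytes
    return $fh;
}
```

In show_file_dialog this would replace both parse attempts with something like `my $workbook = Spreadsheet::ParseXLSX->new->parse(open_for_parse($path));`, which sidesteps shortpathL entirely and so does not depend on 8.3 names being available.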

Re: Spreadsheet::ParseXLSX filename non Latin Tk getOpenFile
by vr (Curate) on Nov 21, 2018 at 14:31 UTC

    I wouldn't rely on short names "on any desktop machine": 8.3 names could easily have been disabled on some of them. Besides, the underlying MS API may be buggy, though I didn't investigate thoroughly. It seems that the short name is not looked up (read from the directory) if the long name is itself 8.3 and uses CP1252 (? - not sure) characters.

    E.g. my Windows code page is not 1252 and doesn't contain "ñ" at all; therefore, for the directory "Buñuel", a "short" (actually, it's longer) name "BUUEL~1" is generated, and it must be supplied to non-complying software such as Perl :).

    D:\>mkdir Buñuel
    D:\>dir /x
    # skipped
    21/11/2018  17:01    <DIR>          BUUEL~1      Buñuel
    # skipped

    Then I put "x.xlsx" into this directory, and:

    use strict;
    use warnings;
    use feature 'say';
    use utf8;
    use Spreadsheet::ParseXLSX;
    use Win32::API;
    use Win32::LongPath;
    use Encode qw/ encode decode /;
    use Test::More;
    Win32::API::More-> Import( 'kernel32', 'GetShortPathNameW', 'NPN', 'N' ) or die;
    my $dir = 'D:\Buñuel';
    my $tmp = encode 'UTF16LE', "$dir\0";
    my $ptr = unpack 'L', pack 'p', $tmp;
    my $buf = "\0" x 200;    # room for 100 UTF-16 code units (200 bytes)
    my $len = GetShortPathNameW( $ptr, $buf, 100 ) or die;
    my $short = substr +( decode 'UTF16LE', $buf ), 0, $len;
    is $dir, $short,                "but they should not be the same!";
    is shortpathL( $dir ), $short,  "because Win32 API is used anyway";
    my $parser   = Spreadsheet::ParseXLSX-> new;
    #my $workbook = $parser-> parse( shortpathL( "$dir/x.xlsx" ));   # will die here
    my $workbook = $parser-> parse( 'D:\BUUEL~1/x.xlsx' );          # go on living

    Output:

        ok 1 - but they should not be the same!
        ok 2 - because Win32 API is used anyway
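Following vr's observation, a script could check up front whether a selected path can be reached by core Perl under its long name at all. The sketch below uses a hypothetical helper `fits_codepage`; it assumes that core Perl file functions on Windows interpret names in the active ANSI code page (consistent with vr's test above) and that Encode's code page names (cp1252, cp1251, ...) match the Windows ones.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns 1 if every character of $path can be
# represented in $codepage (an Encode name such as 'cp1252'), else 0.
# Assumption: a name that fails this check cannot be opened by core Perl
# under its long name, only via an 8.3 alias (if one exists) or through
# a wide-character API such as Win32::LongPath.
sub fits_codepage {
    my ($path, $codepage) = @_;
    # FB_CROAK dies on the first unmappable character;
    # LEAVE_SRC keeps $path unmodified.
    my $ok = eval {
        encode($codepage, $path, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}
```

On Windows, the active code page could be obtained with `Win32::GetACP()`, e.g. `fits_codepage($path, 'cp' . Win32::GetACP())`. "Buñuel" passes for cp1252 but fails for cp1251, which matches the situation vr describes; when the check fails, the script would fall back to openL rather than rely on shortpathL.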