PerlMonks  

Spreadsheet::ParseXLSX filename non Latin Tk getOpenFile

by IB2017 (Pilgrim)
on Nov 21, 2018 at 05:29 UTC

IB2017 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

As always in my relationship with Perl and Windows, I am having problems with non-Latin characters in file names and directories. In this case I am selecting an Excel file in Tk with getOpenFile. This seems to work fine, as the script below demonstrates. On my machine (Windows 10, German locale), I can select Excel files whose names or directories contain non-Latin characters (Chinese and so on), and the selected path is shown correctly in my Tk app. However, I cannot parse the file: it fails with Tk::Error: Can't call method "read" on an undefined value at C:/Perl/site/lib/Spreadsheet/ParseXLSX.pm line 79. which means something is wrong with the path. As you can see, I also try to use Win32::LongPath: in some cases it gives back a short path, which is then fine for Spreadsheet::ParseXLSX; in others it gives back the original path, which Spreadsheet::ParseXLSX cannot open. Note that the script has to run on any desktop machine, whatever its locale settings.

use strict;
use warnings;
use Tk;
use Win32::LongPath;
use Spreadsheet::ParseXLSX;

my $mw = Tk::MainWindow->new();
my $path;

my $button = $mw->Button(
    -text    => "Select a file",
    -command => \&show_file_dialog,
)->pack(-side => 'left',);

my $label = $mw->Label(
    -text => 'No file yet',
)->pack(-side => 'left',);

$mw->MainLoop();

sub show_file_dialog {
    my @ext = (
        ["Excel",     ['xlsx']],
        ["All files", ['*']],
    );

    $path = $mw->getOpenFile(
        -filetypes => \@ext,
    );

    # use shortpath
    my $ShortPath = shortpathL ($path);

    # print path read and path converted with short path inside Tk
    $label->configure(-text => "$path - $ShortPath");

    print "try to open file with shortpathL path\n";
    my $parser   = Spreadsheet::ParseXLSX->new();
    my $workbook = $parser->parse($ShortPath);

    # reuse the variables declared above instead of redeclaring them with "my"
    print "try to open file with original path\n";
    $parser   = Spreadsheet::ParseXLSX->new();
    $workbook = $parser->parse($path);
}

What can I do to achieve my goal, given that shortpathL only sometimes returns a usable path? (For example, it doesn't seem to work if the file is on a removable device, or in some locations on the hard drive, etc.) Win32::Unicode is not an option, as it is not maintained and compilation fails on any modern Perl. Thank you for your suggestions.

Replies are listed 'Best First'.
Re: Spreadsheet::ParseXLSX filename non Latin Tk getOpenFile
by swl (Parson) on Nov 21, 2018 at 08:31 UTC

    Given what you've tried, I assume you're familiar with the Unicode Bug. Others can provide better summaries of that than I can.

    More generally, though, do the shortpaths themselves contain unicode characters or something that will be treated differently between Tk, perl and the OS?

    Another option is to pass Spreadsheet::ParseXLSX::parse a file handle. It checks if the file argument is a file handle and acts accordingly, so if you can open the file using Win32::LongPath::openL then that will avoid the standard open call (assuming that is the cause of the error). https://github.com/doy/spreadsheet-parsexlsx/blob/master/lib/Spreadsheet/ParseXLSX.pm#L81.

      I think the problem with shortpath is that the module, for me quite mysteriously, sometimes provides a short path and sometimes does not. After moving the same file (or directory structure) around, the module may start to provide the short path (otherwise it returns the original path). As far as I understand, this must depend on some quite weird Windows stuff. When a short path is provided, Spreadsheet::ParseXLSX works smoothly. However, it is unsatisfying that it doesn't work consistently.

      If my path is

      C:/Users/DE/Desktop/号召力打了/Ршзефф.xlsx

      The above code selects the path okay and displays it fine in the UI (at least with the latest Tk). Passing this path to Spreadsheet::ParseXLSX does NOT work. Shouldn't Perl and Tk resolve this internally (not sure about the layer to the OS)? If a short path is returned

      C:\Users\FC\Desktop\7373~1\B030~1.XLS

      then it is okay. If it is not returned, I simply get again

      C:/Users/DE/Desktop/号召力打了/Ршзефф.xlsx

      which of course doesn't work.

      Can you elaborate on the idea of passing a file handle? If the problem is that, in some circumstances, I cannot get the short path, I guess I cannot get a file handle to pass either. Am I wrong?

      I also tried to eliminate the Tk OpenFile and directly pass the path inside the script with:

      use utf8;
      my $workbook = $parser->parse('C:/Users/DE/Desktop/号召力打了/Ршзефф.xlsx');

      This doesn't work either.

        My thinking is that you could modify your code to something like this (untested):

        # assumes $filename is set from Tk
        my $fh;
        Win32::LongPath::openL (\$fh, '<', $filename)
          or die "Unable to open $filename";
        my $parser = Spreadsheet::ParseXLSX->new();
        my $workbook = $parser->parse($fh);
        # do stuff

        openL is documented at https://metacpan.org/pod/Win32::LongPath#openL-FILEHANDLEREF,MODE,PATH

        You could also check the filename can be found using testL. https://metacpan.org/pod/Win32::LongPath#testL-TYPE,PATH

        This all comes with the caveat that I have not used these functions, although if it does work then it will help solve one of my own longstanding headaches with a Gtk2 application.
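
As a rough sketch of the same idea for a script that has to run on more than one kind of machine, the open call could be wrapped in a small helper. The name open_binary_handle is hypothetical (it is not part of any module), and the Windows branch is as untested as the snippet above: it assumes openL and testL behave as documented in Win32::LongPath and that the handle openL returns is seekable, which the zip reader underneath Spreadsheet::ParseXLSX needs.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): return a binary read handle
# for $path, using Win32::LongPath on Windows and plain open() elsewhere.
sub open_binary_handle {
    my ($path) = @_;
    my $fh;
    if ($^O eq 'MSWin32') {
        # Loaded at run time so the script still compiles on other systems.
        require Win32::LongPath;
        Win32::LongPath::testL('e', $path)
            or die "File not found: $path";
        Win32::LongPath::openL(\$fh, '<', $path)
            or die "Unable to open $path: $^E";
    }
    else {
        open($fh, '<', $path)
            or die "Unable to open $path: $!";
    }
    binmode $fh;    # xlsx files are zip archives, so read raw bytes
    return $fh;
}
```

The result would then be passed to the parser in place of the path, e.g. $parser->parse(open_binary_handle($path)), so that Spreadsheet::ParseXLSX never has to open the non-Latin file name itself.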

Re: Spreadsheet::ParseXLSX filename non Latin Tk getOpenFile
by vr (Curate) on Nov 21, 2018 at 14:31 UTC

    I wouldn't rely on short names "on any desktop machine" -- 8.3 names could easily have been disabled on some of them. Besides, the underlying MS API may be buggy, though I didn't investigate thoroughly. It seems that the short name is not looked up (read from the directory) if the long name is itself 8.3 and uses CP1252 (? - not sure) characters.

    E.g. my windows CP is not 1252 and doesn't contain "ñ" at all, therefore, for directory "Buñuel", "short" (actually, it's longer) name "BUUEL~1" is generated and must be supplied to non-complying software, such as Perl :).

    D:\>mkdir Buñuel
    
    D:\>dir /x
    # skipped
    
    21/11/2018  17:01    <DIR>          BUUEL~1      Buñuel
    
    # skipped
    

    Then I put "x.xlsx" into this directory, and:

    use strict;
    use warnings;
    use feature 'say';
    use utf8;
    use Spreadsheet::ParseXLSX;
    use Win32::API;
    use Win32::LongPath;
    use Encode qw/ encode decode /;
    use Test::More;
    
    Win32::API::More-> Import( 'kernel32', 'GetShortPathNameW', 'NPN', 'N' ) or die;
    
    my $dir = 'D:\Buñuel';
    my $tmp = encode 'UTF16LE', "$dir\0";
    my $ptr = unpack 'L', pack 'p', $tmp;
    my $buf = ' ' x 100;
    my $len = GetShortPathNameW( $ptr, $buf, 100 ) or die;
    
    my $short = substr +( decode 'UTF16LE', $buf ), 0, $len;
    
    is $dir, $short,                "but they should not be the same!";
    is shortpathL( $dir ), $short,  "because Win32 API is used anyway";
    
    done_testing;
    
    my $parser   = Spreadsheet::ParseXLSX-> new;
    #my $workbook = $parser-> parse( shortpathL( "$dir/x.xlsx" ));   # will die here
    my $workbook = $parser-> parse( 'D:\BUUEL~1/x.xlsx' );          # go on living
    
    __END__
    
    D:\>perl p.pl
    ok 1 - but they should not be the same!
    ok 2 - because Win32 API is used anyway
    1..2
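
One way a script could notice this situation is to check whether the string that came back from shortpathL is plain ASCII before handing it to the parser. The sketch below is only an illustration: is_ascii_path is a hypothetical helper name, and treating "ASCII only" as "safe for Perl's ordinary open" is an assumption, not something the Win32 API guarantees.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if $path is defined, non-empty and
# contains only ASCII characters, false (0) otherwise.
sub is_ascii_path {
    my ($path) = @_;
    return 0 unless defined $path && length $path;
    return $path =~ /\A[\x00-\x7F]+\z/ ? 1 : 0;
}

# A real 8.3 name passes the check ...
print is_ascii_path('D:\BUUEL~1/x.xlsx'), "\n";
# ... while a "short" path that was returned unchanged does not.
print is_ascii_path("D:\\Bu\x{F1}uel/x.xlsx"), "\n";
```

When the check fails, the short path is no better than the original one, and the script would need another route, such as opening the file with openL and passing the handle to the parser.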
    
