lustyx has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, I'm unable to open a file using Perl on a Windows machine where the path to the file has UTF-8 characters in it. My script looks like this:
my $strPath = "C:\\server\\htdocs\\DEVELOPMENT\\testing\\" . "&#19968;&#26869;&#39640;&#26641;" . "\\test\\test.ssi";
open DATFILE, '+<' . $strPath or die("Can't open:" . $!);
my $strHeaderLine = <DATFILE>;
print $strHeaderLine;
close DATFILE;
Every time I try, I get:
Can't open:No such file or directory
I've tried both 5.6 and 5.8. Any help would be great. Thanks, Justin

Replies are listed 'Best First'.
Re: Problems Opening file with Perl where path has UTF8
by ikegami (Patriarch) on Nov 23, 2006 at 00:13 UTC

    Note: I'm assuming you have the actual characters 一棵高树 in there instead of &#19968;&#26869;&#39640;&#26641;

    Try

    use Encode qw( decode );
    my $strPath = decode("UTF-8", "C:\\server\\htdocs\\DEVELOPMENT\\testing\\...\\test\\test.ssi");

    or

    use utf8;
    my $strPath = "C:\\server\\htdocs\\DEVELOPMENT\\testing\\...\\test\\test.ssi";

    (Use the characters instead of ...)

    They should both work or both fail, probably the latter. If it works, I'll explain why.

      Well, after a few days of research, here is what I came up with. Thanks for all of your help and for pointing me in the right direction. The main thing I learned is that to get Unicode to work in file names and paths, you need to get platform-specific. It would be great if that were not the case. Here is the code that I got working. It is running on Windows XP with Perl 5.8.8 (from ActiveState). Working script:

      #!/usr/bin/perl
      use strict;
      use Win32API::File qw( :ALL );
      use utf8;
      use Encode;
      #Notice the \0 at the end of the file name. Necessary but I don't know why.
      my $win32_handle = Win32API::File::CreateFileW(Encode::encode("UTF-16LE", "C:\\test\\" . "一棵高树" . "\\test.txt\0"), FILE_READ_DATA, 0, [], OPEN_EXISTING, 0, []);
      my $perl_handle = 0;
      #This translates the win32 file handle to a Perl file handle
      Win32API::File::OsFHandleOpen($perl_handle, $win32_handle, "r");
      print <$perl_handle>;
      if ($^E) { print "Error: " . $^E }
        You shouldn't rely on $^E to tell you there's been an error. You should only use $^E when you know an error has occurred.
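
A minimal sketch of that advice: test the return value of the call first, and read the error variable only on the failure branch. To keep it runnable on any platform it uses plain open and $! rather than Win32API::File and $^E; the helper name first_line and the path are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ( $first_line, undef ) on success or
# ( undef, $error_text ) on failure.
sub first_line {
    my ($path) = @_;

    # Consult the error variable only when open() itself reports failure.
    open( my $fh, '<', $path )
        or return ( undef, "Can't open $path: $!" );

    my $line = <$fh>;
    close $fh;
    return ( $line, undef );
}

# Illustrative path that is assumed not to exist.
my ( $line, $err ) = first_line("no-such-dir/no-such-file.txt");
print defined $err ? "$err\n" : ( defined $line ? $line : "" );
```

The same shape should carry over to the Win32 calls in this thread: check what CreateFileW() or OsFHandleOpen() returned, and only then look at $^E.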

        The NUL is necessary for the system to know where the string ends. Perl will automatically add an ASCII NUL (1 byte) when converting from an SV* to a char*, but a UTF-16LE NUL (2 bytes) is needed here.
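
To make the two-byte terminator visible, here is a small sketch that needs only the core Encode module, so it runs on any platform. The path is illustrative, and the directory name is built from the code points given earlier in the thread.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Directory name built from the code points quoted in the thread.
my $dir = join "", map { chr } 19968, 26869, 39640, 26641;

# Illustrative path only.
my $path = "C:\\test\\" . $dir . "\\test.txt";

# Append the NUL *before* encoding so it becomes a two-byte UTF-16LE NUL.
my $wide = encode( "UTF-16LE", $path . "\0" );

# All characters here are in the BMP, so each one takes exactly two bytes.
printf "%d characters (with terminator), %d bytes\n",
    length($path) + 1, length $wide;
printf "last two bytes: %s\n", unpack( "H*", substr( $wide, -2 ) );
```

Appending "\0" after encoding would add only a single zero byte, which is not a complete UTF-16 code unit.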

Re: Problems Opening file with Perl where path has UTF8 (sorry)
by tye (Sage) on Nov 23, 2006 at 00:12 UTC

    Dang. I need to add CreateFileW() to Win32API::File... ): I thought I had but it was Win32API::Registry that included the *W hooks...

    Update: Short of that, I think you are stuck with only 8-bit characters when using open. So your options are likely:

    1. Switch to a locale where the file name only requires 8-bit characters
    2. Use the "short name" to open the file (at least I think that will work around this problem)
    3. Rename the file (or, under NTFS, provide a hard link to the file using a name w/o problematic characters)

    It is too bad that Win32.pm includes GetShortPathName() [which is GetShortPathNameA()] but not GetShortPathNameW().

    Update: Yep, I checked the latest Perl source code and open under Win32 hasn't been made smart enough to handle UTF-8 strings.

    ikegami mentioned (in the CB) using Win32::API to get access to CreateFileW(), which you could combine with Win32API::File's OsFHandleOpen() to get a Perl file handle to use.

    - tye        

      Um, actually, Win32API::File does have CreateFileW(), just not mentioned in the documentation.

      my $wPath= pack "S*",
          unpack( "C*", "C:/server/htdocs/DEVELOPMENT/testing/" ),
          19968, 26869, 39640, 26641,
          unpack( "C*", "/test/test.ssi" ),
          0;
      my $hFile= CreateFileW(
          $wPath,
          GENERIC_READ()|GENERIC_WRITE(),
          FILE_SHARE_READ()|FILE_SHARE_WRITE(),
          [],
          OPEN_EXISTING(),
          0,
          []
      ) or die "Can't open: $^E\n";
      OsFHandleOpen( *DATFILE, $hFile, "rw" )
          or die "Can't create Perl handle: $!\n";
      my $strHeaderLine = <DATFILE>;
      print $strHeaderLine;
      close DATFILE;

      Note, this code is untested because I composed it under Linux.
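
Since the pack-based code is untested, here is a portable cross-check of just the string-building step. It is a sketch under two assumptions: every character is below U+10000, and the target is little-endian, so "v*" (always little-endian) stands in for the native-endian "S*" used above. The helper names and the path are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: NUL-terminated UTF-16LE via pack, one 16-bit
# little-endian unit per character. Only valid for code points below 0x10000.
sub wide_path_pack {
    my ($str) = @_;
    return pack( "v*", ( map { ord } split //, $str ), 0 );
}

# Hypothetical helper: the same string via Encode.
sub wide_path_encode {
    my ($str) = @_;
    return encode( "UTF-16LE", $str . "\0" );
}

# Illustrative path using the code points from the thread.
my $path = "C:/test/"
    . join( "", map { chr } 19968, 26869, 39640, 26641 )
    . "/test.txt";

print wide_path_pack($path) eq wide_path_encode($path)
    ? "same bytes\n"
    : "different bytes\n";
```

For characters outside the BMP the two would diverge, because Encode emits surrogate pairs and a plain 16-bit pack does not.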

      Update: I need to update createFile() to detect UTF-8 path names and do all this for you...

      - tye        

        Update: The stricken text below is true only for the system call CreateFileW, not for Win32API::File's version of CreateFileW.

        CreateFileW returns INVALID_FILE_HANDLE (-1) on error, not false.

        use constant INVALID_FILE_HANDLE => -1;
        my $hFile= CreateFileW(
            $wPath,
            GENERIC_READ()|GENERIC_WRITE(),
            FILE_SHARE_READ()|FILE_SHARE_WRITE(),
            [],
            OPEN_EXISTING(),
            0,
            []
        );
        INVALID_FILE_HANDLE != $hFile
            or die "Can't open: $^E\n";
      What about FindFirstFileW() for opening directories with UTF-8 names? Unfortunately, it is also not in Win32API::File. In perl's win32/win32.c, win32_open has deprecated USING_WIDE(), so there's no real option at this point. :( I'd really like to see File::Find work on Win32 with UTF-8 file names and directories. Thanks

        FindFirstFileA() isn't even in Win32API::File. Unfortunately, the complex structs used make it less trivial to add, but I'll look into that. That wouldn't help File::Find much, though. Better would be to fix readdir (and thus glob, I'd hope) on Win32 to use FindFirstFileW() and return UTF-8 file names only if the found files have names with non-8-bit characters in them (or perhaps non-7-bit characters), or perhaps do things backward compatibly based on something in %ENV (well, I prefer using %ENV over some obscure internal Perl variable because I find such preferences easier to deploy).
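
The naming rule proposed above could be sketched roughly as follows. This is only an illustration of the decision, not anything that exists in Perl or Win32API::File: the helper name is hypothetical, the input is assumed to be the UTF-16LE name as FindFirstFileW() would supply it, and ISO-8859-1 stands in for "8-bit characters" (a real implementation would more likely use the system code page).

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical helper: given a file name as UTF-16LE bytes, return a
# one-byte-per-character name when every character fits in 8 bits,
# and a UTF-8 encoded name otherwise.
sub legacy_or_utf8_name {
    my ($wide) = @_;
    my $name = decode( "UTF-16LE", $wide );

    # Assumption: "8-bit" is modelled as ISO-8859-1 for this sketch.
    return encode( "iso-8859-1", $name )
        unless $name =~ /[^\x00-\xFF]/;

    return encode( "UTF-8", $name );
}

# Illustrative inputs: an ASCII name and a name using a code point from the thread.
my $plain = legacy_or_utf8_name( encode( "UTF-16LE", "test.txt" ) );
my $wide  = legacy_or_utf8_name( encode( "UTF-16LE", chr(19968) . ".txt" ) );
printf "%s => %d bytes\n", $plain, length $plain;
printf "non-8-bit name => %d bytes\n", length $wide;
```

The caller cannot tell from the bytes alone which branch was taken, which is one reason a switch such as the %ENV setting mentioned above would matter for backward compatibility.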

        - tye