emav has asked for the wisdom of the Perl Monks concerning the following question:

Hi, brothers!
I'm trying to run the following code on my Win XP Greek...
    #!/usr/bin/perl -w
    use strict;
    use utf8;

    my $file = "C:\\&#913;&#957;&#964;&#943;&#947;&#961;&#945;&#966;&#959;\\VOLINFO.TXT";

    open FILE, $file or die "can't open $file: $!";
    while (<FILE>) { }
    close FILE;
...but I'm getting the following error message:
    can't open C:\&#913;&#957;&#964;&#943;&#947;&#961;&#945;&#966;&#959;\VOLINFO.TXT: No such file or directory at C:\perl_scripts\open_file.pl line 7.
Is there a way to circumvent this problem? PS: Of course, I'm not using html entities but the corresponding unicode characters.

Replies are listed 'Best First'.
Re: Can't Find File When Non-ASCII Letters Appear in Path
by ikegami (Patriarch) on Apr 16, 2007 at 19:46 UTC

    Windows provides two interfaces to its functions. One takes single-byte character strings (called "ANSI" by MS), and the other takes two-byte character strings (called "Wide" or "UNICODE" by MS; they are encoded as "UCS-2le"). For example, to create or open a file, one would call CreateFileA to pass a single-byte char string, or CreateFileW to pass a UCS-2le char string.

    I'm not sure about all the details, but one thing is sure: if Perl is using CreateFileA (and I think it does), you have a problem.

        >perl -e"use Encode; print encode('UCS-2le', 'ABC')" | od -c
        0000000   A  \0   B  \0   C  \0
        0000006

    Since CreateFileA accepts a NUL-terminated file name, it will think encode('UCS-2le', 'ABC') is just 'A'. CreateFileW must be used to open that file. For the same reason, FindFirstFileW and FindNextFileW must be used to list the contents of the directory.
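
A minimal sketch of the truncation described above, using only the core Encode module. The helper name as_c_string is hypothetical; it merely mimics how a byte-oriented C API stops reading at the first NUL and does not call any Windows function.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the part of a byte string that a
# NUL-terminated (char *) API would see, i.e. everything before the first NUL.
sub as_c_string {
    my ($bytes) = @_;
    my ($visible) = $bytes =~ /\A([^\0]*)/;
    return $visible;
}

# Two bytes per character, low byte first: A \0 B \0 C \0
my $wide = encode('UCS-2LE', 'ABC');

printf "%d bytes: %s\n", length($wide),
    join(' ', map { sprintf('%02x', ord($_)) } split(//, $wide));
print "seen by a byte API: '", as_c_string($wide), "'\n";
```

For plain ASCII letters the second byte of every pair is NUL, which is why only the first letter survives.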

    I don't know if there's anything "out there" that does what you need. Win32::API will definitely get you there, but you'll have to do some legwork.

      Oh, dear! I was hoping there would be an easier and dirtier solution to my problem than Win32::API but... a monk's got to do what a monk's got to do. ;-)

      I'm ready to dive. I'll let you know if this gets me anywhere.
Re: Can't Find File When Non-ASCII Letters Appear in Path
by emav (Pilgrim) on Apr 16, 2007 at 21:06 UTC
    Thankfully, it was much easier than I thought! Here's what solved my problem:
        #!/usr/bin/perl -w
        use strict;
        use Encode qw(from_to);
        use Encode::Byte;

        my $file = "C:\\&#913;&#957;&#964;&#943;&#947;&#961;&#945;&#966;&#959;\\VOLINFO.TXT";
        from_to($file, "utf8", "cp1253");

        open FILE, $file or die "can't open $file: $!";
        while (<FILE>) { print; }
        close FILE;
    I can't understand why it didn't work when I used these characters directly: \xc1\xed... etc. (I checked! The output was correct cp1253 characters... or wasn't it?)
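
To check which bytes the conversion actually hands to open, here is a small sketch. It assumes the directory name is made of the code points given by the HTML entities in the post, and builds the string with chr so the result does not depend on how the script file itself is saved. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode from_to);

# Code points taken from the entities shown in the post (&#913; &#957; ...).
my $dir = join '', map { chr($_) } 913, 957, 964, 943, 947, 961, 945, 966, 959;

# Character string -> bytes in the Greek "ANSI" code page.
my $ansi = encode('cp1253', $dir);

# The same conversion done the way the script above does it:
# start from UTF-8 bytes and let from_to re-encode them in place.
my $bytes = encode('UTF-8', $dir);
from_to($bytes, 'UTF-8', 'cp1253');

print join(' ', map { sprintf('%02x', ord($_)) } split(//, $ansi)), "\n";
print $ansi eq $bytes ? "encode and from_to agree\n" : "mismatch\n";
```

Comparing this hex dump with a dump of a hand-typed "\xc1\xed..." literal is one way to find out where the two strings differ, if they do.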

    Oh well, I guess that's why some people write modules that work while I get stuck with encoding issues before even writing a line of code. ;-)

    Anyway, thanks to everybody who took the time to help me out.
Re: Can't Find File When Non-ASCII Letters Appear in Path
by samtregar (Abbot) on Apr 16, 2007 at 18:09 UTC
    PS: Of course, I'm not using html entities but the corresponding unicode characters.

    Perhaps your filesystem doesn't encode filenames using UTF-8. I'd try opening up the directory with opendir() and looking at what readdir() produces for the files in question. Most likely you'll have to produce the same thing to open the file.
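
A rough sketch of that check: list a directory and print every entry with its non-ASCII bytes escaped, so you can see exactly what readdir() returns. The helper show_bytes is hypothetical, and the directory defaults to the current one unless a path is given on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: keep printable ASCII as is, show every other byte as \xNN.
sub show_bytes {
    my ($name) = @_;
    return join '', map {
        my $c = ord($_);
        ($c >= 0x20 && $c < 0x7f) ? $_ : sprintf('\\x%02x', $c)
    } split(//, $name);
}

my $dir = @ARGV ? $ARGV[0] : '.';   # directory to inspect
opendir(my $dh, $dir) or die "can't opendir $dir: $!";
for my $entry (sort readdir $dh) {
    next if $entry eq '.' or $entry eq '..';
    print show_bytes($entry), "\n";
}
closedir $dh;
```

Whatever byte sequence shows up for the problem directory is the one to try passing to open.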

    -sam

      Thanks for the reply, Sam!

      Well, perhaps I should have mentioned that I resorted to UTF-8 after having tried Windows-1253 and ISO-8859-7. I even created a Wx::FileDialog object to make sure I'm getting the right path/filename but to no avail.

      Of course, there is no problem when plain English letters only appear in the path.
        Did you try UTF-16? I have some vague recollection that it's a popular encoding for Windows filesystems. Wikipedia seems to agree: UTF-16.

        -sam

Re: Can't Find File When Non-ASCII Letters Appear in Path
by Anonymous Monk on Apr 17, 2007 at 13:23 UTC