Perl assumes Latin1 (for Win32) or "native" (for other) for all filenames. Under Win32, Perl mostly calls the *A APIs, which deal with "ANSI" (local code page) data. In theory, Perl should move to the *W APIs so that it uses UTF-16LE for filenames and all other strings passed to the OS, but it doesn't. There is no abstraction layer for handling the encoding(s) returned by readdir or the encoding(s) passed to open. These are not necessarily compatible with each other, nor with other strings in Perl.
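
Since there is no such abstraction layer, one workaround is to do the conversion by hand at both boundaries. The following is only a sketch: it assumes the file system stores names as UTF-8 (typical on Linux, not on Win32 with the *A APIs), and $fs_encoding is a name made up for this example. File systems that normalize names may hand back a different character sequence than the one written.

```perl
use strict;
use warnings;
use utf8;                          # this source file is saved as UTF-8
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Assumption: file names on this system are UTF-8 byte sequences.
my $fs_encoding = 'UTF-8';

my $dir  = tempdir(CLEANUP => 1);
my $name = "Söme Weird File";      # a character string, not bytes

# Encode the name to bytes before it is handed to open.
my $path = "$dir/" . encode($fs_encoding, $name);
open(my $fh, '>', $path) or die "Cannot create file: $!";
close($fh);

# readdir returns bytes; decode them to get characters back.
opendir(my $dh, $dir) or die "Cannot open directory: $!";
my @names = map  { decode($fs_encoding, $_) }
            grep { $_ ne '.' && $_ ne '..' } readdir($dh);
closedir($dh);

binmode(STDOUT, ':encoding(UTF-8)');
print "$_\n" for @names;
```

With both conversions in place, the name read back from the directory should compare equal to the original character string, as long as the assumed encoding matches what the OS actually uses.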

Re^3: MD5 non ascii file name
by ikegami (Patriarch) on Aug 22, 2008 at 00:58 UTC

    Perl assumes Latin1 (for Win32) or "native" (for other) for all filenames.

    Do you have an example of where Perl treats file names as anything but opaque binary strings? Is that what you mean by "native"?

    If anything, Perl (such as File::Spec) treats file names as any other (undecoded) text string: as iso-latin-1, regardless of platform.

      I think the problem occurs when you do stuff like:

      use utf8;
      my $filename = "Söme Weird File";
      open my $fh, "<", $filename or die "Cannot open '$filename': $!";

      Except that "ö" is likely still a valid character. The same happens with filenames read from an external file, I guess.

        But doesn't that error occur for both Windows and "other" systems? On both WinXP and Linux, the following code generates the same 3-byte file name (interpreted according to the local codepage).

        open(my $fh, '>', chr(0x2660)) or die $!;

        (Too lazy to find out how to encode ö.)

        So it appears that Perl passes the string's internal buffer to the system call. I don't see how that can be used to demonstrate that Perl assumes an encoding for the file name. Quite the opposite, it seems to show that Perl assumes file names are binary strings. It's up to the user to encode them.
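
What "the string's internal buffer" contains can be inspected without touching the file system. The sketch below uses bytes::length, which is meant for debugging only, to report how many bytes Perl stores for a string. It also covers the "ö" case: that character fits in one byte, so the same string can exist with two different internal buffers, and which one reaches the system call depends on how the string was built.

```perl
use strict;
use warnings;
use bytes ();                        # load bytes::length without enabling the pragma

# U+2660 does not fit in one byte, so Perl stores it internally as UTF-8.
my $spade = chr(0x2660);
print "U+2660 internal bytes: ", bytes::length($spade), "\n";     # 3

# U+00F6 ("ö") fits in one byte, so two internal forms are possible.
my $o_down = chr(0xF6);              # stored as the single byte F6
my $o_up   = chr(0xF6);
utf8::upgrade($o_up);                # same string, now stored as C3 B6

print "downgraded internal bytes: ", bytes::length($o_down), "\n"; # 1
print "upgraded internal bytes:   ", bytes::length($o_up), "\n";   # 2
print "equal as strings: ", ($o_down eq $o_up ? "yes" : "no"), "\n"; # yes
```

Two strings that compare equal can therefore name two different files, which is one more reason to encode file names explicitly before passing them to open.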