Well, Win32 has had stable support for Unicode filenames for many years. Perl's support for that is woeful but I'm working on that. I'd make a joke about "Real" operating systems, but it seems the Unix mongers may need a break from that in order to recover their sense of humor. (:
One problem, even with Win32, is that you can have multiple filesystems on a single system, even within a single tree. Not every filesystem handles filenames the same way. Any solution for Perl would be incomplete without the possibility to override the encoding decision per path.
I'm hoping for a solution that is sufficiently abstracted that all platforms can use it. Win32's implementation would probably be a bit easier than one for, say, Linux, but even if you have to set things explicitly per path, it's better than what we have now. The following is copied from a post to p5p a while ago.
I tend to agree; however, pragmas tend to be global, program- or package-wide, and what suits best here is an individual, per-call flag.
Global is a problem in most cases, but I feel it would be perfect here,
simply because the filesystem is equally global. In fact, it's even
longer lived than your Perl program :)
Better yet, global variables can be localized to dynamic scope. This is
good, because when you set the encoding for /foo, it should work for
encoding-unaware modules too.
Maybe a hash would be nice:
${^FS_ENCODING}{foo} = 'A';
${^FS_ENCODING}{foo}{bar} = 'B';
${^FS_ENCODING}{foo}{bar}{baz}{quux} = 'auto';
open my $fh, ">", "/foo/bar/baz/quux/blah/hello.txt";
Which then actually does:
open my $fh, ">", join("/",
    "",
    encode(detect_encoding("/"), "foo"),
    encode("A", "bar"),
    encode("B", "baz"),
    encode("B", "quux"),
    encode(detect_encoding("/foo/bar/baz/quux"), "blah"),
    encode(detect_encoding("/foo/bar/baz/quux/blah"), "hello.txt"),
);
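For anyone who wants to play with the idea, here is a minimal runnable sketch of that lookup. Everything in it is an assumption for illustration: `%FS_ENCODING` is an ordinary package hash standing in for the proposed `${^FS_ENCODING}`, the empty-string key `''` holds the encoding for entries directly inside a directory (a plain nested hash cannot hold both a value and children under the same key), `detect_encoding` is a stub that always answers UTF-8, and ISO-8859-1 and UTF-8 play the roles of 'A' and 'B'.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for the proposed ${^FS_ENCODING}. The '' key of each
# node names the encoding used for entries directly inside that directory.
our %FS_ENCODING = (
    foo => {
        ''  => 'iso-8859-1',                          # encoding 'A'
        bar => {
            ''  => 'UTF-8',                           # encoding 'B'
            baz => { quux => { '' => 'auto' } },
        },
    },
);

# Hypothetical detector: a real one might look at the mount or the locale.
sub detect_encoding { return 'UTF-8' }

# Encode each path component with the setting of the nearest configured
# ancestor directory; 'auto' means "ask detect_encoding for that directory".
sub encode_path {
    my ($path) = @_;
    my @parts = split m{/}, $path;
    shift @parts;                       # drop the empty field before the leading "/"
    my $node = \%FS_ENCODING;
    my $enc  = 'auto';                  # nothing is configured for the root
    my $dir  = '';
    my @out  = ('');
    for my $name (@parts) {
        my $use = $enc eq 'auto'
            ? detect_encoding($dir eq '' ? '/' : $dir)
            : $enc;
        push @out, encode($use, $name);
        $dir .= "/$name";
        # Descend into the configuration tree, if it goes this deep.
        $node = ($node && ref($node->{$name}) eq 'HASH') ? $node->{$name} : undef;
        $enc  = $node->{''} if $node && defined $node->{''};
    }
    return join '/', @out;
}
```

Because `%FS_ENCODING` is a package variable, `local $FS_ENCODING{foo}{''} = 'UTF-8';` overrides the setting for a dynamic scope, so encoding-unaware code called from that scope sees the override too.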
Wow. That would suck, IMHO. Talk about a complicated mess of an over-designed system.
Simply supporting Unicode strings as file names/paths is what should be done, and it is what was done in Win32. Perl doesn't support strings in multiple encodings: a string is either Unicode (stored as UTF-8) or it isn't, in which case it is composed of 8-bit characters. Similarly, Win32 strings are either Unicode in UTF-16 (or so) or they aren't, in which case they are composed of 8-bit characters. Win32 at least makes clear what the "aren't" case means: the string is in the encoding of the process's current locale (not in some encoding based on which part of the file system it refers to, which would be an unholy mess).
Support for Win32 would be fairly simple: instead of always converting to 8-bit character strings before calling a Windows *A() function (which then converts them to UTF-16), we should always convert to UTF-16 strings before calling the corresponding Windows *W() function.
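To make the difference concrete, here is a rough sketch of just the conversion step, using the core Encode module. The helper names `to_wide` and `to_ansi` are made up for this example, cp1252 is only an assumed stand-in for whatever the active code page happens to be, and the actual call into a *W() or *A() function is left out so the snippet runs on any platform.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build the NUL-terminated UTF-16LE buffer that a
# Windows *W() function expects for a string argument.
sub to_wide {
    my ($str) = @_;
    return encode('UTF-16LE', $str . "\0");
}

# Hypothetical helper: the 8-bit route used for *A() functions. cp1252 is an
# assumption here; characters outside that code page cannot be represented.
sub to_ansi {
    my ($str) = @_;
    return encode('cp1252', $str . "\0");
}

my $name = "caf\x{e9}-\x{263A}.txt";    # contains U+263A, which cp1252 lacks
my $wide = to_wide($name);              # every character survives
my $ansi = to_ansi($name);              # U+263A is lost in this form
```

The wide form keeps every character of the name, while the 8-bit form can only keep what the chosen code page contains.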
If Unix support for Unicode filenames is going a route similar to what you outlined, then I won't hold my breath for that being stable and don't think Perl should try to implement support for it, because I predict that route would be doomed to be abandoned anyway.