in reply to Re^5: Writing UTF8 Filename (Win32)
in thread Writing UTF8 Filename

Wow. That would suck, IMHO. Talk about a complicated mess of an over-designed system.

Simply supporting Unicode strings as file names/paths is what should be done and is what was done in Win32. Perl doesn't support strings in multiple encodings: a string is either Unicode (stored as UTF-8) or it isn't, in which case it is composed of 8-bit characters. Similarly, Win32 strings are either Unicode in UTF-16 (or so) or they aren't, in which case they are composed of 8-bit characters. Win32 at least makes clear what the "aren't" case means: the string is in the encoding of the process's current locale (not in some encoding based on what part of the file system it is referring to, which would be an unholy mess).

The support for Win32 would be fairly simple: instead of always converting to 8-bit character strings before calling a Windows *A() function (which then converts them to UTF-16), we should always convert to UTF-16 strings before calling a Windows *W() function.
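A minimal sketch of the conversion step only, assuming the core Encode module. The helper name to_wide_string is hypothetical, and the actual call into a *W() function is left out because it depends on how the binding is done:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string into the
# NUL-terminated UTF-16LE buffer that a Win32 *W() function expects.
sub to_wide_string {
    my ($name) = @_;
    return encode('UTF-16LE', $name . "\0");
}

# 8 characters plus the terminating NUL, 2 bytes each.
my $wide = to_wide_string("caf\x{e9}.txt");
printf "%d bytes\n", length $wide;   # prints "18 bytes"
```

The point of the sketch is that the conversion does not depend on the process's locale, unlike the 8-bit path through the *A() functions.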

If Unix support for Unicode filenames is going a route similar to what you outlined, then I won't hold my breath for that being stable and don't think Perl should try to implement support for it, because I predict that route would be doomed to be abandoned anyway.

- tye        

Re^7: Writing UTF8 Filename (Win32)
by Juerd (Abbot) on Nov 18, 2007 at 01:13 UTC

    Wow. That would suck, IMHO. Talk about a complicated mess of an over-designed system.

    I don't think it would suck or is over-designed. In the common case, you would use file functions like you do now, and Perl would handle everything transparently. ${^FS_ENCODING} would default to auto, resulting in autodetection for the entire system. When you want to port your latin1 mp3 collection to utf8 (to name one real-world case), it would be exceptionally easy to do so: given proper OS support it would detect the encodings automatically, and without the OS support you could still override them with two lines of code.
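For comparison, here is a rough sketch of what the latin1-to-utf8 port looks like when done by hand today with the core Encode module. It does not use the proposed ${^FS_ENCODING}, which does not exist; the helper name latin1_name_to_utf8 and the sample file name are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: interpret the bytes of a Latin-1 file name as
# characters, then re-encode those characters as UTF-8 bytes.
sub latin1_name_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

my $old = "caf\xE9.mp3";               # name as stored in Latin-1
my $new = latin1_name_to_utf8($old);   # same name as UTF-8 bytes

# The rename itself is left commented out in this sketch:
# rename $old, $new or warn "rename failed: $!";
```

Here the program has to know the old encoding and convert each name itself, which is the manual work the proposal is meant to remove.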

    The problem with ANY win32-only code, or any-platform-only code, is that you put the burden of writing portable applications on the programmer. Hence, some abstraction would be nice. If only Perl provided useful hooks for encoding filenames in general, that would be a great start, and it would also provide nice ways of dealing with existing systems. Program written to support only absolute filenames? Hack hack, and it does what you want.

    Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

      Please put your complex plan into a module and not into Perl itself.

      The problem with ANY win32-only code, or any-platform-only code, is that you put the burden of writing portable applications on the programmer. Hence, some abstraction would be nice.

      The abstraction is that file names/paths that are Unicode (UTF-8) strings in Perl should be supported. That'll also be the abstraction that gets supported at the low level when Unix catches up to Win32, surely. That's the only abstraction that makes much sense. Sure, in the short term, there will be awkward steps to try to bridge between 8-bit chars and Unicode, but those are going to be awkward and non-portable and are best kept out of the way of those who don't get stuck having to use them.

      - tye        

        Please put your complex plan into a module and not into Perl itself.

        But of course! If it can be done in a module, it usually should.

        If every function that takes a filename can be overridden, taking over the global functions could suffice.
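As a rough illustration of taking over a global function, the sketch below overrides mkdir through CORE::GLOBAL so that the name is encoded before it reaches the OS. The variable $FS_ENCODING is a hypothetical stand-in for the proposed setting, only the explicit-argument form of mkdir is handled, and as far as I know not every filename-using built-in (the -X file tests, for example) can be overridden this way:

```perl
use strict;
use warnings;
use Encode qw(encode);

our $FS_ENCODING;   # hypothetical setting, not a real Perl variable

BEGIN {
    $FS_ENCODING = 'UTF-8';
    no warnings 'once';
    # Must be installed at compile time so that later mkdir calls
    # are compiled against the override.
    *CORE::GLOBAL::mkdir = sub {
        my ($name, $mode) = @_;
        my $bytes = encode($FS_ENCODING, $name);
        return defined $mode
            ? CORE::mkdir($bytes, $mode)
            : CORE::mkdir($bytes);
    };
}
```

A module doing this for real would have to wrap open, opendir, rename, unlink and the rest in the same manner, and decode names coming back from readdir.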

        Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }