in reply to Saving file name with Chinese characters

Unix interprets file names as a series of characters in the current locale's encoding, often UTF-8. I don't know how to get the correct encoding, but you could look at how the open pragma does it. Once you know the encoding, Encode's encode can be used to encode the file name.
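
A minimal sketch of that idea, assuming the locale's codeset can be read with I18N::Langinfo's langinfo(CODESET) (as far as I can tell, that is roughly what the :locale layer of the open pragma consults). The helper name encode_file_name and the UTF-8 fallback are assumptions made for illustration, not something the original post prescribes.

```perl
use strict;
use warnings;
use Encode qw( find_encoding );
use I18N::Langinfo qw( langinfo CODESET );

# Hypothetical helper: turn a file name (a string of characters) into
# the bytes to hand to open() on a Unix system.
sub encode_file_name {
    my ($name, $codeset) = @_;

    # Ask the locale for its codeset unless the caller supplied one.
    $codeset = langinfo(CODESET) if !defined($codeset);

    # Assumption: fall back to UTF-8 if the codeset is empty or unknown to Encode.
    my $enc = ( length($codeset) ? find_encoding($codeset) : undef )
        || find_encoding('UTF-8');

    # Die rather than silently substitute characters the encoding lacks.
    return $enc->encode($name, Encode::FB_CROAK | Encode::LEAVE_SRC);
}
```

The returned bytes would then be used as the file name, e.g. open(my $fh, '>', encode_file_name($name)) or die(...).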

Windows uses UCS-2le internally. As such, it supports all Unicode characters in the Basic Multilingual Plane (plane 0, those up to U+FFFF). However, Perl doesn't use the system call capable of handling those characters. You'll need to create or open the file using Win32API::File's CreateFileW. CreateFileW expects you to encode the file name yourself (using UCS-2le).

Update: I wasn't sure if Windows used UCS-2le or UTF-16le, so I put it to the test. It won't let me create a file with U+10000 in its name, ruling out UTF-16le. I adjusted the above accordingly.
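
To see what the two encodings mean for a character like U+10000, here is a small sketch that uses only Encode and makes no Windows calls. The helper name try_encode is hypothetical. As I understand Encode::Unicode, UCS-2LE should refuse U+10000 when asked to croak on errors, while UTF-16LE should produce a four-byte surrogate pair.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: returns the encoded bytes, or undef if the
# encoding cannot represent the string.
sub try_encode {
    my ($encoding, $str) = @_;
    my $bytes = eval {
        encode($encoding, $str, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $bytes;
}

my $bmp     = chr(0x2660);    # BLACK SPADE SUIT, inside the BMP
my $non_bmp = chr(0x10000);   # First character outside the BMP

for my $encoding (qw( UCS-2LE UTF-16LE )) {
    for my $str ($bmp, $non_bmp) {
        my $bytes = try_encode($encoding, $str);
        printf("%-8s U+%04X: %s\n",
            $encoding,
            ord($str),
            defined($bytes) ? unpack('H*', $bytes) : 'cannot be encoded',
        );
    }
}
```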

Re^2: Saving file name with Chinese characters
by Anonymous Monk on Apr 13, 2009 at 18:46 UTC
    Can you please show an example of using CreateFileW? I looked at the docs, but I am not too sure. Thanks.
      Look under CreateFile (note the case) for a detailed description of each arg.

      That leaves getting a Perl handle from the Win32API::File object:

      use strict;
      use warnings;

      use Encode         qw( encode );
      use Symbol         qw( gensym );
      use Win32API::File qw( CreateFileW OsFHandleOpen CREATE_ALWAYS GENERIC_WRITE );

      my $qfn = chr(0x2660);  # Whatever

      my $win32f = CreateFileW(
          encode('UCS-2le', $qfn . "\0"),  # Wide string, explicitly NUL-terminated
          GENERIC_WRITE,   # For writing
          0,               # Not shared
          [],              # Security attributes
          CREATE_ALWAYS,   # Create and replace
          0,               # Special flags
          [],              # Permission template
      ) or die("CreateFile: $^E\n");

      # Wrap the OS handle in a Perl file handle.
      OsFHandleOpen( my $fh = gensym(), $win32f, 'w' )
          or die("OsFHandleOpen: $^E\n");

      print $fh "Foo!\n";
        ikegami,

        Back to the original question: Your solution shows how to save a file when you *know* the encoding for one character.

        I don't know what users will upload. So I need to convert the file names on the fly. In fact, they might not even use Chinese. They might use Korean, Japanese or just English. I need a solution that would be language independent.

        Years ago, I remember seeing some Perl program that would use the bytes pragma to do the conversion. Would that help?

        Thanks ikegami.

Re^2: Saving file name with Chinese characters
by Anonymous Monk on Apr 13, 2009 at 23:56 UTC
    However, Perl doesn't use the system call capable of handling those characters.

    I tried it with open, and the file was created successfully. I guess the server doesn't have the East Asian language pack installed.

      Not quite. A file is created, but it doesn't have the right name. For example, if you use CreateFileW, you can actually create a file whose name is the single character "人" ("person"). If you try to create such a file with open, it'll create the three-character file name "äºº" (the three UTF-8 bytes of "人", each treated as a separate character).
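
The mangled name can be reproduced without touching the file system. This is only a sketch under two assumptions: the name reaches open as UTF-8 bytes, and the system's ANSI code page is cp1252. The helper name misread_utf8_as is made up for the example.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Hypothetical helper: what a UTF-8 encoded name looks like when its
# bytes are read back as some single-byte encoding.
sub misread_utf8_as {
    my ($encoding, $name) = @_;
    return decode($encoding, encode('UTF-8', $name));
}

my $mangled = misread_utf8_as('cp1252', "\x{4EBA}");   # U+4EBA is "person"

# Prints "3 chars: U+00E4 U+00BA U+00BA"
printf("%d chars: %s\n",
    length($mangled),
    join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $mangled)),
);
```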