in reply to Seeking help for copying recursive folders having some folder/file names in Chinese or japanese

why this happens and what to do http://www.i-programmer.info/programming/other-languages/1973-unicode-issues-in-perl.html

Re^2: Seeking help for copying recursive folders having some folder/file names in Chinese or japanese
by salva (Canon) on Jan 20, 2015 at 11:04 UTC
    At the beginning of the document it says:
    Windows stores filenames in Unicode, encoded in UTF16

    That's not completely right. NTFS, like most Unix/Linux file systems, is encoding-agnostic. It just sees filenames as arrays of wchar_t integers (16-bit on Windows) that are not required in any way to be valid UTF-16 sequences.

    For most C/C++ applications, which can handle wchar_t data directly, this is a non-issue. For Perl it is an issue, because file names that are not valid UTF-16 cannot be converted to UTF-8, and modules like Win32::Unicode that do that conversion internally will fail on them.
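
    To make the failure concrete, here is a minimal sketch in Python (not Perl, and not what Win32::Unicode does internally) of a strict UTF-16 to UTF-8 conversion. The helper utf16_units_to_utf8 and both sample names are hypothetical, made up for illustration: one is a valid sequence of 16-bit units, the other contains a lone high surrogate, which a file system that does not validate names would happily store.

```python
import struct


def utf16_units_to_utf8(units):
    """Strictly convert a sequence of 16-bit code units to UTF-8 bytes.

    Hypothetical helper for illustration only. Raises UnicodeDecodeError
    when the units do not form a valid UTF-16 sequence.
    """
    # Pack the units as little-endian 16-bit integers, the in-memory
    # layout of wchar_t strings on Windows.
    raw = struct.pack("<%dH" % len(units), *units)
    return raw.decode("utf-16-le").encode("utf-8")


# A well-formed name: the three characters of "日本語".
valid_name = [0x65E5, 0x672C, 0x8A9E]

# An ill-formed name: 'a', a lone high surrogate (0xD800), then 'b'.
broken_name = [0x0061, 0xD800, 0x0062]

print(utf16_units_to_utf8(valid_name))

try:
    utf16_units_to_utf8(broken_name)
except UnicodeDecodeError as err:
    # No UTF-8 representation exists for an unpaired surrogate.
    print("not convertible:", err.reason)
```

    The first call succeeds; the second raises, because an unpaired surrogate is not a valid UTF-16 sequence. Any layer that insists on a strict conversion will hit the same wall on such a name.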

    Admittedly, for most scripts this is not a problem, as no sane application creates (or lets the user create) files with names that are not valid UTF-16. Still, malicious or just buggy software may do it.

    Update: Well, NTFS is not completely encoding-agnostic, because it is case-insensitive. Its metadata file $UpCase defines how wchar_t characters are converted to upper case.

Re^2: Seeking help for copying recursive folders having some folder/file names in Chinese or japanese
by Anonymous Monk on Jan 19, 2015 at 22:42 UTC

    why this happens and what to do

    Win32::Unicode is much more convenient than Win32::OLE