At the beginning of the document it says:
Windows stores filenames in Unicode, encoded in UTF16
That's not completely right. NTFS (like most Unix/Linux file systems) is encoding-agnostic. It just sees filenames as arrays of wchar_t integers, which are not required in any way to be valid UTF-16 sequences. A name may, for example, contain an unpaired surrogate.
For most C/C++ applications, which can handle wchar_t data directly, this is a non-issue. For Perl it is an issue, because file names that are not valid UTF-16 cannot be converted to UTF-8, and modules like Win32::Unicode that do that conversion internally will fail on them.
Admittedly, for most scripts this is not a problem, as no sane application creates (or lets the user create) files with names that are not valid UTF-16. Still, malicious or merely buggy software may do so.
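To illustrate the conversion failure described above, here is a minimal sketch using the core Encode module. It does not touch the file system: it only builds two arrays of 16-bit units with pack and tries to decode them as strict UTF-16LE. The helper name decode_wide_name is made up for this example, and I am not claiming this is how Win32::Unicode does its conversion internally.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of any module): returns the decoded name,
# or undef if the raw bytes are not a valid UTF-16LE sequence.
sub decode_wide_name {
    my ($raw) = @_;
    my $copy = $raw;    # decode() may modify its input when CHECK is set
    my $name = eval { decode('UTF-16LE', $copy, Encode::FB_CROAK) };
    return $name;       # undef if decode() died
}

# Two "file names" as little-endian 16-bit units.
my $valid   = pack('v*', 0x0061, 0x0062);          # "a", "b"
my $invalid = pack('v*', 0x0061, 0xD800, 0x0062);  # "a", lone high surrogate, "b"

for my $raw ($valid, $invalid) {
    my $name = decode_wide_name($raw);
    print defined $name ? "decoded: $name\n" : "not valid UTF-16\n";
}
```

A script that has to cope with such names would need to keep the raw 16-bit units around instead of relying on the decoded string.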
Update: Well, NTFS is not completely encoding-agnostic, because it is case-insensitive. It has the metadata file $UpCase, which defines how wchar_t characters are converted to upper case.