
Finding the correct encoding for the filesystem is up to you.

I'm not aware of any reliable way to detect the encoding of the file names in a filesystem, so you will have to apply your own knowledge there.
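As a rough sketch of what "applying your own knowledge" could look like: decode every name that File::Find hands you, using the encoding you believe the filesystem uses, and only then sort. The helper names `decode_name` and `decoded_names`, and the choice of UTF-8 in `$FS_ENCODING`, are my own assumptions for illustration and not anything Perl detects for you.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Find;

# ASSUMPTION: file names on this filesystem are UTF-8 encoded.
# Perl cannot tell you this; change it to whatever you know to be true.
my $FS_ENCODING = 'UTF-8';

# Hypothetical helper: turn the raw octets of a file name into a
# Perl character string. Croaks if the octets are not valid in the
# assumed encoding; LEAVE_SRC keeps decode() from modifying its input.
sub decode_name {
    my ($octets) = @_;
    return decode($FS_ENCODING, $octets,
                  Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# Hypothetical helper: collect the decoded base names below $dir and
# return them sorted by code point.
sub decoded_names {
    my ($dir) = @_;
    my @names;
    find(
        sub {
            return if $_ eq '.';          # skip the starting directory itself
            push @names, decode_name($_); # $_ is the base name, as raw octets
        },
        $dir
    );
    my @sorted = sort @names;
    return @sorted;
}
```

Calling `decoded_names('/some/dir')` then gives you character strings that can be compared with other decoded strings. If a name is not valid in the assumed encoding, the call dies instead of silently producing garbage, which is probably what you want while you are still finding out what the filesystem really contains.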

Re^4: UTF-8 lexicographic string sort
by rdiez (Acolyte) on Apr 23, 2020 at 14:15 UTC

    I do not understand why the encoding used by the filesystem is relevant. I assume that File::Find, and ultimately the Perl runtime, will abstract all that knowledge and give me a Perl string that my script can safely work with. It does not matter if the filesystem underneath is Windows NTFS and its encoding is UCS-2. The Perl string with the filename will certainly never have UCS-2 encoding.

      I think you misunderstand. Even though the API of File::Find returns "strings", these will not compare properly with other strings, because you haven't told Perl what encoding the strings from the filesystem are in.

      This encoding is not known to Perl, and it is also not always known to the OS.

      Your assumption that Perl abstracts away the string types of the filesystem API is wrong.
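To illustrate the point, here is a small sketch that needs no filesystem at all. It assumes a UTF-8 filesystem and a made-up file name "café"; `$from_fs` stands in for what File::Find would hand you (raw octets), and `$wanted` is the same name as a Perl character string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical name "café" as raw octets, the way a UTF-8
# filesystem would return it (ASSUMPTION: the encoding is UTF-8).
my $from_fs = "caf\xC3\xA9";

# The same name as a decoded Perl character string.
my $wanted = "caf\x{E9}";

# Comparing the raw octets with the character string fails:
# one string has 5 elements, the other has 4.
my $raw_equal = ($from_fs eq $wanted) ? 1 : 0;

# After decoding with the encoding we assumed, the comparison works.
my $decoded = decode('UTF-8', $from_fs,
                     Encode::FB_CROAK() | Encode::LEAVE_SRC());
my $decoded_equal = ($decoded eq $wanted) ? 1 : 0;

printf "raw: %d elements, equal=%d\n", length($from_fs), $raw_equal;
printf "decoded: %d elements, equal=%d\n", length($decoded), $decoded_equal;
```

The decode step is the part where you tell Perl the encoding. If the filesystem actually used Latin-1 or something else, the same octets would have to be decoded differently, and Perl has no way of knowing which one is right.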

      See the links in my node here.