in reply to Re^3: UTF-8 lexicographic string sort
in thread UTF-8 lexicographic string sort

I do not understand why the encoding used by the filesystem is relevant. I assume that File::Find, and ultimately the Perl runtime, will abstract all that knowledge and give me a Perl string that my script can safely work with. It does not matter if the filesystem underneath is Windows NTFS and its encoding is UCS-2. The Perl string with the filename will certainly never have UCS-2 encoding.

Re^5: UTF-8 lexicographic string sort
by Corion (Patriarch) on Apr 23, 2020 at 14:22 UTC

    I think you misunderstand. Even though the API of File::Find returns "strings", these will not compare properly with other strings, because you haven't told Perl what encoding the strings from the filesystem are in.

    This encoding is not known to Perl, and it is also not always known to the OS.

    Your assumption that Perl encapsulates the string types of the filesystem API is wrong.
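
A minimal sketch of what this means in practice. It assumes the filesystem stores names as UTF-8 bytes; that is an assumption the script has to supply, since Perl does not detect it. The variable names ($fs_encoding, $from_fs, $literal) are hypothetical, and the byte string stands in for what a filesystem call would hand back.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: filenames on this filesystem are UTF-8 encoded bytes.
my $fs_encoding = 'UTF-8';

# Stand-in for a name returned by the filesystem: the raw bytes of "é.txt".
my $from_fs = "\xC3\xA9.txt";

# The same name written as a Perl character string (U+00E9 followed by ".txt").
my $literal = "\x{E9}.txt";

# Comparing undecoded bytes against a character string fails.
my $equal_raw = ($from_fs eq $literal) ? 1 : 0;

# After decoding with the assumed encoding, both sides are character strings.
my $decoded       = decode($fs_encoding, $from_fs);
my $equal_decoded = ($decoded eq $literal) ? 1 : 0;

printf "raw: %d, decoded: %d\n", $equal_raw, $equal_decoded;
# prints "raw: 0, decoded: 1"
```

In a File::Find callback, the same decode step would be applied to $File::Find::name before sorting or comparing. If the assumed encoding is wrong for the filesystem at hand, the decoded names will be wrong too.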

Re^5: UTF-8 lexicographic string sort
by haukex (Archbishop) on Apr 23, 2020 at 15:28 UTC
    I assume that File::Find, and ultimately the Perl runtime, will abstract all that knowledge and give me a Perl string that my script can safely work with. It does not matter if the filesystem underneath is Windows NTFS and its encoding is UCS-2. The Perl string with the filename will certainly never have UCS-2 encoding.

    See the links in my node here.