in reply to Re: UTF-8 and readdir, etc.
in thread UTF-8 and readdir, etc.

Some notes: Will this work if $dir has at least one character >= 128? I think it needs 'use utf8;' and the name needs to be encoded to the filesystem encoding before the opendir call. Will this work when all three encodings (filesystem, result file, code source file) aren't the same (Windows for example)? I doubt it.

Re^3: UTF-8 and readdir, etc.
by kcott (Archbishop) on Feb 02, 2018 at 01:22 UTC
    "Will this work if $dir has at least one character >= 128?"

    If "128" refers to the return value of "ord($character)" (see ord), my test data uses such characters. If you meant something else, please explain.

    "I think it needs 'use utf8;'"

    The source code I provided is written entirely using 7-bit ASCII characters. The utf8 pragma is definitely not required here. I suggest you read that documentation, paying particular attention to this part (which it shows in bold text):

    "Do not use this pragma for anything else than telling Perl that your script is written in UTF-8."

    "Will this work when all three encodings (filesystem, result file, code source file) aren't the same (Windows for example)?"

    The OP stated: "The host OS is Linux, and is configured to use UTF-8 for filenames; the contents of the output file are also encoded as UTF-8."

    — Ken

      What character in

      my $dir = 'pm_1208191_utf8_filenames';

      has ord() >= 128?

      I meant that 'use utf8;' is needed if there is an actual character with ord() >= 128 in the $dir string.
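
      To illustrate what the pragma changes: it only affects how Perl reads literals in the source file. A small sketch, assuming the script itself is saved as UTF-8; the variable names are invented for illustration:

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8. The literal below contains one
# non-ASCII character, U+00E9 (e with acute accent), stored as two octets.

# Without the pragma, Perl sees the raw octets 0xC3 0xA9.
my $bytes = do { no utf8; 'é' };

# With the pragma, Perl decodes the literal to a single character.
my $chars = do { use utf8; 'é' };

printf "without 'use utf8': length %d\n", length $bytes;
printf "with 'use utf8':    length %d, ord %d\n", length $chars, ord $chars;
```

      A $dir made up only of ASCII characters is the same string either way, so the pragma matters only once the literal contains a non-ASCII character.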

      The reason for my post is that your suggestion isn't general Unicode processing. It covers only the one specific case where the encodings of the filesystem, the result file, and the source code are all the same. That limitation is all I wanted to highlight.
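
      A minimal sketch of what I mean by handling each encoding explicitly, not a definitive implementation. It assumes the filesystem and the result file both use UTF-8 here, but names each one separately so they can differ; list_dir, write_names, $FS_ENCODING and $OUT_ENCODING are hypothetical names made up for this example:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumptions for illustration: each layer's encoding is stated explicitly.
# On another system these would be whatever that layer really uses.
my $FS_ENCODING  = 'UTF-8';   # encoding of filenames on disk
my $OUT_ENCODING = 'UTF-8';   # encoding of the result file

# Take a directory name as a character string, return its entries as
# character strings (skipping '.' and '..').
sub list_dir {
    my ($dir_chars) = @_;

    # Characters -> octets before handing the name to the OS.
    my $dir_bytes = encode($FS_ENCODING, $dir_chars);
    opendir(my $dh, $dir_bytes) or die "Cannot open directory: $!";

    # Octets -> characters for every name the OS hands back.
    my @names = map  { decode($FS_ENCODING, $_) }
                grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    my @sorted = sort @names;
    return @sorted;
}

# Write character strings to a file, letting the I/O layer encode them.
# Note: $path is passed to open() as-is, so it is assumed to be ASCII here.
sub write_names {
    my ($path, @names) = @_;
    open(my $out, ">:encoding($OUT_ENCODING)", $path)
        or die "Cannot write '$path': $!";
    print {$out} "$_\n" for @names;
    close $out or die "Cannot close '$path': $!";
    return;
}
```

      With the encode/decode steps at the boundaries, the code in between works on characters only, and changing one of the two encoding names doesn't touch the rest.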