in reply to Matching non-ASCII file contents with file name.

With use utf8; in your code, you only tell Perl that your source code is in UTF-8 (so the inverted question mark is recognized as such), not how input and output should be encoded. Perl assumes that your output handle is (say) Latin-1, and because it can convert the Unicode string it read from the UTF-8 source to Latin-1, it does so when printing.

I find that explicitly specifying the encodings for filehandles is the easiest way to get consistent results:

#!perl
use strict;
use warnings;
use charnames ':full';
binmode STDOUT, ':encoding(UTF-8)';
print "\N{INVERTED QUESTION MARK}\n";
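To see that it is the layer on the handle, not use utf8;, that decides which bytes get written, here is a minimal sketch. It prints the same character to an in-memory filehandle with and without an encoding layer and dumps the resulting bytes as hex. The helper name bytes_written is made up for this illustration.

```perl
#!perl
use strict;
use warnings;
use utf8;                 # only declares that this source file is UTF-8
use charnames ':full';

# Hypothetical helper (not from the thread): print $text to an in-memory
# handle, optionally through an I/O layer, and return the bytes as hex.
sub bytes_written {
    my ($text, $layer) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";
    binmode $fh, $layer if defined $layer;
    print {$fh} $text;
    close $fh or die "Cannot close in-memory handle: $!";
    return join ' ', map { sprintf '%02X', ord } split //, $buffer;
}

my $char = "\N{INVERTED QUESTION MARK}";    # U+00BF

print 'no layer:         ', bytes_written($char), "\n";                      # BF
print ':encoding(UTF-8): ', bytes_written($char, ':encoding(UTF-8)'), "\n";  # C2 BF
```

Without a layer the character goes out as the single Latin-1 byte BF; with :encoding(UTF-8) it goes out as the two-byte UTF-8 sequence C2 BF. The use utf8; line changes neither result.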

Re^2: Matching non-ASCII file contents with file name.
by mldvx4 (Hermit) on Dec 23, 2022 at 06:39 UTC

    Thanks, again!

    I see now the mistake but don't understand it. The following is what I had but which was not producing the right result:

    my ($fh, $tempfile) = tempfile();
    binmode( $fh, ":utf8" );

    With your corrections, the following produces the right character:

    my ($fh, $tempfile) = tempfile();
    binmode( $fh, ":encoding(UTF-8)" );

    What is the difference between binmode( $fh, ":utf8" ); and binmode( $fh, ":encoding(UTF-8)" ); with regard to the output?

      Maybe the problem is elsewhere? Because binmode says:

      To mark FILEHANDLE as UTF-8, use :utf8 or :encoding(UTF-8). :utf8 just marks the data as UTF-8 without further checking, while :encoding(UTF-8) checks the data for actually being valid UTF-8.

      I read this as saying that the two should behave identically (except for warnings). Maybe someone else knows where the difference comes from.
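One way to check that reading for output is to push the same string through both layers and compare the bytes. The following is only a sketch using in-memory filehandles rather than tempfile(), and the helper name encode_via_layer is invented for the example. For a valid Unicode string like this one, both layers should produce the same bytes.

```perl
#!perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper (not from the thread): write $text through $layer
# into an in-memory buffer and return the raw bytes that ended up there.
sub encode_via_layer {
    my ($text, $layer) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";
    binmode $fh, $layer or die "binmode failed for $layer: $!";
    print {$fh} $text;
    close $fh or die "Cannot close in-memory handle: $!";
    return $buffer;
}

my $text = "\N{INVERTED QUESTION MARK}Qu\N{LATIN SMALL LETTER E WITH ACUTE}?\n";

my $via_utf8     = encode_via_layer($text, ':utf8');
my $via_encoding = encode_via_layer($text, ':encoding(UTF-8)');

# Compare the two byte strings and report the outcome.
my $verdict = $via_utf8 eq $via_encoding ? 'identical' : 'different';
print "Output bytes are $verdict\n";
```

If this reports identical bytes on your system as well, the difference seen with the temporary file probably comes from somewhere other than the choice between the two layers on the output handle.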