Re: Matching non-ASCII file contents with file name.

With use utf8; in your code, you only tell Perl that your source code is encoded in UTF-8 (so the inverted question mark in the source is recognized as that character). It says nothing about how input and output should be encoded. Perl treats your output handle as (say) Latin-1, and because it can convert the Unicode string it decoded from the UTF-8 source to Latin-1, it does so when printing.

I find that explicitly specifying the encodings for filehandles is the easiest way to get consistent results:

#!perl
use strict;
use warnings;

use charnames ':full';

binmode STDOUT, ':encoding(UTF-8)';
print "\N{INVERTED QUESTION MARK}\n";
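To see what use utf8; does on its own, here is a minimal sketch that compares the same string literal with and without the pragma. It assumes the script file itself is saved as UTF-8; the variable names `$as_chars` and `$as_bytes` are made up for illustration. Only ASCII is printed, so no output layer is involved.

```perl
#!perl
use strict;
use warnings;

my ($as_chars, $as_bytes);
{
    use utf8;    # literals in this block are decoded from UTF-8 source
    $as_chars = "¿";
}
{
    no utf8;     # literals in this block stay as the raw source bytes
    $as_bytes = "¿";
}

# Assumption: this file is saved as UTF-8, so the literal is 0xC2 0xBF on disk.
printf "with use utf8:    length %d\n", length $as_chars;
printf "without use utf8: length %d\n", length $as_bytes;
```

If the file really is UTF-8, the first length should be 1 (one character) and the second 2 (two bytes). Neither line says anything about how a handle encodes its output, which is why the binmode call is still needed.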

Re^2: Matching non-ASCII file contents with file name. by mldvx4 (Hermit) on Dec 23, 2022 at 06:39 UTC
Thanks, again! I see the mistake now but don't understand it. The following is what I had, which was not producing the right result:

`my ($fh, $tempfile) = tempfile(); binmode( $fh, ":utf8" );`

With your corrections, the following produces the right character:

`my ($fh, $tempfile) = tempfile(); binmode( $fh, ":encoding(UTF-8)" );`

What is the difference between `binmode( $fh, ":utf8" );` and `binmode( $fh, ":encoding(UTF-8)" );` with regard to the output? I don't understand it.
Re^3: Matching non-ASCII file contents with file name. by Corion (Patriarch) on Dec 23, 2022 at 07:16 UTC
Maybe the problem is elsewhere? Because binmode says:

To mark FILEHANDLE as UTF-8, use `:utf8` or `:encoding(UTF-8)`. `:utf8` just marks the data as UTF-8 without further checking, while `:encoding(UTF-8)` checks the data for actually being valid UTF-8.

I read this as saying that the two should behave identically (except for warnings). Maybe someone else knows where the differences come from.
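A quick way to test that reading is to write the same valid character through both layers and compare the resulting bytes. This is only a sketch: it uses in-memory filehandles instead of the tempfile from the question, and the names `%bytes_for` and `$strict_ok` are made up for illustration. The last part calls Encode directly (not a layer) to show the kind of validity check that strict UTF-8 decoding performs.

```perl
#!perl
use strict;
use warnings;
use Encode ();

my $text = "\N{U+00BF}\n";    # INVERTED QUESTION MARK plus newline

# Write the same valid text through each layer into an in-memory buffer.
my %bytes_for;
for my $layer (':utf8', ':encoding(UTF-8)') {
    my $buffer = '';
    open my $fh, ">$layer", \$buffer
        or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "Cannot close in-memory handle: $!";
    $bytes_for{$layer} = $buffer;
}

for my $layer (sort keys %bytes_for) {
    printf "%-18s %s\n", $layer,
        join ' ', map { sprintf '%02X', ord } split //, $bytes_for{$layer};
}

# Strict decoding with Encode rejects bytes that are not valid UTF-8.
# 0xFF can never occur in UTF-8, so this decode dies and eval returns false.
my $invalid   = "a\xFFb";
my $strict_ok = eval { Encode::decode('UTF-8', $invalid, Encode::FB_CROAK); 1 };
print "strict decode of invalid bytes: ", ($strict_ok ? "accepted" : "rejected"), "\n";
```

For valid text, both layers should emit the same bytes (C2 BF 0A here), so if the output differs in practice, the cause is more likely malformed data or something else in the program than the choice of layer.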
Re^4: Matching non-ASCII file contents with file name. by hippo (Archbishop) on Dec 23, 2022 at 08:26 UTC
For reference, the full gory details of the differences are very well explained by brian_d_foy in Know the difference between utf8 and UTF-8. 🦛