```perl
#!/usr/bin/perl -w
use strict;
use utf8;    # tells Perl the *source code* is UTF-8; has no effect on filehandles

# without an encoding layer
open(my $fh1, '>', 'test-normal') or die "Can't open test-normal: $!";
# with the :utf8 encoding layer
open(my $fh2, '>:utf8', 'test-utf8') or die "Can't open test-utf8: $!";

# ASCII data:
# encodes the same in UTF-8 or Latin-1
my $ascdata = "aei\n";
print $fh1 $ascdata;
print $fh2 $ascdata;

# a, e, i with acute accents (U+00E1, U+00E9, U+00ED):
# these characters *can* be encoded in Latin-1 or in UTF-8
# (though differently in each)
my $l1data = "\xe1\xe9\xed\n";
print $fh1 $l1data;
print $fh2 $l1data;

# U+0641 ARABIC LETTER FEH:
# can't be encoded in Latin-1, can be encoded in UTF-8
my $u8data = "\x{0641}\n";
print $fh1 $u8data;    # <-- THIS LINE GENERATES THE WARNING
print $fh2 $u8data;

close($fh1) or die "Can't close test-normal: $!";
close($fh2) or die "Can't close test-utf8: $!";
```
Here are the results, checked with od:
```
[jeremy@serpent pm-test]$ perl wide-char.pl
Wide character in print at wide-char.pl line 27.
[jeremy@serpent pm-test]$ od -t x1 test-normal
0000000 61 65 69 0a e1 e9 ed 0a d9 81 0a
0000013
[jeremy@serpent pm-test]$ od -t x1 test-utf8
0000000 61 65 69 0a c3 a1 c3 a9 c3 ad 0a d9 81 0a
0000016
[jeremy@serpent pm-test]$
```
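I recognize this isn't exactly what you were asking, but I suspect that the utf8 pragma and the :utf8 encoding layer are getting mixed up somewhere in your code; perhaps one or the other is missing. They do different jobs: use utf8 only tells Perl that your source code is written in UTF-8, while the :utf8 layer controls how characters are turned into bytes on a filehandle.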
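A minimal sketch of that division of labour, assuming the script file itself is saved as UTF-8. The pragma makes the literal "á" a single character (U+00E1); what bytes it becomes on output depends only on the layer. The helper bytes_written is a hypothetical name, and an in-memory filehandle (opened on a scalar reference) stands in for a real file so the bytes can be inspected directly:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # source is UTF-8: the literal below is ONE character, U+00E1

my $a_acute = "á";

# Print $text on an in-memory filehandle opened with $mode and return the
# resulting bytes as hex. Only the layer in $mode decides how characters
# become bytes; the utf8 pragma above plays no part here.
sub bytes_written {
    my ($mode, $text) = @_;
    my $buf = '';
    open(my $fh, $mode, \$buf) or die "open: $!";
    print $fh $text;
    close($fh) or die "close: $!";
    return unpack('H*', $buf);
}

print length($a_acute), "\n";                              # 1
print bytes_written('>', $a_acute), "\n";                  # e1   (no layer)
print bytes_written('>:encoding(UTF-8)', $a_acute), "\n";  # c3a1 (UTF-8 layer)
```

The two hex strings match the e1 and c3 a1 seen in test-normal and test-utf8 above.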
More specifically, it sounds like you're trying to print a character whose code point (its ord value) is larger than 0xff on a filehandle with no encoding layer, which Perl effectively treats as Latin-1. Such characters can't be encoded in Latin-1, so you're running the risk of losing data. This is a real problem, and I encourage you to track it down: turning off a warning isn't the same as fixing its cause.
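One way to track it down instead of silencing it is to promote the utf8 warnings category (which includes "Wide character") to a fatal error, so the program dies at the offending print and reports the file and line. In a real program you would simply put use warnings FATAL => 'utf8'; near the top; this sketch wraps the print in eval so it can report both cases. The helper print_strictly is hypothetical, and the in-memory filehandle stands in for a real layer-less file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print $text on a filehandle with no encoding layer while "utf8"
# warnings are fatal. Returns the error message, or '' on success.
sub print_strictly {
    my ($text) = @_;
    my $bytes = '';
    open(my $fh, '>', \$bytes) or die "open: $!";
    my $ok = eval {
        use warnings FATAL => 'utf8';    # "Wide character" now dies
        print $fh $text;
        1;
    };
    close($fh);
    return $ok ? '' : $@;
}

for my $text ("\xe1\xe9\xed\n", "\x{0641}\n") {
    my $err = print_strictly($text);
    my $msg = $err eq '' ? "printed fine\n" : "caught: $err";
    print $msg;
}
```

The Latin-1 string prints normally; the Arabic feh triggers "Wide character in print", now as a fatal error pointing at the exact line.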
It might help if you posted a snippet that produces the warning. Warnings are usually there for a reason, and perhaps something in your code looks a bit sketchy from Perl's point of view.
Hope that helps.
Update: I've just noticed that when asked to print a character greater than 0xff on a filehandle without a :utf8 layer, Perl copes by printing that character's UTF-8 encoding anyway: note that the last two bytes before the final newline in both files are d9 81.
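The fallback can be reproduced without touching the disk. This sketch prints the same three strings as the script above on a layer-less in-memory filehandle and returns the bytes as hex, like od -t x1. The helper raw_bytes is a hypothetical name, and the warning is switched off inside it only so the demo runs quietly; that is not a fix:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print each string on an in-memory filehandle with no encoding layer,
# then return everything written as a hex string.
sub raw_bytes {
    my @strings = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "open: $!";
    {
        no warnings 'utf8';    # silenced ONLY to keep the demo quiet
        print $fh $_ for @strings;
    }
    close($fh) or die "close: $!";
    return unpack('H*', $buf);
}

# Same three strings as the test-normal file above.
print raw_bytes("aei\n", "\xe1\xe9\xed\n", "\x{0641}\n"), "\n";
# 6165690ae1e9ed0ad9810a
```

Note that the fallback applies to the whole string being printed: if the accented letter and the feh share one string, as in raw_bytes("\xe1\x{0641}"), the á comes out as c3 a1 rather than e1.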
No wonder you get a warning. test-normal now mixes Latin-1 bytes (e1 e9 ed) with UTF-8 bytes (d9 81), and when reading it back there's no systematic way to tell which bytes were meant as UTF-8 and which weren't!
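To see the ambiguity concretely, this sketch reads the bytes of test-normal (copied from the od output above) back with the core Encode module. Decoded strictly as UTF-8 the file is rejected; decoded as Latin-1 it "succeeds", but the feh turns into two unrelated characters. The helper strict_utf8 is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Strictly decode $bytes as UTF-8; return undef if they aren't valid UTF-8.
sub strict_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
}

# The bytes of test-normal, copied from the od output above.
my $file = pack('H*', '6165690ae1e9ed0ad9810a');

# As one UTF-8 stream the file is invalid: e1 e9 ed is not UTF-8 ...
my $as_utf8 = strict_utf8($file);
print defined $as_utf8 ? "valid UTF-8\n" : "not valid UTF-8\n";

# ... and as Latin-1, d9 81 becomes U+00D9 U+0081 instead of U+0641.
my $latin1 = decode('ISO-8859-1', $file);
my @last   = split //, (split /\n/, $latin1)[-1];
print join(' ', map { sprintf 'U+%04X', ord } @last), "\n";
```

Neither reading recovers all of the original text; any reader would have to guess line by line.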
Update 2: Cleaned up the comments by using the word "encoded" instead of "printed".