I thought I could print Unicode to STDOUT. I will read more on that.
See perlunitut. Filehandles work with bytes, not characters.
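A minimal sketch of what that means in practice, assuming the output should be UTF-8 (the sample string and variable names are only for illustration): either encode the character string to bytes yourself, or put an encoding layer on the filehandle so it does the encoding for you.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string: 6 characters, two of them outside ASCII.
my $char_string = "caf\x{E9} \x{263A}";

# Option 1: encode explicitly; $bytes is what would actually go down the handle.
my $bytes = encode('UTF-8', $char_string);

# Option 2: let the filehandle encode on output (assumes a UTF-8 terminal).
binmode(STDOUT, ':encoding(UTF-8)');
print $char_string, "\n";
```

Without either step, Perl has to guess how to turn characters into bytes, which is where "Wide character in print" warnings come from.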
If you pay closer attention you will see that I am using the validating capability of is_utf8 ($string, 'true_value').
It checks the INTERNAL BYTE BUFFER of the Unicode string. It is an internal consistency check, and should only be used to verify Perl's internal functioning, not your own strings. Apparently the [INTERNAL] in the documentation is not clear enough, given the huge number of people who don't realise that it is an internal function. I'll see if I can get that changed.
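If the goal is to validate incoming bytes, one common approach is to decode them strictly and see whether that fails. This is only a sketch: `is_valid_utf8` is a hypothetical helper name, and it assumes the input is a byte string that is supposed to be UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if $bytes decodes cleanly as strict UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its argument
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

print is_valid_utf8("caf\xC3\xA9!") ? "valid\n" : "invalid\n";   # well-formed
print is_valid_utf8("caf\xE9!")     ? "valid\n" : "invalid\n";   # Latin-1 byte, malformed
```

In real code you would normally keep the decoded result instead of throwing it away, since decoding at the input boundary is what perlunitut recommends anyway.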
Well, today was definitely a fruitful day. I learned about C0/C1 control codes, which were the reason Google complained. I also realized where this stuff actually comes from (someone pasting a mis-encoded chunk of text into a browser window). Finally, I know not to use is_utf8 anymore :)
Thank you for your comments.
P.S. How I ended up fixing this:
$_ =~ s/[\x{80}-\x{9F}]/\x{FFFD}/g;
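For anyone who wants to try it, here is the same substitution on a made-up sample string (the string and variable names are assumptions for illustration). Note that the range only covers the C1 controls, U+0080 to U+009F, and that it should be applied to an already decoded character string.

```perl
use strict;
use warnings;

# Made-up sample: \x{85} is a C1 control, \x{263A} is an ordinary character.
my $text = "price\x{85}list \x{263A}";

# Copy first, then replace every C1 control with U+FFFD REPLACEMENT CHARACTER.
(my $clean = $text) =~ s/[\x{80}-\x{9F}]/\x{FFFD}/g;

binmode(STDOUT, ':encoding(UTF-8)');
print $clean, "\n";
```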