Re: Triple Byte UTF8 Characters

Here's a working example under linux:

# test.pl
use strict;
use warnings;
binmode $_, ':encoding(UTF-8)' for *STDIN, *STDOUT, *STDERR;
my $line = <>;
print $line;
chomp $line;
print length($line), "\n";
__END__

# echo -e "\xe4\xbb\x96"|perl test.pl
[download]

Output:

他
1

The hard thing is that you have to manually keep track of which strings are decoded, and which not. Which is a huge cognitive load, unless you settle for decoding everything through an input IO layer, and encoding everything through an output IO layer.

Most database drivers have an option to do that for you too.