in reply to Perl not recognizing Chinese
"Unicode uses two bytes per character"
For characters like ř that's true, but for Chinese it's not: in UTF-8, ř takes two bytes, while each Chinese character in the example below takes three. UTF-8 is a "variable-length" encoding.
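#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

# Decode input and encode output as UTF-8, including STDIN/STDOUT/STDERR.
use open ':encoding(UTF-8)', ':std';
use Encode;

chomp( my $chinese = <> );
say length $chinese;    # length in characters

my $octets = encode('UTF-8' => $chinese);
say length $octets;     # length in bytes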
Where the input contains (UTF-8 encoded):
焚书坑儒
Output:
4
12
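To make the "variable-length" part more visible, here is a minimal sketch that counts the encoded bytes of a few single characters. The helper name utf8_byte_length is made up for this illustration (it is not part of Encode), and the sample characters are just my picks. It assumes the script file itself is saved as UTF-8, hence the use utf8.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use utf8;    # assumption: this source file is saved as UTF-8
use feature qw{ say };
use open ':encoding(UTF-8)', ':std';
use Encode qw{ encode };

# Hypothetical helper: how many bytes a string occupies once encoded as UTF-8.
sub utf8_byte_length {
    my ($string) = @_;
    return length encode('UTF-8' => $string);
}

# One sample from each UTF-8 length class: ASCII, Latin with caron,
# a Chinese character, and a code point outside the Basic Multilingual Plane.
for my $char ('a', 'ř', '焚', "\x{1F600}") {
    say "$char: ", utf8_byte_length($char), ' byte(s)';
}
```

It should report 1, 2, 3 and 4 bytes respectively, which is why four Chinese characters come out as 12 bytes above.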
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,