As a linguist, I've worked with various languages, including Arabic, Chinese, and Tamil. We processed corpora in those languages in Perl, and we even built a treebank annotation tool in Tk. We never had problems with Unicode. 🤷🏽
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]