in reply to Re^2: Trying to understand behavior of split and perl in general with UTF-8
in thread Trying to understand behavior of split and perl in general with UTF-8
It seems I did not truly realize that perl has an internal way of using UTF-8 that has nothing to do with what encoding the shell is using.
Perl's internal representation of Unicode characters is (almost) UTF-8, too. For most practical purposes, though, it helps to ignore this implementation detail [1] and simply decode your inputs and encode your outputs properly.
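A minimal sketch of what "decode your inputs, encode your outputs" can look like, using the core Encode module. The sample byte string and the comma-separated layout are made up for illustration; they are not taken from the original question.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes for "ä,ö,ü", as they might
# arrive from a UTF-8 shell or file.
my $bytes = "\xC3\xA4,\xC3\xB6,\xC3\xBC";

# Decode at the input boundary: bytes become a Perl character string.
my $text = decode('UTF-8', $bytes);

# String functions now operate on characters rather than bytes.
my @fields = split /,/, $text;
printf "%d characters (from %d bytes)\n", length($text), length($bytes);

# Encode at the output boundary: characters become bytes again.
print encode('UTF-8', join('|', reverse @fields)), "\n";
```

The first line printed is "5 characters (from 8 bytes)". For filehandles, an I/O layer such as binmode(STDOUT, ':encoding(UTF-8)') does the encoding step for you, so an explicit encode() call is not needed there.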
___
[1] Other languages have chosen different internal formats for Unicode strings, e.g. Python before version 3.3 used UCS-2 or UCS-4 (a build-time option).
Re^4: Trying to understand behavior of split and perl in general with UTF-8
by hdv.jadev (Novice) on Jun 18, 2010 at 11:02 UTC