
It seems I did not truly realize that perl has an internal way of using UTF-8 that has nothing to do with what encoding the shell is using.

Perl's internal way of representing Unicode characters is (almost) UTF-8, too. But for most practical purposes, from the user's perspective, it helps to ignore this implementation detail [1] and just properly decode your inputs and encode your outputs.
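A minimal sketch of that decode-in, encode-out pattern, using the core Encode module. The sample bytes and variable names are made up for illustration; in a real script the bytes would come from a file, STDIN, or @ARGV.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the raw UTF-8 bytes for "é,ü", as they would
# arrive from the outside world (5 bytes for 3 characters).
my $bytes = "\xC3\xA9,\xC3\xBC";

# Decode at the input boundary: bytes become a string of characters.
my $chars = decode('UTF-8', $bytes);

# String operations such as split and length now work per character.
my @fields = split /,/, $chars;

# Encode at the output boundary: characters become UTF-8 bytes again.
my $out = encode('UTF-8', join('|', @fields));
print $out, "\n";
```

Between the decode and the encode, the script never needs to know how perl stores the string internally. For filehandles, an I/O layer such as `binmode(STDOUT, ':encoding(UTF-8)')` can do the same encoding step automatically.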

___

[1] Other languages have chosen different internal formats for Unicode strings, e.g. Python (before version 3.3) uses UCS-2 or UCS-4, chosen as a build-time option.

Re^4: Trying to understand behavior of split and perl in general with UTF-8
by hdv.jadev (Novice) on Jun 18, 2010 at 11:02 UTC
    I've been rereading the tutorials and experimented a bit with encodings yesterday evening, and things are now working as I would expect (after your help). Hopefully that means I grasp the basics better now. Thanks for trying to make me a little bit more knowledgeable.