Ah, I think I know where I went off the cliff now.

"use utf8" is only for the source itself, not for any character streams coming in or going out. For that I have to explicitly state the encoding. It seems I did not truly realize that perl has an internal way of using UTF-8 that has nothing to do with what encoding the shell is using. Am I correct in this assumption?

With what you and almut told me I got the example code working in a few seconds, so that gives me some hope I do understand at least somewhat better now.
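
The distinction described above can be sketched roughly as follows. This is a minimal illustration, not the original script: the sample string and the in-memory filehandle are assumptions made only so the example is self-contained, and the file must itself be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;    # only declares that THIS SOURCE FILE is written in UTF-8

# Thanks to "use utf8" the literal below is a string of 5 characters.
my $text = "naïve";

# Output needs its own, explicit encoding layer. An in-memory
# filehandle stands in for a real file here (illustration only).
my $bytes = '';
open(my $out, '>:encoding(UTF-8)', \$bytes) or die "open failed: $!";
print {$out} $text;
close($out);

# $bytes holds encoded octets: "ï" takes two bytes, so 6 bytes in total.
printf "characters: %d, bytes: %d\n", length($text), length($bytes);

# Input needs the matching layer to turn octets back into characters.
open(my $in, '<:encoding(UTF-8)', \$bytes) or die "open failed: $!";
my $roundtrip = <$in>;
close($in);

print "round trip ok\n" if $roundtrip eq $text;
```

For STDIN and STDOUT the same idea applies via `binmode(STDOUT, ':encoding(UTF-8)')`; which encoding to name there depends on what the shell or terminal actually uses.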

Thanks very much for helping me renew my mastery of perl!

Re^3: Trying to understand behavior of split and perl in general with UTF-8
by almut (Canon) on Jun 17, 2010 at 20:42 UTC
    It seems I did not truly realize that perl has an internal way of using UTF-8 that has nothing to do with what encoding the shell is using.

    Perl's internal representation of Unicode characters is (almost) UTF-8, too. But for most practical purposes, from the user's perspective, it helps to ignore this implementation detail [1] and just properly decode your inputs and encode your outputs.

    ___

    [1] Other languages have chosen different internal formats for Unicode strings, e.g. Python uses UCS-2 or UCS-4 (a build-time option).
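
As a rough sketch of "decode your inputs, encode your outputs" using the core Encode module: the sample octets below are a made-up stand-in for data read from a file or the shell, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 octets as they might arrive from outside
# (hypothetical sample: "é;ü" spelled out byte by byte).
my $octets = "\xC3\xA9;\xC3\xBC";

# Splitting the undecoded data cuts between bytes: 5 pieces.
my @byte_pieces = split //, $octets;

# Decode on the way in: octets become a string of characters.
my $chars = decode('UTF-8', $octets);

# Splitting the decoded string cuts between characters: 3 pieces.
my @char_pieces = split //, $chars;

# Encode on the way out: characters become octets again.
my $out = encode('UTF-8', join('', reverse @char_pieces));

printf "byte pieces: %d, character pieces: %d\n",
    scalar(@byte_pieces), scalar(@char_pieces);
```

Reversing the undecoded bytes instead would tear the two-byte sequences apart and yield invalid UTF-8, which is the kind of surprise that decoding first avoids.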

      I've been rereading the tuts and have experimented a bit with encodings last evening and things are working as I would expect them to (after your help). Hopefully that means I do grasp the basics better now. Thanks for trying to make me a little bit more knowledgeable.
Re^3: Trying to understand behavior of split and perl in general with UTF-8
by hdv.jadev (Novice) on Jul 22, 2010 at 12:16 UTC
    Sorry it took so long to get back to this. Holidays... Anyway, just to let you know: yesterday the person I wrote the script for successfully converted about 2 GB worth of plain-text files with old research data to a new database format. Thanks to your help it all went smoothly.