in reply to Re^2: Another utf-8 decoding problem
in thread Another utf-8 decoding problem

When you use Perl for text processing, you always work with strings in Perl's internal representation (which is either ISO-8859-1 or UTF-8, depending on whether the UTF-8 flag is set).

So you decode input data and encode output data. The workflow is always the same, regardless of what your output encoding is.
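
A minimal sketch of that workflow with the core Encode module. The sample bytes and the variable names are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes for "Grüße", as they might
# arrive from a file, a socket or a CGI parameter.
my $bytes_in = "Gr\xC3\xBC\xC3\x9Fe";

# Decode: bytes -> Perl character string.
my $text = decode('UTF-8', $bytes_in);

# String operations now see characters, not bytes.
my $byte_count = length $bytes_in;   # counts bytes
my $char_count = length $text;       # counts characters

# Encode: character string -> bytes, in whatever encoding the output needs.
my $latin1_out = encode('ISO-8859-1', $text);
my $utf8_out   = encode('UTF-8', $text);
```

The decode step is the same whichever output encoding is chosen; only the final encode call changes.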

Perl 6 - links to (nearly) everything that is Perl 6.

Re^4: Another utf-8 decoding problem
by DreamT (Pilgrim) on Oct 11, 2010 at 12:27 UTC
    Ok. So I should encode the data from utf-8 and decode it to iso? Why do I then have to set binmode if I have correct encoding of my string? This is greek to me, sorry :-)
      Ok. So I should encode the data from utf-8 and decode it to iso?

      No. Don't go mixing all the terms I've used.

      You should decode incoming data (from UTF-8 or whatever encoding it is) into perl's internal format.

      Then do your string operations with decoded strings.

      Then when you output it, encode it. It isn't in the right format already: it's in Perl's internal format, which can be either Latin-1 or UTF-8, depending on some factors you shouldn't care about.
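
      The same three steps can also be done with I/O layers, which is where binmode comes in: a layer on a filehandle does the decoding or encoding for you. A rough sketch, using in-memory filehandles and made-up sample data so that it runs on its own:

```perl
use strict;
use warnings;

# Hypothetical input: UTF-8 bytes for "Grüße", standing in for a real file.
my $input_bytes = "Gr\xC3\xBC\xC3\x9Fe\n";

# Step 1, decode on read: the :encoding layer turns bytes into characters.
open my $in, '<:encoding(UTF-8)', \$input_bytes or die "open for reading: $!";
my $line = <$in>;
close $in;
chomp $line;

# Step 2, string operations on the decoded string.
my $char_count = length $line;

# Step 3, encode on write: the layer turns characters back into bytes.
my $output_bytes = '';
open my $out, '>:encoding(ISO-8859-1)', \$output_bytes or die "open for writing: $!";
print {$out} $line;
close $out;

# For a handle that is already open, such as STDOUT, binmode adds the layer.
binmode STDOUT, ':encoding(UTF-8)';
print "$line\n";
```

      Without a layer (or an explicit encode), print has to guess how to turn characters into bytes, which is why output that looks fine for some strings can break for others.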

      Please read this, it explains it all in sufficient detail (I hope).

        I meant "decode from" and "encode to" - sorry for the mixup

        The thing is that everything else that is printed (without setting the binmode) gets printed OK (does this mean that I already have the correct output mode?). So it feels like I'm trying to decode data that wasn't UTF-8 to begin with? (Is there a simple way to test the incoming data?) I will look into the link you provided.
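
        One simple way to test incoming data, as a sketch: try a strict decode and see whether it fails. The helper name looks_like_utf8 is invented for this example; decode and FB_CROAK come from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns 1 if the byte string is well-formed UTF-8,
# 0 otherwise. With FB_CROAK, decode dies on malformed input, and the
# eval catches that. $bytes is a copy, so the caller's data is untouched.
sub looks_like_utf8 {
    my ($bytes) = @_;
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

my $utf8_sample   = "Gr\xC3\xBC\xC3\x9Fe";   # "Grüße" as UTF-8 bytes
my $latin1_sample = "Gr\xFC\xDFe";           # "Grüße" as ISO-8859-1 bytes

print "utf8 sample:   ", (looks_like_utf8($utf8_sample)   ? "valid" : "invalid"), "\n";
print "latin1 sample: ", (looks_like_utf8($latin1_sample) ? "valid" : "invalid"), "\n";
```

        This only shows that the bytes could be UTF-8. Pure ASCII passes the check in either encoding, and data that has already been decoded should not be run through it again.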