in reply to Re: Another utf-8 decoding problem
in thread Another utf-8 decoding problem

Thanks,

Regarding the environment, I'm unfortunately forced to use an ISO encoding, since the target environment uses it. So it feels like I need to present the data in this encoding regardless of the input format?

Re^3: Another utf-8 decoding problem
by moritz (Cardinal) on Oct 11, 2010 at 11:54 UTC
    When you use Perl for text processing, you always use Perl's internal encoding for string representation (which is either ISO-8859-1 or UTF-8, depending on the presence of the UTF-8 flag).

    So you decode input data and encode output data. That's always the same workflow, regardless of what your output encoding is.
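As a minimal sketch of that decode/encode workflow, assuming the core Encode module and a made-up sample string (the variable names and the sample data are illustrative, not from the original post):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes for "café" (é is 0xC3 0xA9 in UTF-8).
my $utf8_bytes = "caf\xC3\xA9";

# Decode: bytes in a known encoding -> Perl character string.
my $text = decode('UTF-8', $utf8_bytes);

# String operations now count characters, not bytes.
my $byte_count = length $utf8_bytes;   # 5 bytes before decoding
my $char_count = length $text;         # 4 characters after decoding

# Encode: Perl character string -> bytes in the target encoding.
my $latin1_bytes = encode('ISO-8859-1', $text);   # é becomes the single byte 0xE9
```

The input encoding only matters at the decode step and the output encoding only at the encode step; everything in between works on characters.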

    Perl 6 - links to (nearly) everything that is Perl 6.
      Ok. So I should encode the data from utf-8 and decode it to iso? Why do I then have to set binmode if my string already has the correct encoding? This is Greek to me, sorry :-)
        Ok. So I should encode the data from utf-8 and decode it to iso?

        No. Don't go mixing all the terms I've used.

        You should decode incoming data (from UTF-8 or whatever encoding it is) into Perl's internal format.

        Then do your string operations with decoded strings.

        Then, when you output it, encode it. It isn't in the right format already: it's in Perl's internal format, which can be either Latin-1 or UTF-8, depending on some factors you shouldn't care about.
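The three steps above can also be sketched with I/O layers, which do the decoding and encoding at the filehandle. This is only an illustration under stated assumptions: the sample data is made up, and in-memory filehandles stand in for the real input and output files.

```perl
use strict;
use warnings;

# Hypothetical input: the UTF-8 bytes for "Grüße\n", held in memory instead of a file.
my $utf8_input    = "Gr\xC3\xBC\xC3\x9Fe\n";
my $latin1_output = '';

# Step 1: the :encoding layer decodes UTF-8 bytes into characters as lines are read.
open my $in, '<:encoding(UTF-8)', \$utf8_input
    or die "Cannot open input: $!";

# Step 3: the :encoding layer encodes characters to ISO-8859-1 bytes as they are written.
open my $out, '>:encoding(ISO-8859-1)', \$latin1_output
    or die "Cannot open output: $!";

while (my $line = <$in>) {
    chomp $line;
    # Step 2: string operations on decoded text; reverse works per character here.
    my $reversed = reverse $line;
    print {$out} "$reversed\n";
}

close $in;
close $out or die "Cannot close output: $!";
```

For a handle that is already open, such as STDOUT, `binmode(STDOUT, ':encoding(ISO-8859-1)')` attaches the same kind of layer. That is why binmode is still needed: the string holds characters in Perl's internal format, and the layer is what turns them into ISO-8859-1 bytes on output.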

        Please read this, it explains it all in sufficient detail (I hope).
