
Ok. So I should encode the data from utf-8 and decode it to iso?

No. Don't go mixing all the terms I've used.

You should decode incoming data (from UTF-8 or whatever encoding it is in) into Perl's internal format.

Then do your string operations with decoded strings.

Then when you output it, encode it. It's not in the right format already: it's in Perl's internal format, which can be either latin1 or UTF-8, depending on some factors you shouldn't care about.
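
A minimal sketch of those three steps, assuming the input arrives as UTF-8 bytes and the output should be ISO-8859-1. The sample word and the variable names are made up for illustration; only `decode` and `encode` come from the Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes of "träningsredskap",
# as they might arrive from a file, a socket, or a form field.
my $bytes = "tr\xC3\xA4ningsredskap";

# 1. Decode: bytes in a known encoding become a Perl character string.
my $text = decode('UTF-8', $bytes);

# 2. Do string operations on the decoded string.
my $length = length $text;   # counts characters, not bytes
my $upper  = uc $text;       # case mapping works on characters

# 3. Encode: turn the character string back into bytes for output.
my $latin1 = encode('ISO-8859-1', $upper);
```

Here `length $text` counts 15 characters although the input was 16 bytes, which is why the string operations belong between the decode and the encode.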

Please read this, it explains it all in sufficient detail (I hope).

Perl 6 - links to (nearly) everything that is Perl 6.

Re^6: Another utf-8 decoding problem
by DreamT (Pilgrim) on Oct 11, 2010 at 13:06 UTC
    I meant "decode from" and "encode to" - sorry for the mixup

    The thing is that everything else that is printed (without setting the binmode) gets printed ok (does this mean that I already have the correct output mode?). So it feels like I'm trying to decode data that isn't utf-8 from the beginning? (Can I test the incoming data in a simple manner?) I will look into the link you provided.
      (Can I test the incoming data in a simple manner?)

      See the documentation for Encode::decode - you can tell it to die on invalid input.
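
A rough sketch of such a test, assuming the question is simply "are these bytes valid UTF-8?". The helper name `is_valid_utf8` is made up for this example; `Encode::FB_CROAK` is the documented CHECK value that makes `decode` die on malformed input.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if $bytes decodes cleanly as UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # With a CHECK argument, decode may modify its input, so work on a copy.
    my $copy = $bytes;
    # FB_CROAK makes decode die on malformed data; eval catches that.
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
    return $ok ? 1 : 0;
}

my $utf8_ok   = is_valid_utf8("tr\xC3\xA4ning");  # "ä" as two UTF-8 bytes
my $latin1_in = is_valid_utf8("tr\xE4ning");      # "ä" as one latin1 byte
```

Pure ASCII input passes this check too, so a positive result only says the bytes are well-formed UTF-8, not that they contain any non-ASCII characters.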

        Hmm. I can't seem to get it right whatever I use. I used Devel::Peek to check the UTF-8 flag, and it seems to be set. Here are some different outputs:


        SV = PV(0xb8de060) at 0xbbabfa0
          REFCNT = 1
          FLAGS = (TEMP,POK,pPOK)
          PV = 0xbb6f730 "traningsredskap"\0
          CUR = 15
          LEN = 16
        SV = PV(0xb8dddd0) at 0xa3b0800
          REFCNT = 1
          FLAGS = (PADMY,POK,pPOK,UTF8)
          PV = 0xbb8bc98 "traningsredskap"\0 [UTF8 "traningsredskap"]
          CUR = 15
          LEN = 16
        SV = PV(0xb8dddd0) at 0xa3b0800
          REFCNT = 1
          FLAGS = (PADMY,POK,pPOK,UTF8)
          PV = 0xbb6d3d0 "traningsredskap"\0 [UTF8 "traningsredskap"]
          CUR = 15
          LEN = 16
        SV = PV(0xb8de060) at 0xbbabfa0
          REFCNT = 1
          FLAGS = (TEMP,POK,pPOK)
          PV = 0xbb72668 "traningsredskap"\0
          CUR = 15
          LEN = 16
        using

        use Encode qw(decode encode);
        use Devel::Peek;

        my $str = $original_value;
        Dump $str;
        $str = decode("utf-8", $str);
        Dump $str;
        Dump encode('latin1', $str);