
I meant "decode from" and "encode to" - sorry for the mixup

The thing is that everything else that is printed (without setting the binmode) is printed correctly (does this mean that I already have the correct output mode?). So it feels like I'm trying to decode data that wasn't UTF-8 to begin with. (Can I test the incoming data in a simple manner?) I will look into the link you provided.

Re^7: Another utf-8 decoding problem
by moritz (Cardinal) on Oct 11, 2010 at 13:17 UTC
    (Can I test the incoming data in a simple manner?)

    See the documentation for Encode::decode - you can tell it to die on invalid input.
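
    A minimal sketch of that check, assuming the incoming data is available as a byte string. The helper name try_decode_utf8 and the two sample strings are made up for illustration; Encode::FB_CROAK is the CHECK value that makes decode die on malformed input instead of silently substituting U+FFFD.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns the decoded string, or undef if the
# bytes are not well-formed UTF-8.
sub try_decode_utf8 {
    my ($bytes) = @_;    # a copy, since decode may modify its source when CHECK is set
    my $text = eval { decode('UTF-8', $bytes, Encode::FB_CROAK) };
    return $text;        # undef if decode died inside the eval
}

my $valid   = "Tr\xc3\xa4ningsredskap";    # "ä" as two UTF-8 bytes
my $invalid = "Tr\xe4ningsredskap";        # "ä" as a single Latin-1 byte

for my $sample ($valid, $invalid) {
    print defined try_decode_utf8($sample) ? "valid UTF-8\n" : "not UTF-8\n";
}
```

    If the second kind of string shows up, the data is probably in some other encoding (Latin-1 is only a guess here) and should be decoded as such.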

    Perl 6 - links to (nearly) everything that is Perl 6.
      Hmm. I can't seem to get it right whatever I use. I used Devel::Peek to check the UTF-8 flag, and it seems to be set. Here are some different outputs:


      SV = PV(0xb8de060) at 0xbbabfa0
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK)
        PV = 0xbb6f730 "traningsredskap"\0
        CUR = 15
        LEN = 16
      SV = PV(0xb8dddd0) at 0xa3b0800
        REFCNT = 1
        FLAGS = (PADMY,POK,pPOK,UTF8)
        PV = 0xbb8bc98 "traningsredskap"\0 [UTF8 "traningsredskap"]
        CUR = 15
        LEN = 16
      SV = PV(0xb8dddd0) at 0xa3b0800
        REFCNT = 1
        FLAGS = (PADMY,POK,pPOK,UTF8)
        PV = 0xbb6d3d0 "traningsredskap"\0 [UTF8 "traningsredskap"]
        CUR = 15
        LEN = 16
      SV = PV(0xb8de060) at 0xbbabfa0
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK)
        PV = 0xbb72668 "traningsredskap"\0
        CUR = 15
        LEN = 16
      using

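      use Encode qw(decode encode);
      use Devel::Peek;

      my $str = $original_value;      # the incoming value under test
      Dump $str;                      # raw value
      $str = decode("utf-8", $str);
      Dump $str;                      # after decoding from UTF-8
      Dump encode('latin1', $str);    # after encoding to Latin-1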
        Sorry, meant

        SV = PV(0x9cf0060) at 0x9fbde50
          REFCNT = 1
          FLAGS = (TEMP,POK,pPOK)
          PV = 0x9fa6988 "Tr?ningsredskap"\0
          CUR = 15
          LEN = 16
        SV = PV(0x9cefdd0) at 0x87c2800
          REFCNT = 1
          FLAGS = (PADMY,POK,pPOK,UTF8)
          PV = 0x9f9b490 "Tr\303\244ningsredskap"\0 [UTF8 "Tr\x{e4}ningsredskap"]
          CUR = 16
          LEN = 20
        SV = PV(0x9cefdd0) at 0x87c2800
          REFCNT = 1
          FLAGS = (PADMY,POK,pPOK,UTF8)
          PV = 0x9f7ca38 "Tr\357\277\275ningsredskap"\0 [UTF8 "Tr\x{fffd}ningsredskap"]
          CUR = 17
          LEN = 20
        SV = PV(0x9cf0060) at 0x9fbde50
          REFCNT = 1
          FLAGS = (TEMP,POK,pPOK)
          PV = 0x9f6b8c8 "Tr?ningsredskap"\0
          CUR = 15
          LEN = 16
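
        One possible reading of the \x{fffd} and "?" values in these dumps, offered as an assumption rather than a diagnosis: if the value at that point holds "ä" as a single Latin-1 byte, a decode from UTF-8 without a CHECK argument replaces the malformed byte with U+FFFD, and encoding that to Latin-1 then substitutes "?". The sample string below is made up to reproduce that pattern.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed input: "ä" stored as one Latin-1 byte (0xE4), not as UTF-8.
my $latin1_bytes = "Tr\xe4ningsredskap";

# With no CHECK argument, decode does not die on the malformed byte;
# it substitutes the replacement character U+FFFD.
my $decoded = decode('utf-8', $latin1_bytes);
printf "third character: U+%04X\n", ord(substr($decoded, 2, 1));

# U+FFFD has no Latin-1 equivalent, so encode substitutes "?".
my $encoded = encode('latin1', $decoded);
print "$encoded\n";
```

        If this matches what is happening, decoding the input as Latin-1 instead of UTF-8 would be the thing to try.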