in reply to Re^6: Another utf-8 decoding problem
in thread Another utf-8 decoding problem

(Can I test the incoming data in a simple manner?)

See the documentation for Encode::decode - you can tell it to die on invalid input.
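A minimal sketch of such a test, assuming the incoming data is supposed to be UTF-8. The helper name try_decode_utf8 and the sample byte strings are made up for illustration; Encode::FB_CROAK is the CHECK value that makes decode die on malformed input.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns the decoded character string,
# or undef if $bytes is not well-formed UTF-8.
sub try_decode_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode may modify its argument when CHECK is set
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
}

my $latin1 = "Tr\xE4ningsredskap";        # "ä" as one Latin-1 byte
my $utf8   = "Tr\xC3\xA4ningsredskap";    # "ä" as two UTF-8 bytes

for my $input ($latin1, $utf8) {
    print defined try_decode_utf8($input)
        ? "valid UTF-8\n"
        : "not valid UTF-8\n";
}
```

Wrapping the call in eval turns the exception into an undef return, so the caller can fall back to another encoding instead of dying.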

Perl 6 - links to (nearly) everything that is Perl 6.

Re^8: Another utf-8 decoding problem
by DreamT (Pilgrim) on Oct 11, 2010 at 14:10 UTC
    Hmm. I can't seem to get it right whatever I use. I used Devel::Peek to check the UTF-8 flag, and it seems to be set. Here are some different outputs:


    SV = PV(0xb8de060) at 0xbbabfa0
      REFCNT = 1
      FLAGS = (TEMP,POK,pPOK)
      PV = 0xbb6f730 "traningsredskap"\0
      CUR = 15
      LEN = 16

    SV = PV(0xb8dddd0) at 0xa3b0800
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK,UTF8)
      PV = 0xbb8bc98 "traningsredskap"\0 [UTF8 "traningsredskap"]
      CUR = 15
      LEN = 16

    SV = PV(0xb8dddd0) at 0xa3b0800
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK,UTF8)
      PV = 0xbb6d3d0 "traningsredskap"\0 [UTF8 "traningsredskap"]
      CUR = 15
      LEN = 16

    SV = PV(0xb8de060) at 0xbbabfa0
      REFCNT = 1
      FLAGS = (TEMP,POK,pPOK)
      PV = 0xbb72668 "traningsredskap"\0
      CUR = 15
      LEN = 16
    using

    my $str = $original_value;
    Dump $str;
    $str = decode("utf-8", $str);
    Dump $str;
    Dump encode('latin1', $str);
      Sorry, meant

      SV = PV(0x9cf0060) at 0x9fbde50
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK)
        PV = 0x9fa6988 "Tr?ningsredskap"\0
        CUR = 15
        LEN = 16

      SV = PV(0x9cefdd0) at 0x87c2800
        REFCNT = 1
        FLAGS = (PADMY,POK,pPOK,UTF8)
        PV = 0x9f9b490 "Tr\303\244ningsredskap"\0 [UTF8 "Tr\x{e4}ningsredskap"]
        CUR = 16
        LEN = 20

      SV = PV(0x9cefdd0) at 0x87c2800
        REFCNT = 1
        FLAGS = (PADMY,POK,pPOK,UTF8)
        PV = 0x9f7ca38 "Tr\357\277\275ningsredskap"\0 [UTF8 "Tr\x{fffd}ningsredskap"]
        CUR = 17
        LEN = 20

      SV = PV(0x9cf0060) at 0x9fbde50
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK)
        PV = 0x9f6b8c8 "Tr?ningsredskap"\0
        CUR = 15
        LEN = 16
        SV = PV(0x9cf0060) at 0x9fbde50
          REFCNT = 1
          FLAGS = (TEMP,POK,pPOK)
          PV = 0x9fa6988 "Tr?ningsredskap"\0
          CUR = 15
          LEN = 16

        This looks like Latin-1: the UTF8 flag is off and the non-ASCII character occupies a single byte (CUR = 15).

        SV = PV(0x9cefdd0) at 0x87c2800
          REFCNT = 1
          FLAGS = (PADMY,POK,pPOK,UTF8)
          PV = 0x9f9b490 "Tr\303\244ningsredskap"\0 [UTF8 "Tr\x{e4}ningsredskap"]
          CUR = 16
          LEN = 20

        A proper string in Perl's internal format. Should be fine to print out if you add that IO layer, or put it through Encode::encode.
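Both output options can be sketched roughly as follows. This assumes the terminal or consumer expects UTF-8; the variable names and the sample string are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A decoded character string: 15 characters, "ä" is U+00E4.
my $chars = decode('UTF-8', "Tr\xC3\xA4ningsredskap");

# Option 1: encode explicitly into the bytes the consumer expects.
my $utf8_bytes   = encode('UTF-8', $chars);         # 16 bytes
my $latin1_bytes = encode('ISO-8859-1', $chars);    # 15 bytes

# Option 2: add an IO layer so print encodes character strings itself.
binmode(STDOUT, ':encoding(UTF-8)');
print "$chars\n";
```

With the layer in place, print decoded strings only; printing already-encoded bytes through it would encode them a second time.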

        SV = PV(0x9cefdd0) at 0x87c2800
          REFCNT = 1
          FLAGS = (PADMY,POK,pPOK,UTF8)
          PV = 0x9f7ca38 "Tr\357\277\275ningsredskap"\0 [UTF8 "Tr\x{fffd}ningsredskap"]
          CUR = 17
          LEN = 20

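        This is wrong. U+FFFD is the replacement character that decode substitutes for bytes that are invalid in the encoding you named, so it means you decoded something with the wrong character encoding.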
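A small sketch that reproduces the effect, assuming the original data is Latin-1 as the first dump suggests. The byte string is a stand-in for the real input.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "Tr\xE4ningsredskap";    # Latin-1 bytes, "ä" is the single byte 0xE4

# Wrong guess: 0xE4 followed by "n" is not a valid UTF-8 sequence,
# so decode substitutes the replacement character.
my $wrong = decode('UTF-8', $bytes);

# Right guess: every byte maps directly to the code point of the same value.
my $right = decode('ISO-8859-1', $bytes);

printf "decoded as UTF-8:   U+%04X\n", ord(substr($wrong, 2, 1));   # U+FFFD
printf "decoded as Latin-1: U+%04X\n", ord(substr($right, 2, 1));   # U+00E4
```

So for this input the fix is to decode with the encoding the data actually uses, then encode to whatever the output needs.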
