Yllar has asked for the wisdom of the Perl Monks concerning the following question:

use utf8;
use Encode qw(encode_utf8);
use JSON;
use Data::Dumper;

my $data = qq( { "cat" : "text – abcd" } );
my $json_data = encode_utf8( $data );
my $perl_hash = decode_json( $json_data );
print Dumper($perl_hash);

I am getting the following error when executing the code:

$VAR1 = { 'cat' => "text \x{2013} abcd" };

I need the output to look like "text – abcd". Is there any module other than Text::Unidecode, or a method for converting characters like ', ", -, ., ? to simple ASCII characters?

Any help from you guys would be appreciated greatly.

Replies are listed 'Best First'.
Re: Malformed UTF-8 character (unexpected continuation byte 0x96, with no preceding start byte)
by choroba (Cardinal) on Aug 06, 2015 at 16:12 UTC
    $data2 is already decoded as utf-8, no need to decode it. Use
    $decode2 = from_json($data2);
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
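To illustrate the distinction choroba is drawing, here is a minimal sketch. It assumes the core JSON::PP module instead of JSON so that it runs on a stock Perl; per the JSON documentation, from_json($text) corresponds to ->new->decode($text) and decode_json($bytes) to ->new->utf8->decode($bytes). The variable names are illustrative and do not come from the original post.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use JSON::PP;    # core module; assumed here as a stand-in for JSON

# A character string: \x{2013} is the en dash as one decoded character.
my $chars = qq({ "cat" : "text \x{2013} abcd" });

# Already-decoded characters: parse without the utf8 option
# (this is what from_json does).
my $from_chars = JSON::PP->new->decode($chars);

# UTF-8 encoded bytes: parse with the utf8 option
# (this is what decode_json does).
my $bytes      = encode_utf8($chars);
my $from_bytes = JSON::PP->new->utf8->decode($bytes);

# Both routes should yield the same Perl string.
print "same result\n" if $from_chars->{cat} eq $from_bytes->{cat};
print "length: ", length($from_chars->{cat}), "\n";
```

Mixing the two up is the usual source of trouble: handing a character string that contains code points above 0xFF to the utf8-expecting parser makes it complain, because such a string cannot be a sequence of bytes.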
Re: Malformed UTF-8 character (unexpected continuation byte 0x96, with no preceding start byte)
by Laurent_R (Canon) on Aug 06, 2015 at 20:35 UTC
    The utf8 pragma lets you use UTF-8 in your source code (e.g. variable names, subroutine names, strings, etc.); it is not aimed at converting incoming data. If you have UTF-8 identifiers, you don't need to convert them; they are already (presumably) in the right format.

    Or did I miss your point?

      Thanks for your input. The issue here is that the input is in UTF-8 format. In my code we have two statements, $data and $data2. If you look at them, there is a difference between the two.

      In the first statement, '-' is printed as-is after decoding, whereas in the second statement '–' is not printed as-is after decoding: '–' is replaced with another character such as ^. I need the second statement to print as-is.

      Thank you.

        It seems to work for me:
        $ perl -E 'use utf8; my $data = "text - abcd"; say "$data";'
        text - abcd
        $
        $ perl -E 'use utf8; my $data = "text – abcd"; print "$data";'
        Wide character in print at -e line 1.
        text – abcd
        $
        $ perl -E 'use utf8; my $data = "text – abcd"; binmode STDOUT, ":utf8"; say "$data";'
        text – abcd
        It might be hard to see the difference on the screen, but I zoomed in on the output and I can confirm that I really have printed two different kinds of dash.
        the code you posted does no printing -- did you remember to binmode?
Re: Malformed UTF-8 character (unexpected continuation byte 0x96, with no preceding start byte)
by ikegami (Patriarch) on Aug 07, 2015 at 18:43 UTC
    The others are right to point out that you should be using from_json, but that won't solve the error you're asking about.

    The error you are asking about comes from Perl, not JSON. By using use utf8;, you claimed your source code is encoded using UTF-8, but it's not. Remove the use utf8; or convert the source code to UTF-8.
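One plausible way to arrive at this exact message is a script saved as Windows-1252, where the en dash is the single byte 0x96; that is an assumption, since the thread never confirms the file's real encoding. In UTF-8, 0x96 is a continuation byte and is invalid on its own. A minimal sketch with the core Encode module (the byte string below is hypothetical sample data):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical bytes: "text – abcd" as written by an editor using Windows-1252.
my $bytes = "text \x96 abcd";

# Strict UTF-8 decoding rejects the lone 0x96 byte.
my $copy    = $bytes;    # decode with a CHECK value may modify its input
my $as_utf8 = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
my $utf8_failed = defined $as_utf8 ? 0 : 1;

# Decoding with the encoding the bytes were assumed to be written in succeeds.
my $as_cp1252 = decode('cp1252', $bytes);

printf "UTF-8 decode failed: %s\n", $utf8_failed ? 'yes' : 'no';
printf "cp1252 byte 0x96 decodes to U+%04X\n", ord substr($as_cp1252, 5, 1);
```

If the od -c output of the script shows a single 226 (octal for 0x96) where the dash should be, the file is not UTF-8 and the use utf8; declaration does not match it.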

      Thanks for your input. I tried all the suggested solutions. Unfortunately, I have not found a permanent fix for this issue. The issue is not limited to the hyphen (-); my code throws an error for every non-ASCII character when decoding input data that contains such characters.

        Show the output of cat script.pl ; od -c script.pl, and explain what you mean by "doesn't work".