http://qs1969.pair.com?node_id=1221554

slugger415 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a simple JSON query (to Facebook events) that contains smart quotes and other special characters that render badly in the browser. I've tried utf8::decode on the relevant field but that seems to wipe it out completely (it returns nothing).

code snippet:

use JSON qw( decode_json );
use LWP::Simple;
use utf8;
use strict;

my($url) = "https://graph.facebook.com/$id?access_token=" . $token;
my($json) = get($url);
my($decoded) = decode_json($json);
$event{'desc'} = utf8::decode($decoded->{'description'});

Without the utf8::decode I get all the description but with bad character rendering, e.g. "Action Films" (with smart quotes) looks like:

“Action filmsâ€

But again, the utf8::decode seems to wipe it out.

thanks, Scott

Replies are listed 'Best First'.
Re: special characters in parsed json rendering badly in browser
by tinita (Parson) on Sep 02, 2018 at 09:21 UTC

      Result of Dump:

      SV = PV(0xcdc0b8) at 0x2522f38
        REFCNT = 1
        FLAGS = (POK,pPOK,UTF8)
        PV = 0x3f44048 "Saturday, September 22, 2018\n7:30pm Doors / 8pm Performances\n $15 Guests / $10 for members: .....etc.

      and at the end:

        CUR = 3552
        LEN = 3554

      thanks

Re: special characters in parsed json rendering badly in browser
by Corion (Patriarch) on Sep 03, 2018 at 08:02 UTC

    According to the JSON documentation, JSON::decode_json expects raw octets. It also returns content that is already UTF-8 decoded, so your additional decode step should not be necessary.
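A minimal sketch of that point, not the poster's actual script. It assumes the core JSON::PP module (its decode_json has the same interface as the one in JSON) and a made-up response body standing in for what get() returns. It also shows that utf8::decode edits its argument in place and returns only a success flag, which would explain why assigning its result appears to wipe out the text.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # core module, used here in place of JSON

# Hypothetical response body: raw UTF-8 octets, as get() would return them.
my $json = qq({"description":"\xE2\x80\x9CAction films\xE2\x80\x9D"});

my $decoded = decode_json($json);
my $desc    = $decoded->{description};   # already a character string

# 12 ASCII characters plus two curly quotes: 14 characters, not 18 bytes.
my $chars = length $desc;

# utf8::decode works in place and returns only a success flag.
# On an already-decoded string it fails and leaves the string untouched.
my $copy = $desc;
my $flag = utf8::decode($copy);

binmode STDOUT, ':encoding(UTF-8)';
print "characters: $chars\n";
print "utf8::decode returned: '", ($flag ? 1 : ''), "'\n";
```

So the description can be used straight from decode_json, with no second decode step.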

    Have you looked at the octets you download and have you verified that your problem is not in the further treatment or output of the data?

    Personally, I find it helpful to look at hexdumps of the octets to verify that the proper data is written to the console.
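One way to take such a hexdump from inside Perl, as a rough sketch. The hexdump helper is a hypothetical name, not something from the original script.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hexdump {
    my ($octets) = @_;
    return join ' ', unpack '(H2)*', $octets;
}

# Raw octets of a right double quotation mark, as they arrive over the wire.
my $wire = hexdump("\xE2\x80\x9D");

# A decoded character string has to be encoded before its bytes are dumped.
my $encoded = hexdump(encode('UTF-8', "\x{201D}"));

print "$wire\n";      # e2 80 9d
print "$encoded\n";   # e2 80 9d
```

If the dump of the downloaded data shows e2 80 9d where a closing quote belongs, the download is intact and the problem lies in later processing or output.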

    I see that nowhere in the example code do you call binmode STDOUT, ':encoding(UTF-8)'; maybe that would be a good step?
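To illustrate what that layer does, here is a small sketch that writes through the same :encoding(UTF-8) layer to an in-memory handle, so the resulting bytes can be inspected. In the real script the layer would go on STDOUT. The sample text is borrowed from later in this thread.

```perl
use strict;
use warnings;

# A character string, as decode_json would return it.
my $text = "Kren\x{2019}s 89th birthday";

# In the real script: binmode STDOUT, ':encoding(UTF-8)';
# Here the same layer is applied to an in-memory handle instead.
open my $out, '>:encoding(UTF-8)', \my $buffer or die "open failed: $!";
print {$out} $text;
close $out;

# The single character U+2019 has been written as three bytes.
my $byte_count = length $buffer;
```

Without the layer, Perl would warn about a wide character in print and the output encoding would be left to chance.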

Re: special characters in parsed json rendering badly in browser
by Anonymous Monk on Sep 02, 2018 at 03:18 UTC
    Try decoding it with Encode:
    use Encode;
    $event{'desc'} = Encode::decode(utf8 => $decoded->{'description'});
    You mention a browser so be sure to specify the charset:
    Content-type: text/html; charset=utf-8
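A sketch of sending that header from a CGI-style script, assuming the script prints the page itself. The html_response helper is a hypothetical name, and the body is encoded to UTF-8 by hand so that it matches the declared charset.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a CGI-style response as raw octets.
sub html_response {
    my ($chars) = @_;
    my $header = "Content-type: text/html; charset=utf-8\r\n\r\n";
    return $header . encode('UTF-8', "<p>$chars</p>\n");
}

my $response = html_response("\x{201C}Action films\x{201D}");

binmode STDOUT;    # the response is already bytes, so no encoding layer
print $response;
```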

      result of Encode:

      Wide character at C:/Strawberry/perl/lib/Encode.pm line 228.

      And the script stops there.

      The charset Firefox reports (on Windows) is iso-8859-1.

      I'm not sure where that gets specified.

        Is the browser showing the contents of a file or the response from a web server ?

        poj
Re: special characters in parsed json rendering badly in browser
by slugger415 (Monk) on Sep 02, 2018 at 19:52 UTC

    One thing I should have mentioned is that the JSON returns a kind of encoding I'm not familiar with; for example, these appear to be smart quotes:

    what \x{2018}film about film\x{2019} or \x{2018}film as film\x{2019} might mean.

    I'm guessing the JSON module is decoding those somehow in a way I don't want.
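For what it's worth, \x{2018} and \x{2019} are how Perl writes the code points U+2018 and U+2019, the left and right single quotation marks. The sketch below assumes the response carries them as \u escapes (a guess, since the raw JSON is not shown) and uses the core JSON::PP module to show that each escape decodes to exactly one character.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Single quotes keep the \u escapes literal, as they would appear in JSON text.
my $json = '{"description":"\u2018film as film\u2019"}';
my $desc = decode_json($json)->{description};

# List the code points of the non-ASCII characters in the decoded string.
my @code_points = map  { sprintf 'U+%04X', ord($_) }
                  grep { ord($_) > 127 }
                  split //, $desc;

print "@code_points\n";   # U+2018 U+2019
```

The decoding itself is correct: the string holds the smart quotes as characters, and what remains is how they are encoded on output.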

Re: special characters in parsed json rendering badly in browser
by slugger415 (Monk) on Sep 09, 2018 at 13:55 UTC

    Thank you all for your suggestions, but I don't feel I'm any closer to solving this problem, partly through my own fault for not being clear about what I want (which I'm figuring out as I puzzle it out). I'd like the text/data to be portable to other applications that can read HTML.

    I did figure out (thanks poj) that I can set the encoding in the HTML page to have it display correctly in the browser. But if that text gets posted to another page or app I don't necessarily have control over its page encoding.

    So what I really want is to HTML-encode those characters. But when I try HTML::Entities on, say, the smart apostrophe:

    use HTML::Entities;
    my($text) = "Kren’s 89th birthday";
    print encode_entities($text), $/;

    that one character gets converted to three HTML entities:

    Kren&acirc;&#128;&#153;s 89th birthday

    which displays a lot of garbage in the browser.

    So I'm lost as to what's going on or how to resolve it. Perl is rendering the JSON string as a smart quote but HTML::Entities is improperly encoding it.

    So far my best solution seems to be:

    $event{'desc'} =~ s/’/\&\#39\;/g;
    $event{'desc'} =~ s/–/-/g;
    $event{'desc'} =~ s/—/ - /g;
    $event{'desc'} =~ s/‘/'/g;
    $event{'desc'} =~ s/'/'/g;
    $event{'desc'} =~ s/“/"/g;
    $event{'desc'} =~ s/”/"/g;

    but of course that only handles characters I'm aware of.
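A more general form of the same idea, sketched with core Perl only: replace every non-ASCII character with a numeric entity instead of listing characters one by one. The numeric_entities helper is a hypothetical name. It assumes the input is already a decoded character string, and it does not escape &, < or >.

```perl
use strict;
use warnings;

# Hypothetical helper: turn each non-ASCII character into a numeric entity.
sub numeric_entities {
    my ($chars) = @_;
    (my $html = $chars) =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $html;
}

my $html = numeric_entities(
    "Kren\x{2019}s 89th birthday \x{2013} \x{201C}Action films\x{201D}"
);

print "$html\n";
# Kren&#8217;s 89th birthday &#8211; &#8220;Action films&#8221;
```

The result is plain ASCII, so it survives being pasted into a page with any declared charset.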

    Thoughts? Thanks for your patience.

    Scott

      See utf8: the use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope.

      use utf8;
      use HTML::Entities;
      my($text) = "Kren’s 89th birthday";
      print encode_entities($text), $/;

      # result right single quote
      # Kren&rsquo;s 89th birthday
      poj
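A short sketch of why the pragma makes the difference, using explicit escapes so it does not depend on how this file is saved. Without use utf8 the parser reads the apostrophe as its three UTF-8 bytes, each treated as a separate character, which would account for the three entities seen earlier. With the pragma it reads one character, U+2019.

```perl
use strict;
use warnings;

# What the parser sees without "use utf8": three separate bytes.
my $as_bytes = "Kren\xE2\x80\x99s";

# What it sees with "use utf8": one character, U+2019.
my $as_chars = "Kren\x{2019}s";

# The byte values 226, 128, 153 match &acirc;, &#128; and &#153;.
my @byte_view = map { ord($_) } split //, $as_bytes;

# Decoding the bytes in place yields the same string as the character form.
my $fixed = $as_bytes;
utf8::decode($fixed);

printf "%d bytes vs %d characters\n", length $as_bytes, length $as_chars;
```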

        That's it! That's all I needed, thank you!