vlad.goshko has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I have a question about JSON decoding. I need to deserialize invalid JSON, but I don't know how. Here is what I have, a sample from the page:

...
is_touch: ('ontouchstart' in window),
repository_urls: ["publisher_view2_videossp/PublisherView_videossp2"],
origin_repository_url: "publisher_view2_videossp/PublisherView_videossp2",
workbook_repo_url: "publisher_view2_videossp",
publish_date: new Date(1431654084199),
tabs_allowed: false,
showTabsWorkbook: true,
mobile_app_cookie: "",
current_project_id: "7",
current_sheet_name: "",
current_sheet_type: "",
current_workbook_name: "publisher_view2_videossp",
current_workbook_id: "1548",
current_view_id: "2713",
sheetId: "PublisherView_videossp2",
showParams: "{"revertType":null,"refresh":false,"checkpoint":false,"sheetName":"","unknownParams":"zone_id=6290&ac=a9aef098062f9646bd1638d0c6054c7b&account_id=62&Start_Date=2016-07-27&End_Date=2016-08-03","layoutId":""}",
stickySessionKey: "{"isAuthoring":false,"lastUpdatedAt":1470316180805,"workbookId":1548}",
...

I'm using JSON::PP module to decode with maximum loose decoding, like this:

$res = JSON::PP->new->relaxed->allow_singlequote->allow_barekey->loose->decode($data);

But I have troubles with fields like this:

is_touch: ('ontouchstart' in window),

And this:

publish_date: new Date(1431654084199),

And this:

showParams: "{"revertType":null,"refresh":false,"checkpoint":false,"sheetName":"","unknownParams":"zone_id=6290&ac=a9aef098062f9646bd1638d0c6054c7b&account_id=62&Start_Date=2016-07-27&End_Date=2016-08-03","layoutId":""}",

It throws the error "malformed JSON string", which is understandable.

So the question is: what can I do to filter lines like these? The JSON content that I receive is dynamic, so hardcoding these lines won't help.

Re: JSON decode problem
by Corion (Patriarch) on Aug 04, 2016 at 14:51 UTC

    This is not "invalid JSON", but simply Javascript code. Either you write a proper parser which parses/preprocesses/handles those expressions for you, or you run that code through a Javascript interpreter and then dump the result as JSON.
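    To illustrate the distinction, here is a minimal sketch (not from the original post) of what the relaxed JSON::PP options from the question do and do not tolerate. The two input strings are cut-down assumptions based on the sample: bare keys, single quotes and a trailing comma are accepted, while a JavaScript expression such as new Date(...) still makes decode() die.

```perl
use strict;
use warnings;
use JSON::PP;

# Same option chain as in the question.
my $json = JSON::PP->new->relaxed->allow_singlequote->allow_barekey->loose;

# Bare keys, single-quoted strings and a trailing comma are accepted.
my $ok = $json->decode(q({ tabs_allowed: false, current_project_id: '7', }));
print "current_project_id = $ok->{current_project_id}\n";

# A JavaScript expression is not a JSON value, so decode() dies here.
my $bad = eval { $json->decode(q({ publish_date: new Date(1431654084199) })) };
print "decode failed: $@" unless defined $bad;
```

    So the loose options only relax JSON syntax; they never evaluate code, which is why the expressions have to be rewritten or evaluated elsewhere first.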

Re: JSON decode problem
by kennethk (Abbot) on Aug 04, 2016 at 15:59 UTC
    Your source is not emitting valid JSON. Why can't you fix the source? For your showParams example, the problem is that the double quotes inside the string are unescaped -- this isn't even valid JavaScript. There is literally no general solution for your problem.

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: JSON decode problem
by FreeBeerReekingMonk (Deacon) on Aug 05, 2016 at 13:20 UTC
    As fellow monks have said, the problem here is that it is not portable JSON; it needs to be evalled in a JavaScript environment, like your browser. So if you are in Perl, what can you do? Preparse it! Replace expressions like ('ontouchstart' in window) with 0 (false). And whenever you see ": new Date(", just divide the number by 1000 to get a valid epoch (the extra 3 digits are milliseconds, do you need them?)

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str = ' publish_date: new Date(1431654084199) ';
    # replace ": new Date(<milliseconds>)" with a quoted time string
    $str =~ s/:\s*new Date\((\d+)\)\s*/epoch2timestring($1)/ge;
    print $str . "\n";

    sub epoch2timestring {
        my ($e) = @_;
        $e =~ s/.{3}$//;    # eat 3 last digits (the milliseconds)
        my $result = scalar localtime($e);    # gmtime() ?
        return ': "' . $result . '" ';
    }

    prints:

     publish_date: "Fri May 15 03:41:24 2015"
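    The localtime() output above depends on the time zone of the machine that runs the script. As a hedged alternative, the sketch below formats the same value as an ISO 8601 string in UTC using gmtime() and POSIX::strftime. The helper name ms_to_iso8601 is made up for this example and is not part of the original code.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: JavaScript milliseconds -> ISO 8601 string in UTC.
sub ms_to_iso8601 {
    my ($ms) = @_;
    my $seconds = int($ms / 1000);    # drop the milliseconds
    return strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($seconds));
}

my $str = ' publish_date: new Date(1431654084199) ';
# Replace only the "new Date(...)" expression with a quoted string.
$str =~ s/new Date\((\d+)\)/'"' . ms_to_iso8601($1) . '"'/ge;
print "$str\n";
```

    For the sample value this prints publish_date: "2015-05-15T01:41:24Z", the same instant as the localtime() result above, which was produced in a UTC+2 zone.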

      Please read my node. showParams cannot be evalled because of syntax errors. If a data stream is emitted with syntax errors of this magnitude, the content had better be valuable, because the fixes you make are going to be incredibly fragile.

      For example, in order to clean showParams, you need to detect unescaped quotes and escape them. For the sample you could run

      my $str = 'showParams: "{"revertType":null,"refresh":false,"checkpoint":false,"sheetName":"","unknownParams":"zone_id=6290&ac=a9aef098062f9646bd1638d0c6054c7b&account_id=62&Start_Date=2016-07-27&End_Date=2016-08-03","layoutId":""}"';
      while ($str =~ /(?<!\\)"\s*\{\s*("(?:(?!\}").)*)\}"/) {
          substr($str, $-[1], $+[1] - $-[1]) =~ s/(?=["\\])/\\/g;
      }
      but good luck maintaining that.
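      As a sketch of what that escaping buys you, assuming the field really is a JSON document stored inside a string: once the inner quotes are escaped, the line decodes as an ordinary string, and that string can be decoded a second time. The example uses the shorter stickySessionKey field from the sample; wrapping the line in braces and using allow_barekey are choices made for this illustration.

```perl
use strict;
use warnings;
use JSON::PP;

my $str = 'stickySessionKey: "{"isAuthoring":false,"lastUpdatedAt":1470316180805,"workbookId":1548}"';

# Same loop as above: escape the quotes inside the "{...}" string.
while ($str =~ /(?<!\\)"\s*\{\s*("(?:(?!\}").)*)\}"/) {
    substr($str, $-[1], $+[1] - $-[1]) =~ s/(?=["\\])/\\/g;
}

# First pass: the outer object, whose value is now a plain string.
my $outer = JSON::PP->new->allow_barekey->decode("{ $str }");

# Second pass: that string is itself JSON.
my $inner = JSON::PP->new->decode($outer->{stickySessionKey});
print "workbookId = $inner->{workbookId}\n";
```

      This only works while the embedded value matches the pattern, so the fragility concern above still applies.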


        Well, that is the thing everyone is pointing out. You can either write a pre-parser that fixes it into standard JSON and then read it with the JSON modules, or parse it yourself (which could mean you will have to implement much more).

        From JSON#Data_portability_issues

        Despite the widespread belief that JSON is a strict subset of JavaScript, this is not the case. Specifically, JSON allows the Unicode line terminators U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR to appear unescaped in quoted strings, while JavaScript does not.

        In "Unsupported native data types" Wikipedia specifically states for Date() that there are some de facto standards, e.g., converting from Date to String, but none universally recognized. This JSON is "weird" like that.

        So each line has the form identifier: thing, where thing can be:

        (something that must be evaluated in Javascript itself)

        "{"...."}" a hash

        "a string"

        false a boolean

        new Date(... a date

        Strings and booleans are JSON already; the others, not so much...
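        The categories above could be sketched as a small classifier. This is only an illustration under the assumption that every field sits on its own line; the function name classify_value, the category labels and the regexes are invented for this example, and other value types (arrays, for instance) fall through to 'unknown'.

```perl
use strict;
use warnings;

# Hypothetical helper: name the category a raw value falls into.
sub classify_value {
    my ($v) = @_;
    return 'js_expression' if $v =~ /^\(.*\)$/;          # must be evaluated in JavaScript
    return 'embedded_json' if $v =~ /^"\{.*\}"$/;        # "{"...."}" a hash
    return 'string'        if $v =~ /^"[^"]*"$/;         # "a string"
    return 'boolean'       if $v =~ /^(?:true|false)$/;  # false, true
    return 'date'          if $v =~ /^new Date\(\d+\)$/; # new Date(...)
    return 'unknown';
}

my @lines = (
    q{is_touch: ('ontouchstart' in window),},
    q{stickySessionKey: "{"isAuthoring":false,"workbookId":1548}",},
    q{workbook_repo_url: "publisher_view2_videossp",},
    q{tabs_allowed: false,},
    q{publish_date: new Date(1431654084199),},
);

for my $line (@lines) {
    # Split "key: value," into its two parts; skip lines that do not fit.
    my ($key, $value) = $line =~ /^\s*(\w+):\s*(.*?),?\s*$/ or next;
    printf "%-20s %s\n", $key, classify_value($value);
}
```

        Only the 'string' and 'boolean' categories can be passed to a JSON decoder unchanged; each of the others needs its own rewrite rule.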

Re: JSON decode problem
by FreeBeerReekingMonk (Deacon) on Aug 11, 2016 at 21:38 UTC
    Vlad, this parses your string into a Perl hash, after doing some tweaking to your JSON string.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use JSON::PP;
    use Data::Dumper;

    my $json_text = '';
    # read string from __DATA__
    while (<DATA>) { $json_text .= $_ }

    # work around javascript evals: replace "( ... )" expressions with 0
    $json_text =~ s/^(\w+): \(.*$/"$1": 0,/gm;
    # fix date
    $json_text =~ s/:\s*new Date\((\d+)\)\s*/epoch2timestring($1)/ge;
    # fix hashes: unwrap "{...}" strings into nested objects
    $json_text =~ s/^(\w+): "\{(.*)\}",/"$1": {$2},/gm;
    # fix remaining fields (keys should be quoted)
    $json_text =~ s/^(\w+):/"$1":/gm;

    my $perl_scalar = JSON::PP->new->relaxed->allow_singlequote->allow_barekey->loose->decode($json_text);
    print Dumper($perl_scalar);

    sub epoch2timestring {
        my ($e) = @_;
        $e =~ s/.{3}$//;    # eat 3 last digits (the milliseconds)
        my $result = scalar localtime($e);    # gmtime() ?
        return ': "' . $result . '" ';
    }

    __DATA__
    {
    is_touch: ('ontouchstart' in window),
    repository_urls: ["publisher_view2_videossp/PublisherView_videossp2"],
    origin_repository_url: "publisher_view2_videossp/PublisherView_videossp2",
    workbook_repo_url: "publisher_view2_videossp",
    publish_date: new Date(1431654084199),
    tabs_allowed: false,
    showTabsWorkbook: true,
    mobile_app_cookie: "",
    current_project_id: "7",
    current_sheet_name: "",
    current_sheet_type: "",
    current_workbook_name: "publisher_view2_videossp",
    current_workbook_id: "1548",
    current_view_id: "2713",
    sheetId: "PublisherView_videossp2",
    showParams: "{"revertType":null,"refresh":false,"checkpoint":false,"sheetName":"","unknownParams":"zone_id=6290&ac=a9aef098062f9646bd1638d0c6054c7b&account_id=62&Start_Date=2016-07-27&End_Date=2016-08-03","layoutId":""}",
    stickySessionKey: "{"isAuthoring":false,"lastUpdatedAt":1470316180805,"workbookId":1548}",
    }

    edit: added missing "my" that got lost in copy/paste, now it runs...