Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

I admit I am quite bad at parsing data. I want to read some information out of JSON data, but somehow I am not getting the information I want. The data is the following.

{
  "head": {},
  "def": [
    {
      "text": "time",
      "pos": "noun",
      "ts": "taɪm",
      "tr": [
        {
          "text": "tempo",
          "pos": "noun",
          "gen": "m",
          "syn": [
            { "text": "volta", "pos": "noun", "gen": "f" },
            { "text": "momento", "pos": "noun", "gen": "m" },
            { "text": "Time", "pos": "noun", "gen": "m" }
          ],
          "mean": [
            { "text": "day" },
            { "text": "moment" }
          ]
        }
      ]
    }
  ]
}

First of all, I would like to read "text": "tempo" (i.e. tempo). Furthermore, I would also like to read all the other values of "text" inside "syn"; the number of "text" elements there is variable.

I am using the following:

my $req = HTTP::Request->new(GET => $uri);
my $res = $ua->request($req);
if ($res->is_success) {
    my $json_text = $res->content;
    my $decoded_json = decode_json( $json_text );
    my $result = $decoded_json->{'def'}{'tr'}{'text'}; # Here I was supposing to get the first "text" value = tempo
}

Unfortunately it fails, as I am evidently interpreting badly the data structure (and a dumper did not help me a lot). Any suggestions?

Re: json decoding
by roboticus (Chancellor) on Oct 12, 2017 at 22:44 UTC

    Your data structure shows that the def member is an array, as is tr. So you want something more like:

    my $result = $decoded_json->{def}[0]{tr}[0]{text};

    Just remember that the square brackets show that you've got an array, and the curly braces are for a hash.

    To access the syn elements, you should be able to do:

    my $syn_array = $decoded_json->{def}[0]{tr}[0]{syn};
    for my $rSyn (@$syn_array) {
        print "$rSyn->{text}, $rSyn->{pos}, $rSyn->{gen}\n";
    }

    which should give you something like:

    volta, noun, f
    momento, noun, m
    Time, noun, m
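Putting the two lookups above together, here is a minimal self-contained sketch. It assumes the core JSON::PP module (which exports decode_json just like the JSON module used in the thread) and a trimmed copy of the posted data in place of the HTTP response; $first and @syn_texts are names chosen only for illustration.

```perl
use strict;
use warnings;
use JSON::PP;    # core module; decode_json behaves like the one from JSON

# Trimmed copy of the JSON from the question ("ts" and "mean" are omitted).
my $json_text = <<'END_JSON';
{ "head": {}, "def": [ { "text": "time", "pos": "noun",
    "tr": [ { "text": "tempo", "pos": "noun", "gen": "m",
        "syn": [ { "text": "volta",   "pos": "noun", "gen": "f" },
                 { "text": "momento", "pos": "noun", "gen": "m" },
                 { "text": "Time",    "pos": "noun", "gen": "m" } ] } ] } ] }
END_JSON

my $decoded_json = decode_json($json_text);

# Hash key, array index, hash key, array index: the first translation entry.
my $first  = $decoded_json->{def}[0]{tr}[0];
my $result = $first->{text};

# Collect every "text" under "syn", however many there are.
# The "// []" fallback covers an entry that has no "syn" key at all.
my @syn_texts = map { $_->{text} } @{ $first->{syn} // [] };

print "translation: $result\n";
print "synonyms: ", join(', ', @syn_texts), "\n";
```

Because the synonyms are gathered with map over the whole array, the code does not need to know in advance how many "text" elements "syn" contains.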

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: json decoding
by kcott (Archbishop) on Oct 13, 2017 at 06:41 UTC
    "I admit I am quite bad at parsing data."

    Probably the first thing to do would be to understand complex data structures. Take a look at "perldsc - Perl Data Structures Cookbook". This explains arrays of arrays and hashes; hashes of arrays and hashes; and finally builds up to complex data structures such as you're dealing with here.

    "Unfortunately it fails, as I am evidently interpreting badly the data structure (and a dumper did not help me a lot). Any suggestions?"

    Telling us "it fails", without any additional information, is pretty much useless. Did you use the strict and warnings pragmata? If not, you should do so: let Perl tell you about possible problems — it's a lot quicker than posting a question. Did you test "$result"? If so, you probably would have got a message about an uninitialised value (assuming you'd used warnings). However you tested "$result", and whatever messages you received, should be reported.
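One way to make the failure visible is to run the lookup inside eval and print whatever ends up in $@. The sketch below is only an illustration: it assumes the core JSON::PP module and a cut-down copy of the data. As far as I can tell, with this structure the expression from the question dies at run time with "Not a HASH reference" rather than quietly returning undef.

```perl
use strict;
use warnings;
use JSON::PP;    # core module, assumed here so the sketch runs without CPAN

# Cut-down copy of the posted structure: "def" and "tr" both hold arrays.
my $decoded_json = decode_json('{ "def": [ { "tr": [ { "text": "tempo" } ] } ] }');

# The expression from the question. {def} holds an array reference,
# so applying the {tr} hash subscript to it is a run-time error.
my $result = eval { $decoded_json->{def}{tr}{text} };
my $error  = $@;

print defined $result ? "got: $result\n" : "failed: $error";
```

Including that message in the question, together with the line it points at, is the kind of detail that lets others help quickly.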

    Telling us "a dumper did not help me a lot" is, again, not much use if we are to help you. Which "dumper" did you use? In what way was its output unhelpful?

    I copied the JSON code you posted to 'pm_1201279_input.json'. My usual preference is the CPAN module Data::Dump, mainly for the compact output.

    $ perl -e 'use JSON; use Data::Dump; my $x = do { local $/; <> }; dd(decode_json $x)' pm_1201279_input.json
    {
      def  => [
                {
                  pos  => "noun",
                  text => "time",
                  tr   => [
                            {
                              gen  => "m",
                              mean => [{ text => "day" }, { text => "moment" }],
                              pos  => "noun",
                              syn  => [
                                        { gen => "f", pos => "noun", text => "volta" },
                                        { gen => "m", pos => "noun", text => "momento" },
                                        { gen => "m", pos => "noun", text => "Time" },
                                      ],
                              text => "tempo",
                            },
                          ],
                  ts   => "ta&#618;m",
                },
              ],
      head => {},
    }

    The "def => [" makes it pretty clear that the value of the key "def" is an arrayref. This should tell you that "$decoded_json->{'def'}{'tr'}" is clearly a problem: you need an array index where you have the hash key "{'tr'}".

    Deeper into the structure, the Data::Dump output may be a bit too compact for you (at least until you're somewhat more comfortable with this degree of complexity). The core module Data::Dumper might be a better choice here.

    $ perl -e 'use JSON; use Data::Dumper; my $x = do { local $/; <> }; print Dumper decode_json($x)' pm_1201279_input.json
    $VAR1 = {
              'def' => [
                         {
                           'text' => 'time',
                           'tr' => [
                                     {
                                       'syn' => [
                                                  {
                                                    'pos' => 'noun',
                                                    'text' => 'volta',
                                                    'gen' => 'f'
                                                  },
                                                  {
                                                    'pos' => 'noun',
                                                    'text' => 'momento',
                                                    'gen' => 'm'
                                                  },
                                                  {
                                                    'pos' => 'noun',
                                                    'text' => 'Time',
                                                    'gen' => 'm'
                                                  }
                                                ],
                                       'text' => 'tempo',
                                       'gen' => 'm',
                                       'pos' => 'noun',
                                       'mean' => [
                                                   {
                                                     'text' => 'day'
                                                   },
                                                   {
                                                     'text' => 'moment'
                                                   }
                                                 ]
                                     }
                                   ],
                           'pos' => 'noun',
                           'ts' => 'ta&#618;m'
                         }
                       ],
              'head' => {}
            };

    Take a look at the documentation for both of those modules. There are various ways to use them such that the output is more to your liking.
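As one example of adjusting the output, Data::Dumper has a few documented package variables that change its layout. The sketch below is just one possible combination, shown on a cut-down copy of the data and using the core JSON::PP module as an assumption; pick whichever settings you find easiest to read.

```perl
use strict;
use warnings;
use JSON::PP;        # core module, assumed here in place of JSON
use Data::Dumper;

$Data::Dumper::Indent   = 1;    # fixed-width indentation instead of column alignment
$Data::Dumper::Sortkeys = 1;    # sorted hash keys, so two runs are easy to compare
$Data::Dumper::Terse    = 1;    # leave out the leading "$VAR1 = "

# Cut-down copy of the posted data.
my $data = decode_json('{ "def": [ { "text": "time", "tr": [ { "text": "tempo" } ] } ] }');

my $dump = Dumper($data);
print $dump;
```

With Indent set to 1 the nesting depth is shown by a constant step per level, which some people find easier to follow on deep structures than the default alignment.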

    Even without using a dumper, you can troubleshoot problems of this nature by walking the list of keys and indices. Start at the beginning and see what each refers to. In this case, look at the "def" key first:

    $ perl -E 'use JSON; my $j = do { local $/; <> }; my $p = decode_json $j; say $p->{def}' pm_1201279_input.json
    ARRAY(0x7f8e1c881968)

    As before, that's indicating that you need an array index where you have the hash key "{'tr'}". So, look at the first index:

    $ perl -E 'use JSON; my $j = do { local $/; <> }; my $p = decode_json $j; say $p->{def}[0]' pm_1201279_input.json
    HASH(0x7f89ee002e30)

    Now you know you'll need a hash key, then an array index, then another hash key. What keys are there?

    $ perl -E 'use JSON; my $j = do { local $/; <> }; my $p = decode_json $j; say for keys $p->{def}[0]->%*' pm_1201279_input.json
    ts
    pos
    tr
    text

    [See "perlref: Postfix Dereference Syntax" if you're unfamiliar with the '$p->{def}[0]->%*' syntax. If you're writing for older versions of Perl, you can use '%{ $p->{def}[0] }' instead.]

    Now you've found your "tr" key. So "$decoded_json->{'def'}{'tr'}" needs to be "$decoded_json->{def}[0]{tr}" (when keys are just alphabetic strings, they're automatically quoted: saves typing and code clutter).

    Keep working through the structure: eventually you'll get to the "$decoded_json->{def}[0]{tr}[0]{text}" that ++roboticus showed; and subsequently to the remainder of your requirements.
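The same step-by-step walk can be automated. The sketch below is only an illustration: leaf_paths is a hypothetical helper (not part of any module), JSON::PP stands in for JSON, and the data is a cut-down copy of what was posted. It uses ref to decide at each level whether a hash key or an array index is needed, and returns the full subscript path to every leaf value.

```perl
use strict;
use warnings;
use JSON::PP;    # core module, assumed here in place of JSON

# Hypothetical helper: returns one "path = value" string per leaf,
# with the path written in the subscript syntax needed to reach it.
sub leaf_paths {
    my ($node, $path) = @_;
    my $type = ref $node;
    if ($type eq 'HASH') {     # curly braces: descend by key
        return map { leaf_paths($node->{$_}, $path . "{$_}") } sort keys %$node;
    }
    if ($type eq 'ARRAY') {    # square brackets: descend by index
        return map { leaf_paths($node->[$_], $path . "[$_]") } 0 .. $#$node;
    }
    return "$path = " . ($node // 'undef');    # plain value: end of the path
}

# Cut-down copy of the posted data.
my $p = decode_json(
    '{ "def": [ { "text": "time", "tr": [ { "text": "tempo",
       "syn": [ { "text": "volta" }, { "text": "momento" } ] } ] } ] }'
);

my @lines = leaf_paths($p, '$p->');
print "$_\n" for @lines;
```

Each printed path can be pasted straight into code, which is handy while you are still getting used to reading nested structures by eye.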

    Complex data structures can appear daunting when first encountered; however, they're fairly easy to master. Work through other problems using the techniques I've shown and you'll soon get the hang of it. Ultimately, you'll be able to just look at JSON data, such as you've shown, and know intuitively what Perl code you'll need to access whatever parts you're interested in.

    — Ken

      Thank you so much!