leszekdubiel has asked for the wisdom of the Perl Monks concerning the following question:

I need to strip nulls, empty arrays, and empty hashes from JSON data. My current solution:

#!/usr/bin/perl -CSDA

# Removes empty arrays, empty objects, and nulls from json structure.

use utf8;
use Modern::Perl qw{2017};
use JSON;
use Data::Dumper;

sub tidyup_json {
    my ($r) = @_;
    if (ref $r eq "HASH") {
        for my $k (keys %$r) {
            tidyup_json($$r{$k});
            delete $$r{$k} if
                (ref $$r{$k} eq "HASH" && ! %{$$r{$k}}) or
                (ref $$r{$k} eq "ARRAY" && ! @{$$r{$k}}) or
                ! defined $$r{$k};
        }
    } elsif (ref $r eq "ARRAY") {
        tidyup_json($$r[$_]) for 0 .. @$r - 1;
        @$r = grep {
            not (
                (ref $_ eq "HASH" && ! %$_) or
                (ref $_ eq "ARRAY" && ! @$_) or
                ! defined $_
            )
        } @$r;
    }
    # use "$_[0]", because "$_[0]" is an alias to real reference that was passed to the function
    $_[0] = undef if
        (ref $r eq "HASH" && ! %$r) or
        (ref $r eq "ARRAY" && ! @$r);
}

my $json = JSON->new->allow_nonref->relaxed->decode(do { local $/; <STDIN>; });
tidyup_json($json);
print JSON->new->pretty->allow_nonref->canonical->encode($json);

What's the problem? See the "$_[0]" at the end of the function? It is an alias for the reference to the JSON data that the caller passed in. If that JSON data is an empty hash, then the data must become "undef". So I have to modify the caller's data from inside the function -- through the alias. Is there more elegant way to handle it?
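To make the aliasing behaviour concrete, here is a minimal sketch. The sub names clear_via_alias and clear_via_copy are made up for this illustration and are not part of the script above. It shows that assigning to $_[0] reaches the caller's variable, while assigning to a lexical copy does not.

```perl
use strict;
use warnings;

# Assigning to $_[0] changes the caller's variable,
# because the elements of @_ are aliases of the arguments.
sub clear_via_alias { $_[0] = undef }

# Assigning to a lexical copy changes only the copy.
sub clear_via_copy { my ($r) = @_; $r = undef }

my $data = {};

clear_via_copy($data);
print defined $data ? "still defined\n" : "undef\n";   # still defined

clear_via_alias($data);
print defined $data ? "still defined\n" : "undef\n";   # undef
```

The same mechanism applies to the recursive call tidyup_json($$r{$k}): the hash element itself is aliased, so the callee can set it to undef and the caller's delete then removes it.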

Here are results:
# echo '{"empty inner hash": {}, "empty inner array": [], "some undef": null }' | ./tidyup_json
null

# echo '{"empty inner hash": {}, "empty inner array": [1], "some undef": null }' | ./tidyup_json
{
   "empty inner array" : [
      1
   ]
}

# echo '{"empty inner hash": {}, "empty inner array": [1], "some undef": 7 }' | ./tidyup_json
{
   "empty inner array" : [
      1
   ],
   "some undef" : 7
}

# echo '{ }' | ./tidyup_json   ## !!! important to return null
null

PS. Beware that "true" and "false" jsonish data are references to blessed hashes.

Re: Tidy up json from nulls, empty arrays and hashes
by perlfan (Parson) on May 28, 2020 at 13:18 UTC
    >Is there more elegant way to handle it?

    The way you have it right now, as what looks like a recursive depth-first traversal of an arbitrarily complex data structure, I am going to say no.

    $_[0] = undef if (ref $r eq "HASH" && ! %$r) or (ref $r eq "ARRAY" && ! @$r);
    The above code is simply the base case of your recursion, which is what allows it to unwind the call stack.

    What is tripping me up is that I have no immediate idea what $$r{$k} is doing - is that some sort of generic reference? Double dollar signs are generally a red flag for me, so I can't really recognize when they might be doing something useful.

    Do you have a snippet of JSON to share? I am really curious about your method of traversing the hash. I feel like there is a module on CPAN that allows more idiomatic interactions with the data structure you call $json.

    Note: I always thank myself later when I don't mix up what is $json (a string) and what is the deserialized reference:

    my $reference_from_json = JSON->new->allow_nonref->relaxed->decode(do { local $/; <STDIN>; });
    tidyup_json($reference_from_json);
    print JSON->new->pretty->allow_nonref->canonical->encode($reference_from_json);

    >Beware that "true" and "false" jsonish data are references to blessed hashes.

    Which one, JSON::PP::Boolean? In JSON::PP::value_to_json, I see:

    elsif( blessed($value) and $value->isa('JSON::PP::Boolean') ){
        return $$value == 1 ? 'true' : 'false';
    }

    That indicates to me some implicit recognition of the TO_JSON method that the JSON family of modules supports for serializing complex data structures containing blessed references into stringified JSON form.
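As a quick way to check what the decoded booleans actually are, the following sketch inspects one with Scalar::Util. It assumes the pure-Perl JSON::PP backend with default options; other backends may use a different class.

```perl
use strict;
use warnings;
use JSON::PP;
use Scalar::Util qw(blessed reftype);

my $data = JSON::PP->new->decode('{"flag": true, "off": false}');
my $flag = $data->{flag};

print "class:   ", blessed($flag), "\n";   # the package the value is blessed into
print "reftype: ", reftype($flag), "\n";   # the underlying reference type

# ref() returns the class name, so the HASH/ARRAY checks in tidyup_json
# do not match and the boolean is left alone.
print ref($flag) eq 'HASH' ? "treated as hash\n" : "not treated as hash\n";

# A JSON false is still a defined value, so it is not pruned like null.
print defined $data->{off} ? "false is defined\n" : "false is undef\n";
```

With this backend the booleans are blessed scalar references, which is why the quoted snippet can compare $$value with 1.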

      Dittos on the $$r{k} looking wrong; reads more clearly to me written as $r->{k} (or if you insist on not using -> then make the dereferencing visually explicit with ${ $r }{ k }).

      Edit: Appeal to authority, but quoth PBP chapter 11 "Wherever possible, dereference with arrows."; I just skimmed the modern PBP videos and didn't find anything explicitly contradicting this so . . .

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

      > What is tripping me up is that I have no immediate idea what $$r{$k} is doing

      It's the same thing as $r->{$k}; personally I sometimes prefer this style because it's a little bit shorter and my $r = {a=>1}; print $$r{a}; is very similar to my %r = (a=>1); print $r{a};. However, what usually trips people up is that this only works as long as the expression being dereferenced really is that simple. For anything more complex than a plain $r, arrow dereferencing is the cleaner way to go.
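A small sketch of where the short form stops being obvious. The variables @refs and $refs are invented for the example and deliberately share a name. The leading sigil binds to the simple scalar first, so $$refs[0] never looks at the array @refs.

```perl
use strict;
use warnings;

my @refs = (\'scalar-ref target');     # array whose first element is a scalar ref
my $refs = ['array-ref element'];      # unrelated array ref with the same name

my $short  = $$refs[0];     # parsed as ${$refs}[0]: element 0 of the array $refs points to
my $arrow  = $refs->[0];    # arrow form of the same access
my $braced = ${$refs[0]};   # dereferences the scalar ref stored in $refs[0]

print "$short\n";    # array-ref element
print "$arrow\n";    # array-ref element
print "$braced\n";   # scalar-ref target
```

Once the thing being dereferenced is itself an element or a method result, the braces or the arrow are needed to say which part is the reference.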

Re: Tidy up json from nulls, empty arrays and hashes
by leszekdubiel (Scribe) on May 28, 2020 at 17:58 UTC

    Thank you for your hints.

    If $r is a hash ref, then $$r is a hash, then $$r{$k} is a value in that hash. This is of course a matter of style only.

    Yes -- I'm doing deep recursion. Example of use below:

    # echo '{"alfa": { "beta": null, "teata": null, "adfa": null, "n": {}, "ixi": {}, "marr": [1, 2] } }' | ./tidyup_json
    {
       "alfa" : {
          "marr" : [
             1,
             2
          ]
       }
    }

    # echo '{"alfa": { "beta": null, "teata": null, "adfa": null, "n": {}, "ixi": {}, "marr": [null, null] } }' | ./tidyup_json
    null

    # echo '{"alfa": { "beta": null, "teata": null, "adfa": null, "n": {}, "ixi": {}, "marr": [null, null, 3] } }' | ./tidyup_json
    {
       "alfa" : {
          "marr" : [
             3
          ]
       }
    }

    # echo '{"alfa": { "beta": { "gamma" : 3 } } }' | ./tidyup_json
    {
       "alfa" : {
          "beta" : {
             "gamma" : 3
          }
       }
    }

    # echo '{"alfa": { "beta": { "gamma" : {} } } }' | ./tidyup_json
    null

    # echo '{"alfa": { "beta": { "gamma" : {} }, "nth": 9 } }' | ./tidyup_json
    {
       "alfa" : {
          "nth" : 9
       }
    }

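For comparison, the cases above could also be handled without assigning through $_[0], by returning the pruned value and letting the caller decide what to do with it. This is only a sketch under assumptions: the name prune_json is hypothetical, it uses the core JSON::PP module rather than the JSON wrapper, and unlike the original it builds a new structure instead of editing in place.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: returns the pruned value, or undef if nothing is left.
sub prune_json {
    my ($v) = @_;

    if (ref $v eq 'HASH') {
        my %kept;
        for my $k (keys %$v) {
            my $pruned = prune_json($v->{$k});
            $kept{$k} = $pruned if defined $pruned;
        }
        return %kept ? \%kept : undef;
    }

    if (ref $v eq 'ARRAY') {
        my @kept = grep { defined } map { prune_json($_) } @$v;
        return @kept ? \@kept : undef;
    }

    # Plain scalars and blessed booleans pass through; undef stays undef.
    return $v;
}

my $codec = JSON::PP->new->allow_nonref->canonical;
my $input = '{"alfa": {"beta": null, "n": {}, "marr": [null, null, 3]}}';
print $codec->encode(prune_json($codec->decode($input))), "\n";
# prints {"alfa":{"marr":[3]}}
```

The trade-off is extra copying in exchange for a function with no side effects on its argument; whether that counts as more elegant is a matter of taste.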
      > If $r is a hash ref, then $$r is a hash, then $$r{$k} is a value in that hash.

      That's not true, and that's why the syntax is confusing.

      $ perl -Mstrict -we 'my $r = {}; say $$r'
      Not a SCALAR reference at -e line 1.

      If $r is a hash ref, then %$r is a hash, then $$r{$k} is a value in that hash. Therefore, I prefer the $r->{$k} syntax.
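The distinction can be checked side by side with a short sketch (variable names are illustrative only): %$r yields the hash, the three element forms are equivalent, and a bare $$r is a scalar dereference that fails on a hash ref.

```perl
use strict;
use warnings;

my $r = { a => 1 };

my %copy   = %$r;        # %$r is the whole hash
my $arrow  = $r->{a};    # arrow form
my $sigil  = $$r{a};     # same element, parsed as ${$r}{a}
my $braced = ${$r}{a};   # the explicit spelling of the line above

print "$arrow $sigil $braced\n";   # 1 1 1

# $$r on its own asks for a scalar dereference, which dies for a hash ref.
my $ok = eval { my $x = $$r; 1 };
print "error: $@" unless $ok;
```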

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]