in reply to How to improve introspection of an array of hashes

Nice :).

First:

my $out .= "var ";
What what? And please don't name your hashrefs $array ;-).

Just so we're clear: your idea will work when the keys uniquely identify the associated values, so something like this would be valid Perl, but invalid input:

(Shapes => [
    { Type => 'Circle', Diameter => 2, Center => [0,1] },
    { Type => 'Square', Side => 3, Pos => [4,8] },
    [ {x => 1, y => 1}, {x => 3, y => 1}, {x => 4, y => 2}, {x => 2, y => 2} ]
] );
The values in the Shapes array can either be hashes describing the shape, or an array of points, and you need to look inside the hashes to know the type of shape and the associated members.
Edit: actually my solution below accepts mixed ARRAY/HASH. It just melds Circle and Square together in a way that might not make much sense.

As can be seen, my crude approach of merging each element into a %giant_hash, while great for data if everything within the array hashes is a hash, falls down when arrays are encountered.
Actually the hash is the issue: in %giant_hash = (%giant_hash, %$array); any keys in %$array that are already present in %giant_hash will have their values overwritten. This means that the kids of Marge Keefe will erase the kids of Tony Jones.
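
To see the overwrite in isolation, here is a minimal sketch. The two-person data set is a cut-down, made-up version of the one in this thread, not your real input:

```perl
use strict;
use warnings;

# Hypothetical, trimmed-down data modelled on the thread's example.
my @people = (
    { fname => 'tony',  kids => [ { name => 'jimmy' } ] },
    { fname => 'Marge', kids => [ { name => 'kate' }, { name => 'kim' } ] },
);

my %giant_hash;
# In a list assignment the later key/value pairs win,
# so each merge replaces the values of keys that already exist.
%giant_hash = (%giant_hash, %$_) for @people;

# Only the last person's data is left under each key.
print "$giant_hash{fname} has ", scalar @{ $giant_hash{kids} }, " kids\n";
```

After the loop, nothing of tony's entry is left in %giant_hash.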

A hash in perl can only hold a single value (scalar) for each key. That value can be a reference that holds other values, but perl won't just do that on its own when you "merge" hashes, so in %giant_hash, you will only have the fname, last_name, occupation and set of kids. So you can't actually merge the hash that way before iterating over it.
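
If you really wanted to keep every value, you would have to build the references yourself. A rough sketch of that idea, where %collected and the sample data are names I made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data, same shape as in the thread.
my @people = (
    { fname => 'tony',  kids => [ { name => 'jimmy' } ] },
    { fname => 'Marge', kids => [ { name => 'kate' }, { name => 'kim' } ] },
);

# Each key maps to an array reference holding every value seen for it,
# instead of a single value that gets replaced on each merge.
my %collected;
for my $person (@people) {
    push @{ $collected{$_} }, $person->{$_} for keys %$person;
}

print scalar @{ $collected{fname} }, " fname values kept\n";
```

This keeps the data but not the structure you are after, which is why the solution further down records types rather than values.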

The reason you merged the hashes in the first place is that they are of the same type, so they contain similar data. That's also true for the kids (basically they have a name, an age, and might be vaccinated), so you should "merge" them as well, and show that "kids" may, on the contrary, contain an arbitrary number of elements of type "kid". That would make your output look like:

var is a HASH with 5 keys
the keys are 'age', 'fname', 'kids', 'last_name', 'occupation'
    key 'kids' is an ARRAY containing HASHREFs:
        the keys are 'age', 'name', 'vaccinated'
            key 'vaccinated' is a SCALAR
            key 'age' is a SCALAR
            key 'name' is a SCALAR
    key 'fname' is a SCALAR
    key 'age' is a SCALAR
    key 'occupation' is a HASH with 2 keys
        the keys are 'title', 'years_on_job'
            key 'title' is a SCALAR
            key 'years_on_job' is a SCALAR
    key 'last_name' is a SCALAR

Here is my attempt at solving your problem. There are of course many ways to do it; keeping the list of keys down to the current point, rather than a reference to the current level, might be a better way to work (you wouldn't have to provide the output hash as a parameter), but I just went where my fingers took me :D
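
For comparison, the "list of keys down to the current point" variant could look roughly like this. It is only a sketch: introspect_paths and the path notation (var[]{key}) are invented for the example, and empty arrays or hashes leave no trace in the result:

```perl
use v5.14;
use strict;
use warnings;

# Hypothetical helper: instead of a nested output hash, record one entry
# per distinct path, mapping to the leaf types found at that path.
sub introspect_paths {
    my ($data, $path, $seen) = @_;
    $path //= 'var';
    $seen //= {};
    if (ref $data eq 'ARRAY') {
        # All elements of an array share the same path component.
        introspect_paths($_, $path . '[]', $seen) for @$data;
    }
    elsif (ref $data eq 'HASH') {
        introspect_paths($data->{$_}, $path . "{$_}", $seen) for keys %$data;
    }
    else {
        my $type = ref($data) ? ref($data) . 'REF' : 'SCALAR';
        $seen->{$path}{$type} = 1;
    }
    return $seen;
}

my $paths = introspect_paths([
    { fname => 'tony',  kids => [ { name => 'jimmy', age => 17 } ] },
    { fname => 'Marge', kids => [ { name => 'kate' } ] },
]);
say "$_ => ", join(',', sort keys %{ $paths->{$_} }) for sort keys %$paths;
```

The caller no longer passes an output hash, at the cost of a flat result that you would have to re-nest for display.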

use v5.14;
use strict;
use warnings;
use Data::Dump qw( pp );
use YAML;

sub introspect {
    my ($data, $output) = @_;
    if (ref $data eq 'ARRAY') {
        my $sub_out = ($output->{'ARRAY'} //= {});
        introspect($_, $sub_out) for @{ $data };
    }
    elsif (ref $data eq 'HASH') {
        my $hash_out = $output->{"HASH"} //= {};
        for my $key (keys %$data) {
            my $sub_out = ($hash_out->{"$key"} //= {});
            introspect($_, $sub_out) for $data->{$key};
        }
    }
    elsif (ref $data) {
        $output->{ref($data).'REF'}=1;
    }
    else {
        $output->{SCALAR}=1;
    }
}

my @array = (
    {fname => 'bob', last_name => 'smith', foo => [\*main]},
    {fname => 'tony', last_name => 'jones', age => 23,
     kids => [
         {first_name => 'cheryl', middle_name => 'karen', age => 24 },
         {name => 'jimmy', age => 17 }
     ],
    },
    {fname => 'janet', last_name => 'marcos', foo => {},
     occupation => { title => 'trucker', years_on_job => 12}
    },
    {fname => 'Marge', last_name => 'Keefe',
     kids => [
         {name => 'kate', age => 7, vaccinated => 'yes'},
         {name => 'kim', age => 5}
     ]
    });

my %out;
introspect(\@array, \%out);
say pp \%out;
say YAML::Dump(\%out);
{
  ARRAY => {
    HASH => {
      age => { SCALAR => 1 },
      fname => { SCALAR => 1 },
      foo => { ARRAY => { GLOBREF => 1 }, HASH => {} },
      kids => {
        ARRAY => {
          HASH => {
            age => { SCALAR => 1 },
            first_name => { SCALAR => 1 },
            middle_name => { SCALAR => 1 },
            name => { SCALAR => 1 },
            vaccinated => { SCALAR => 1 },
          },
        },
      },
      last_name => { SCALAR => 1 },
      occupation => {
        HASH => { title => { SCALAR => 1 }, years_on_job => { SCALAR => 1 } },
      },
    },
  },
}
---
ARRAY:
  HASH:
    age:
      SCALAR: 1
    fname:
      SCALAR: 1
    foo:
      ARRAY:
        GLOBREF: 1
      HASH: {}
    kids:
      ARRAY:
        HASH:
          age:
            SCALAR: 1
          first_name:
            SCALAR: 1
          middle_name:
            SCALAR: 1
          name:
            SCALAR: 1
          vaccinated:
            SCALAR: 1
    last_name:
      SCALAR: 1
    occupation:
      HASH:
        title:
          SCALAR: 1
        years_on_job:
          SCALAR: 1

Edit: you can add this case to handle things like \\\\\{}:

elsif (ref $data eq 'REF') {
    introspect($$data, ($output->{'REF'} //= {}));
}

Re^2: How to improve introspection of an array of hashes
by nysus (Parson) on Sep 13, 2018 at 14:33 UTC

    Perfect! Very elegant. I will study this closely. And nice use of Dumper and yaml to do the work of formatting the output.

    Do you think this might be useful as a cpan module? I searched cpan but didn't find anything that did anything quite like this.

    $PM = "Perl Monk's";
    $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest";
    $nysus = $PM . ' ' . $MCF;

      YAML is often my go-to module when I want formatted data but I'm too lazy to do it myself :). I was hoping for more compact data with YAML than with Data::Dump, though. But the latter has { SCALAR => 1 } inline, where YAML puts it on a separate line.

      Do you think this might be useful as a cpan module?
      Maybe? It needs some tinkering though (or a rewrite). Like a wrapper to hide the %output hash. And proper handling of objects: right now inspecting bless {}, 'Pony' would be indicated as 'PonyREF', and bless {}, 'ARRAY' would try to dereference the hashref as an arrayref. Oops.
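
      A rough sketch of what that tinkering might look like, assuming Scalar::Util's reftype and blessed are used to look past the class name. The names introspect_safe and _walk, and the "Class=TYPE" labels, are made up for this example, and objects that overload dereferencing are not considered:

```perl
use v5.14;
use strict;
use warnings;
use Scalar::Util qw( blessed reftype );

# Hypothetical wrapper: the caller no longer supplies the output hash.
sub introspect_safe {
    my ($data) = @_;
    my %out;
    _walk($data, \%out);
    return \%out;
}

sub _walk {
    my ($data, $output) = @_;
    my $type = reftype($data);    # underlying type, regardless of blessing
    if (!defined $type) {         # not a reference at all
        $output->{SCALAR} = 1;
        return;
    }
    # Label objects as "Class=TYPE" so bless({}, 'ARRAY') is still seen as a hash.
    my $class = blessed($data);
    my $label = defined $class ? "$class=$type" : $type;
    if ($type eq 'ARRAY') {
        my $sub_out = ($output->{$label} //= {});
        _walk($_, $sub_out) for @$data;
    }
    elsif ($type eq 'HASH') {
        my $hash_out = ($output->{$label} //= {});
        _walk($data->{$_}, ($hash_out->{$_} //= {})) for keys %$data;
    }
    else {
        $output->{ $label . 'REF' } = 1;
    }
}

my $result = introspect_safe([ bless({ legs => 4 }, 'Pony'), bless({}, 'ARRAY') ]);
```

      With this, the Pony shows up under a 'Pony=HASH' key with its members inspected, and the object blessed into 'ARRAY' is walked as the hash it really is.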

      I'd still be curious to see what others might have to say about the subject. I wouldn't be surprised if there is already a data-traversing module that, rather than doing exactly what you want already, lets you do it in two or three lines.

        I'm sure someone has done something like this as well. Just couldn't find it.

        I took your code for a spin in the real world using the Google Contacts API. Here's the output from a JSON response converted to a Perl data structure using Mojo::JSON::decode_json:

        HASH => {
          encoding => {},
          feed => {
            HASH => {
              "author" => {
                ARRAY => {
                  HASH => {
                    email => { HASH => { "\$t" => {} } },
                    name => { HASH => { "\$t" => {} } },
                  },
                },
              },
              "category" => { ARRAY => { HASH => { scheme => {}, term => {} } } },
              "entry" => {
                ARRAY => {
                  HASH => {
                    "app\$edited" => { HASH => { "\$t" => {}, "xmlns\$app" => {} } },
                    "category" => { ARRAY => { HASH => { scheme => {}, term => {} } } },
                    "content" => { HASH => { "\$t" => {} } },
                    "gContact\$birthday" => { HASH => { when => {} } },
                    "gContact\$groupMembershipInfo" => { ARRAY => { HASH => { deleted => {}, href => {} } } },
                    "gContact\$nickname" => { HASH => { "\$t" => {} } },
                    "gContact\$relation" => { ARRAY => { HASH => { "\$t" => {}, "rel" => {} } } },
                    "gContact\$userDefinedField" => { ARRAY => { HASH => { key => {}, value => {} } } },
                    "gContact\$website" => {
                      ARRAY => { HASH => { href => {}, label => {}, primary => {}, rel => {} } },
                    },
                    "gd\$email" => {
                      ARRAY => {
                        HASH => { address => {}, label => {}, primary => {}, rel => {} },
                      },
                    },
                    "gd\$etag" => {},
                    "gd\$extendedProperty" => { ARRAY => { HASH => { "\$t" => {}, "name" => {} } } },
                    "gd\$im" => {
                      ARRAY => {
                        HASH => { address => {}, label => {}, primary => {}, protocol => {}, rel => {} },
                      },
                    },
                    "gd\$name" => {
                      HASH => {
                        "gd\$additionalName" => { HASH => { "\$t" => {} } },
                        "gd\$familyName" => { HASH => { "\$t" => {}, "yomi" => {} } },
                        "gd\$fullName" => { HASH => { "\$t" => {} } },
                        "gd\$givenName" => { HASH => { "\$t" => {}, "yomi" => {} } },
                        "gd\$namePrefix" => { HASH => { "\$t" => {} } },
                        "gd\$nameSuffix" => { HASH => { "\$t" => {} } },
                      },
                    },
                    "gd\$organization" => {
                      ARRAY => {
                        HASH => {
                          "gd\$orgDepartment" => { HASH => { "\$t" => {} } },
                          "gd\$orgName" => { HASH => { "\$t" => {} } },
                          "gd\$orgTitle" => { HASH => { "\$t" => {} } },
                          "primary" => {},
                          "rel" => {},
                        },
                      },
                    },
                    "gd\$phoneNumber" => {
                      ARRAY => {
                        HASH => { "\$t" => {}, "label" => {}, "primary" => {}, "rel" => {}, "uri" => {} },
                      },
                    },
                    "gd\$structuredPostalAddress" => {
                      ARRAY => {
                        HASH => {
                          "gd\$city" => { HASH => { "\$t" => {} } },
                          "gd\$country" => { HASH => { "\$t" => {}, "code" => {} } },
                          "gd\$formattedAddress" => { HASH => { "\$t" => {} } },
                          "gd\$postcode" => { HASH => { "\$t" => {} } },
                          "gd\$region" => { HASH => { "\$t" => {} } },
                          "gd\$street" => { HASH => { "\$t" => {} } },
                          "primary" => {},
                          "rel" => {},
                        },
                      },
                    },
                    "id" => { HASH => { "\$t" => {} } },
                    "link" => {
                      ARRAY => {
                        HASH => { "gd\$etag" => {}, "href" => {}, "rel" => {}, "type" => {} },
                      },
                    },
                    "title" => { HASH => { "\$t" => {} } },
                    "updated" => { HASH => { "\$t" => {} } },
                  },
                },
              },
              "gd\$etag" => {},
              "generator" => { HASH => { "\$t" => {}, "uri" => {}, "version" => {} } },
              "id" => { HASH => { "\$t" => {} } },
              "link" => { ARRAY => { HASH => { href => {}, rel => {}, type => {} } } },
              "openSearch\$itemsPerPage" => { HASH => { "\$t" => {} } },
              "openSearch\$startIndex" => { HASH => { "\$t" => {} } },
              "openSearch\$totalResults" => { HASH => { "\$t" => {} } },
              "title" => { HASH => { "\$t" => {} } },
              "updated" => { HASH => { "\$t" => {} } },
              "xmlns" => {},
              "xmlns\$batch" => {},
              "xmlns\$gContact" => {},
              "xmlns\$gd" => {},
              "xmlns\$openSearch" => {},
            },
          },
          version => {},
        },

        I modified the code slightly to get rid of the "SCALAR" output.
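
        The modified code was not posted, so the following is only a guess at one way to get leaves that stay as empty hashes like in the output above: drop the final else branch. The name introspect_no_scalar is invented for this sketch.

```perl
use v5.14;
use strict;
use warnings;

# Assumed variant of introspect(): identical except that plain scalars
# record nothing, so a leaf is left as the empty hash its parent created.
sub introspect_no_scalar {
    my ($data, $output) = @_;
    if (ref $data eq 'ARRAY') {
        my $sub_out = ($output->{ARRAY} //= {});
        introspect_no_scalar($_, $sub_out) for @$data;
    }
    elsif (ref $data eq 'HASH') {
        my $hash_out = ($output->{HASH} //= {});
        introspect_no_scalar($data->{$_}, ($hash_out->{$_} //= {})) for keys %$data;
    }
    elsif (ref $data) {
        $output->{ ref($data) . 'REF' } = 1;
    }
    # no else: non-reference values add no SCALAR marker
}

# Hypothetical miniature of a feed-like structure.
my %out;
introspect_no_scalar({ version => '1.0', feed => { id => { '$t' => 'x' } } }, \%out);
```

        One side effect of this variant is that a scalar leaf and an empty hash value can no longer be told apart by their own entry alone; only the absence of a HASH key distinguishes them.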
