I often find myself trying to parse JSON API responses. I typically do a simple data dump of the result to understand how the data is structured. This is fine for easily digestible records but is frustrating when a lot of data accompanies the data structures. It's also difficult if the data is inconsistent where, for example, the email address field is missing if there is no email address.

So I set out to write a quick and dirty tool for generating a report on the data structure so I can quickly identify all the fields provided by the API response. This is what I have so far:

#!/usr/bin/env perl
use strict;
use warnings;

# test data
my @array = (
    { fname => 'bob', last_name => 'smith' },
    { fname => 'tony', last_name => 'jones', age => 23,
      kids => [
          { first_name => 'cheryl', middle_name => 'karen', age => 23 },
          { name => 'jimmy', age => 17 },
      ] },
    { fname => 'janet', last_name => 'marcos',
      occupation => { title => 'trucker', years_on_job => 12 } },
    { fname => 'Marge', last_name => 'Keefe',
      kids => [
          { name => 'kate', age => 7, vaccinated => 'yes' },
          { name => 'kim', age => 5 },
      ] },
);

my $out = "var ";

# crude merge: each record overwrites matching top-level keys of the previous ones
my %giant_hash;
foreach my $record (@array) {
    %giant_hash = (%giant_hash, %$record);
}

my $s_values   = 1;    # true: leave scalar values out of the report
my $nest_level = 0;    # current indentation depth
introspect(\%giant_hash);

# recursive function that traverses the data structure
sub introspect {
    my $data = shift;
    my $type = gtype($data);

    if ($type eq 'ARRAY') {
        $nest_level++;
        $out .= "is an ARRAY with " . glen(@$data) . " elements:";
        my $count = 0;
        foreach my $elem (@$data) {
            $out .= "\n" . ("\t" x $nest_level) . "elem $count ";
            # plain scalars are passed by reference so gtype() reports SCALAR
            introspect(ref $elem ? $elem : \$elem);
            $count++;
        }
        $nest_level--;
    }

    if ($type eq 'HASH') {
        $nest_level++;
        $out .= "is a HASH with " . scalar(keys %$data) . " keys";
        $out .= "\n" . ("\t" x $nest_level) . "the keys are '" . join("', '", sort keys %$data) . "'";
        foreach my $key (keys %$data) {
            $out .= "\n" . ("\t" x $nest_level) . "key '$key' ";
            introspect(ref $data->{$key} ? $data->{$key} : \$data->{$key});
        }
        $nest_level--;
    }

    # our base case
    if ($type eq 'SCALAR') {
        $out .= "is a SCALAR";
        if (!$s_values) {
            $out .= " with a value of '$$data'";
        }
    }
}

print $out;
print "\n";

sub glen  { return scalar @_; }
sub gtype { ref shift; }

This generates the following report:

var is a HASH with 5 keys
    the keys are 'age', 'fname', 'kids', 'last_name', 'occupation'
    key 'kids' is an ARRAY with 2 elements:
        elem 0 is a HASH with 3 keys
            the keys are 'age', 'name', 'vaccinated'
            key 'vaccinated' is a SCALAR
            key 'age' is a SCALAR
            key 'name' is a SCALAR
        elem 1 is a HASH with 2 keys
            the keys are 'age', 'name'
            key 'age' is a SCALAR
            key 'name' is a SCALAR
    key 'fname' is a SCALAR
    key 'age' is a SCALAR
    key 'occupation' is a HASH with 2 keys
        the keys are 'title', 'years_on_job'
        key 'title' is a SCALAR
        key 'years_on_job' is a SCALAR
    key 'last_name' is a SCALAR

As can be seen, my crude approach of merging each element into a %giant_hash works well enough when every record holds only scalar values, but it falls down when arrays are encountered. The merge is shallow, so an array in a newly merged record clobbers the array already stored under the same key in the giant hash. For example, the unique fields for kids in the "Tony Jones" record don't show up in the report. This approach also won't work at all if I have an array of arrays.
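To make the clobbering concrete, here is a minimal sketch with two cut-down records (the variable names are made up for this illustration). It only shows that a list-style hash merge replaces the whole 'kids' array reference rather than combining its contents:

```perl
use strict;
use warnings;

# Two records that both carry a 'kids' array, with different fields inside.
my %first  = (fname => 'tony',  kids => [ { first_name => 'cheryl', middle_name => 'karen' } ]);
my %second = (fname => 'Marge', kids => [ { name => 'kate', vaccinated => 'yes' } ]);

# List-flattening merge: for a repeated key, the later value replaces the earlier one wholesale.
my %merged = (%first, %second);

# Only the second record's 'kids' array reference survives the merge.
my @kid_fields = sort keys %{ $merged{kids}[0] };
print "kid fields seen: @kid_fields\n";    # prints "kid fields seen: name vaccinated"
```

The first_name and middle_name fields are gone because the merge never looks inside the array.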

The second approach, which I gave up on, was a bit too mind bending for me to figure out. Basically, each element of the array of hashes gets traversed individually. The first element traversed would populate a %meta hash, which would act as a reference for the other data structures in the outermost array. Each traversal of a subsequent element builds upon %meta and makes it more and more accurate as each leaf in the data structure gets merged into %meta. Conceptually, the end result of %meta would look something like this:

fname      => scalar,
last_name  => scalar,
kids       => aoh => { first_name => scalar, middle_name => scalar, age => scalar, vaccinated => scalar },
occupation => hoh => { title => scalar, years_on_job => scalar },
age        => scalar
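One way the %meta idea might be sketched is as a recursive merge that folds each record into a single shape description. This is only a rough sketch under my own assumptions, not a finished tool: merge_shape and the node layout (a 'type' key, plus 'keys' for hashes and 'elem' for arrays) are names invented for this illustration, and every element of an array is assumed to share one merged shape.

```perl
use strict;
use warnings;

# Hypothetical helper: fold one value into the shape node $meta and return
# the updated node. Leaves become the string 'scalar'; hashes become
# { type => 'hash', keys => {...} }; arrays become { type => 'array', elem => ... },
# where 'elem' is the merged shape of every element seen so far.
sub merge_shape {
    my ($meta, $data) = @_;
    my $type = ref $data;

    if ($type eq 'HASH') {
        $meta = { type => 'hash', keys => {} }
            unless ref($meta) eq 'HASH' && $meta->{type} eq 'hash';
        for my $key (keys %$data) {
            $meta->{keys}{$key} = merge_shape($meta->{keys}{$key}, $data->{$key});
        }
        return $meta;
    }

    if ($type eq 'ARRAY') {
        $meta = { type => 'array', elem => undef }
            unless ref($meta) eq 'HASH' && $meta->{type} eq 'array';
        $meta->{elem} = merge_shape($meta->{elem}, $_) for @$data;
        return $meta;
    }

    # Anything else is treated as a leaf. If a container shape was already
    # recorded for this position, keep it rather than overwriting it.
    return ref($meta) ? $meta : 'scalar';
}

# Cut-down sample records in the same spirit as the test data.
my @records = (
    { fname => 'tony',  kids => [ { first_name => 'cheryl', age => 23 }, { name => 'jimmy', age => 17 } ] },
    { fname => 'janet', occupation => { title => 'trucker', years_on_job => 12 } },
    { fname => 'Marge', kids => [ { name => 'kate', age => 7, vaccinated => 'yes' } ] },
);

my $meta;
$meta = merge_shape($meta, $_) for @records;

my @top  = sort keys %{ $meta->{keys} };
my @kids = sort keys %{ $meta->{keys}{kids}{elem}{keys} };
print "top-level fields: @top\n";    # prints "top-level fields: fname kids occupation"
print "kids fields: @kids\n";        # prints "kids fields: age first_name name vaccinated"
```

Because arrays get their own node type, the same recursion should also cope with an array of arrays. A field that is a hash in one record and an array in another is not handled gracefully here; the later container simply replaces the earlier one.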

Any tips or hints are appreciated.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest";
$nysus = $PM . ' ' . $MCF;


In reply to How to improve introspection of an array of hashes by nysus
