
I often find myself trying to parse JSON API responses. I typically do a simple data dump of the result to get an understanding of how the data is structured. This is fine for easily digestible records but is frustrating when a lot of data accompanies the structures. It's also difficult when the data is somewhat inconsistent, for example when the email address field is missing because there is no email address.

So I set out to write a quick and dirty tool that generates a report on the data structure, so I can quickly identify all the fields provided by the API response. This is what I have so far:

#! /usr/bin/env perl

use strict;
use warnings;

# test data
my @array = ({fname => 'bob',  last_name => 'smith'             },

             {fname => 'tony', last_name => 'jones', age => 23,
               kids =>
                 [
                   {first_name   => 'cheryl',
                    middle_name => 'karen',
                    age         => 23        },

                   {name         => 'jimmy',
                    age          => 17       }

                 ]                                                },
             {fname => 'janet', last_name => 'marcos',
               occupation => {
                 title => 'trucker',
                 years_on_job => 12}                              },


             {fname => 'Marge', last_name => 'Keefe',
                kids =>
                  [
                    {name => 'kate', age => 7, vaccinated => 'yes'},
                    {name => 'kim', age => 5}
                  ]
             });

my $out = "var ";
my %giant_hash;

foreach my $array (@array) {
  %giant_hash = (%giant_hash, %$array);
}

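# set $s_values to 0 to include scalar values in the report
my $s_values = 1;
# current indentation depth; initialized before the first call to introspect
my $nest_level = 0;
introspect(\%giant_hash);
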
# recursive function that traverses the data structure
sub introspect {
  my $data = shift;
  my $type = gtype ($data);

  if ($type eq 'ARRAY') {
    $nest_level++;
    $out .= "is an ARRAY with " . glen(@$data) . " elements:";
    my $count = 0;
    foreach my $elem (@$data) {
      $out .= "\n" . ("\t" x $nest_level) . "elem $count ";
      introspect (ref $elem ? $elem : \$elem);
      $count++;
    }
    $nest_level--;
  }

  if ($type eq 'HASH') {
    $nest_level++;
    $out .= "is a HASH with " . scalar (keys %$data) . " keys";
    $out .= "\n" . ("\t" x $nest_level) . "the keys are '" . join ("', '", sort keys %$data) . "'";
    my $last_key;
    foreach my $key (keys %$data) {
      $last_key = $key;
      $out .= "\n" . ("\t" x $nest_level) . "key '$key' ";
      introspect (ref $data->{$key} ? $data->{$key} : \$data->{$key});
    }
    $nest_level--;
  }

  # our base case
  if ($type eq 'SCALAR') {
    $out .= "is a SCALAR";
    if (!$s_values) {
      $out .= " with a value of '$$data'";
    }
  }
}

print $out;
print "\n";

sub glen {
  return scalar @_;
}

sub gtype {
  ref shift;
}

This generates the following report:

var is a HASH with 5 keys
        the keys are 'age', 'fname', 'kids', 'last_name', 'occupation'
        key 'kids' is an ARRAY with 2 elements:
                elem 0 is a HASH with 3 keys
                        the keys are 'age', 'name', 'vaccinated'
                        key 'vaccinated' is a SCALAR
                        key 'age' is a SCALAR
                        key 'name' is a SCALAR
                elem 1 is a HASH with 2 keys
                        the keys are 'age', 'name'
                        key 'age' is a SCALAR
                        key 'name' is a SCALAR
        key 'fname' is a SCALAR
        key 'age' is a SCALAR
        key 'occupation' is a HASH with 2 keys
                the keys are 'title', 'years_on_job'
                key 'title' is a SCALAR
                key 'years_on_job' is a SCALAR
        key 'last_name' is a SCALAR

As can be seen, my crude approach of merging each element into %giant_hash is fine for top-level keys but falls down on nested data. The merge is shallow, so an array in a newly merged record clobbers the array already stored under the same key in the giant hash (a nested hash would be replaced the same way). For example, the unique fields for kids in the "Tony Jones" record don't show up in the report. This approach also won't work at all if I have an array of arrays.
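
To make the clobbering concrete, here is a minimal sketch using two cut-down records. The variable names (%first, %second, %merged, @kid_fields) are made up for illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Two records that both carry a 'kids' array, with different fields inside.
my %first  = (fname => 'tony',  kids => [ { first_name => 'cheryl', middle_name => 'karen' } ]);
my %second = (fname => 'Marge', kids => [ { name => 'kate', vaccinated => 'yes' } ]);

# List-style merge: for duplicate keys, the value that appears last wins.
my %merged = (%first, %second);

# Only the second record's 'kids' array survives; the first is replaced wholesale.
my @kid_fields = sort keys %{ $merged{kids}[0] };
print "kid fields seen: @kid_fields\n";   # prints "kid fields seen: name vaccinated"
```

The first_name and middle_name fields never reach the merged hash, which is the same effect seen in the report.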

The second approach, which I gave up on, was a bit too mind-bending for me to figure out. Basically, each element of the array of hashes gets traversed individually. The first element traversed would populate a %meta hash, which would act as a reference for the other data structures in the outermost array. Each traversal of a subsequent element would build upon %meta and make it more and more accurate as each leaf in the data structure gets manually merged into %meta. Conceptually, the end result of %meta would look something like this:

fname => scalar,
last_name => scalar,
kids => aoh => { first_name => scalar, middle_name => scalar, name => scalar, age => scalar, vaccinated => scalar},
occupation => hoh => {title => scalar, years_on_job => scalar},
age => scalar
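
One possible way to build something like %meta is sketched below. It is only a rough outline under my own assumptions: the helper names merge_shape and report_shape, and the node layout (seen, keys, elems), are hypothetical and not taken from the script above. Each node records which kinds of value were seen at a position, and all elements of an array are folded into a single child node so that their fields accumulate instead of overwriting each other.

```perl
use strict;
use warnings;

# Hypothetical helper: fold one value into a "shape" node and return the node.
sub merge_shape {
    my ($node, $data) = @_;
    $node //= {};
    # Plain (non-reference) values are treated as SCALAR; blessed objects
    # would be recorded under their class name.
    my $type = ref($data) || 'SCALAR';
    $node->{seen}{$type}++;

    if ($type eq 'HASH') {
        for my $key (keys %$data) {
            $node->{keys}{$key} = merge_shape($node->{keys}{$key}, $data->{$key});
        }
    }
    elsif ($type eq 'ARRAY') {
        # Every element shares one child node.
        $node->{elems} = merge_shape($node->{elems}, $_) for @$data;
    }
    return $node;
}

# Hypothetical helper: render a shape node as a list of indented report lines.
sub report_shape {
    my ($node, $label, $depth) = @_;
    my @lines = (("\t" x $depth) . "$label: " . join('|', sort keys %{ $node->{seen} }));
    for my $key (sort keys %{ $node->{keys} || {} }) {
        push @lines, report_shape($node->{keys}{$key}, "key '$key'", $depth + 1);
    }
    push @lines, report_shape($node->{elems}, 'each elem', $depth + 1) if $node->{elems};
    return @lines;
}

# Cut-down version of the test data.
my @records = (
    { fname => 'tony',  kids => [ { first_name => 'cheryl', age => 23 }, { name => 'jimmy', age => 17 } ] },
    { fname => 'Marge', kids => [ { name => 'kate', vaccinated => 'yes' } ] },
    { fname => 'janet', occupation => { title => 'trucker' } },
);

my $meta;
$meta = merge_shape($meta, $_) for @records;
my @report = report_shape($meta, 'record', 0);
print "$_\n" for @report;
```

Because the seen counters are incremented on every visit, comparing a key's count with its parent's count could also flag fields that are only sometimes present, such as the missing email address case. An array of arrays would be handled by the same recursion, since the ARRAY branch does not care what its elements are.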

Any tips or hints are appreciated.

$PM = "Perl Monk's";
$MCF = "Most Clueless ~~Friar~~ ~~Abbot~~ ~~Bishop~~ ~~Pontiff~~ ~~Deacon~~ ~~Curate~~ Priest";
$nysus = $PM . ' ' . $MCF;

In reply to How to improve introspection of an array of hashes by nysus
