mwb613 has asked for the wisdom of the Perl Monks concerning the following question:

Apologies for the imprecise question here. I am trying to approach something I've been programming against from a higher level. I have an API written (in Dancer2) which accepts parameters in JSON and does some CRUD operations. I have written a few different data verifiers that have so far been inadequate for covering different incoming JSON objects and they don't really hold up across all of the different data structures that come in to the API.

Is there a module or method for writing an object template (whether in JSON or in plain Perl) that I can use to verify that data coming into the API matches a flat or nested Perl data structure?

For example, let's say there is an API method which returns a list of ZIP codes matching a US state. The incoming JSON would look like this:

{ "token":"blah", "us_states":["NJ","MI"] }

What I would like to do is write a closure that is given an "object model" or "data structure" and returns a function that can verify the incoming data matches the given model. An input to the closure might look like this:

{ 'token' => \&verify_token, 'us_states' => \&verify_state_array, }

I say it would be a closure because the API provides a bunch of different methods for querying information about a state, so there will be multiple methods that need to verify the "us_states" array values, and they can all use the verify_state_array function.
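A minimal sketch of that closure idea, using only core Perl. Everything here is an assumption for illustration: make_verifier, the bodies of verify_token and verify_state_array, and the %VALID_STATE lookup are hypothetical stand-ins for whatever checks the real API needs.

```perl
use strict;
use warnings;

# Assumed subset of states, for illustration only.
my %VALID_STATE = map { $_ => 1 } qw(NJ MI MA);

# Hypothetical leaf checkers: each takes one value and returns true or false.
sub verify_token {
    my ($value) = @_;
    return defined($value) && !ref($value) && length($value) > 0;
}

sub verify_state_array {
    my ($value) = @_;
    return 0 unless ref($value) eq 'ARRAY' && @$value;
    # Valid only if no element is undefined, a reference, or an unknown state.
    return !grep { !defined($_) || ref($_) || !$VALID_STATE{$_} } @$value;
}

# Takes a model (key => code ref) and returns a closure that checks a hash ref.
# The closure returns (1, '') on success or (0, $reason) on the first failure.
sub make_verifier {
    my ($model) = @_;
    return sub {
        my ($data) = @_;
        return (0, 'input is not a hash') unless ref($data) eq 'HASH';
        for my $key (sort keys %$model) {
            return (0, "missing key '$key'") unless exists $data->{$key};
            return (0, "invalid value for '$key'")
                unless $model->{$key}->($data->{$key});
        }
        for my $key (sort keys %$data) {
            return (0, "unexpected key '$key'") unless exists $model->{$key};
        }
        return (1, '');
    };
}

my $verify_zip_request = make_verifier({
    token     => \&verify_token,
    us_states => \&verify_state_array,
});

my ($ok, $error) = $verify_zip_request->({ token => 'blah', us_states => ['NJ', 'MI'] });
print $ok ? "valid\n" : "invalid: $error\n";
```

Because the checkers are plain code refs, any other API method that takes "us_states" could pass the same \&verify_state_array into its own model.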

For nested incoming data it would be great to be able to traverse the structure and match input keys to ready-made verification functions that are reusable (switched back to Perl format for data structures for some reason):

{
    'token' => 'blah',
    'new_exchanges' => [
        {'NJ' => [{'201' => ['200','202']}, {'202' => ['200','201']}]},
        {'MA' => [{'617' => ['200','202']}, {'949' => ['200','201']}]},
    ]
}

(I accept that the data is formatted in a very dumb way in this example, I was just trying to show a well-nested data structure).
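For the nested case, one possible approach is a template that mirrors the data and is walked recursively. The sketch below is only that, a sketch: check_structure, the '*' wildcard key convention, and the two leaf checkers are hypothetical names invented here, not part of any module. A code ref in the template checks a leaf, a one-element array ref means "every element must match this", and a hash ref lists expected keys, with '*' matching any key not listed explicitly.

```perl
use strict;
use warnings;

# Hypothetical leaf checkers.
my $is_token = sub { defined($_[0]) && !ref($_[0]) && length($_[0]) > 0 };
my $is_code3 = sub { defined($_[0]) && !ref($_[0]) && $_[0] =~ /\A[0-9]{3}\z/ };

# Recursively compare $data against $template.
# Returns a list of error strings; an empty list means the data matched.
sub check_structure {
    my ($template, $data, $path) = @_;
    $path = 'input' unless defined $path;
    my $kind = ref($template);

    if ($kind eq 'CODE') {
        return $template->($data) ? () : ($path . ': value rejected');
    }
    if ($kind eq 'ARRAY') {
        return ($path . ': expected an array') unless ref($data) eq 'ARRAY';
        my @errors;
        for my $i (0 .. $#$data) {
            push @errors, check_structure($template->[0], $data->[$i], $path . "[$i]");
        }
        return @errors;
    }
    if ($kind eq 'HASH') {
        return ($path . ': expected a hash') unless ref($data) eq 'HASH';
        my @errors;
        # Every explicitly named key must be present.
        for my $key (sort keys %$template) {
            next if $key eq '*';
            push @errors, $path . "{$key}: missing" unless exists $data->{$key};
        }
        # Every key in the data must be named or covered by the wildcard.
        for my $key (sort keys %$data) {
            my $node = exists $template->{$key} ? $template->{$key} : $template->{'*'};
            if (!defined $node) {
                push @errors, $path . "{$key}: unexpected key";
                next;
            }
            push @errors, check_structure($node, $data->{$key}, $path . "{$key}");
        }
        return @errors;
    }
    die "unsupported template node at $path\n";
}

my $template = {
    token         => $is_token,
    new_exchanges => [ { '*' => [ { '*' => [ $is_code3 ] } ] } ],
};

my $data = {
    token         => 'blah',
    new_exchanges => [
        { NJ => [ { '201' => [ '200', '202' ] }, { '202' => [ '200', '201' ] } ] },
        { MA => [ { '617' => [ '200', '202' ] }, { '949' => [ '200', '201' ] } ] },
    ],
};

my @errors = check_structure($template, $data);
print @errors ? join("\n", @errors) . "\n" : "valid\n";
```

Note that this sketch does not validate the wildcard keys themselves (the state and exchange codes); that would need an extra convention, and a type library may already provide one.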

Again, apologies for the meandering nature of this post. I think this is something I could write myself, but I am concerned about the edge cases when traversing a data structure where I cannot control what is received by the API, and it seems there might be a module already created to do something like this. Or perhaps I'm going down a terrible path anyway and someone could gently nudge me back on course with a suggestion. To be clear, I'm not looking for anyone to write any code for me; I'm just trying to figure out a strategy for resolving what must be a common problem when munging incoming data.

Thanks!

Replies are listed 'Best First'.
Re: Data Object Verification Modules?
by tobyink (Canon) on Jun 06, 2018 at 22:18 UTC

    Data::Processor is pretty close to what you describe. Though I think you might find Types::Standard does what you want in a slightly different way.

    use Types::Standard -types;

    my $state = StrMatch[qr/^[A-Z]{2}$/];
    my $token = Str->where(sub { $database->check_valid_token($_) });
    my $thing = Dict[token => $token, state => ArrayRef[$state]];

    if ($thing->check($data)) {
        ...;
    }

      These modules look like exactly what I'm looking for. Hopefully they'll also introduce me to some ideas on a larger scope, which will help as well.

      Thanks!

Re: Data Object compares?
by perl-diddler (Chaplain) on Jun 06, 2018 at 20:23 UTC
    I had a need for something similar yesterday and threw together a compare routine that recursively walks down the structure and returns my idea of <, = or >, even though I only needed it for determining whether 2 structs were the same.

    # Note: typ(), SCALAR, ARRAY and HASH are not defined in this snippet;
    # presumably they come from the Types::Core module mentioned below.
    sub isnum($) {
        $_[0] =~ m{^\s* [-+\d.]* [\d.]+ (?:e[-+]\d+)? \s* $}x
    }

    sub Cmp (;$$$);
    sub Cmp (;$$$) {
        my $r = 0;
        require P;
        # with no arguments, fall back to the global $a and $b
        my ($a, $b, $d) = @_ ? @_ : ($a, $b);
        my ($ra, $rb) = (ref $a, ref $b);
        my ($ta, $tb) = (typ $a, typ $b);
        P::Pe("ta=%s, tb=%s", $ta, $tb) if $d;
        P::Pe("ra=%s, rb=%s", $ra, $rb) if $d;
        my ($dta, $dtb) = (defined $ta, defined $tb);
        # first handle "values" (neither are a type reference)
        unless ($dta || $dtb) {
            $r = isnum($a) && isnum($b) ? $a <=> $b : $a cmp $b;
            P::Pe("isnum, a=%s, b=%s, r=%s", isnum($a), isnum($b), $r) if $d;
            return $r
        }
        # then handle unequal type references
        elsif ($dta ^ $dtb) { return (undef, 1) }
        elsif ($dta && $dtb && $ta ne $tb) { return (undef, 2) }
        # now, either do same thing again, or handle differing classes
        # the no-class on either implies no type-ref on either & is handled above
        my ($dra, $drb) = (defined $ra, defined $rb);
        if ($dra ^ $drb) { return (undef, 3) }
        elsif ($dra && $drb && $ra ne $rb) { return (undef, 4) }
        # now start comparing references: dereference and call Cmp again
        if ($ta eq SCALAR) { return Cmp($$a, $$b) }
        elsif ($ta eq ARRAY) {
            # pass the lengths, not the flattened elements, to the debug message
            P::Pe("len of array a vs. b: (%s <=> %s)", scalar(@$a), scalar(@$b)) if $d;
            return $r if $r = @$a <=> @$b;
            # for each member, compare them using Cmp
            for (my $i=0; $i<@$a; ++$i) {
                P::Pe("a->[i] Cmp b->[i]...\0x83", $a->[$i], $b->[$i]) if $d;
                $r = Cmp($a->[$i], $b->[$i]);
                P::Pe("a->[i] Cmp b->[i], r=%s", $a->[$i], $b->[$i], $r) if $d;
                return $r if $r;
            }
            return 0;   # arrays are equal
        }
        elsif ($ta eq HASH) {
            my @ka = sort keys %$a;
            my @kb = sort keys %$b;
            $r = Cmp(0+@ka, 0+@kb);
            P::Pe("Cmp #keys a(%s) b(%s), in hashes: r=%s", 0+@ka, 0+@kb, $r) if $d;
            return $r if $r;
            $r = Cmp(\@ka, \@kb);
            P::Pe("Cmp keys of hash: r=%s", $r) if $d;
            return $r if $r;
            # values are taken in sorted-key order, then compared as arrays
            my @va = map {$a->{$_}} @ka;
            my @vb = map {$b->{$_}} @kb;
            $r = Cmp(\@va, \@vb);
            P::Pe("Cmp values for each key, r=%s", $r) if $d;
            return $r;
        }
        else {
            P::Pe("no comparison for type %s, ref %s", $ta, $ra) if $d;
            return (undef, 5);   ## unimplemented comparison
        }
    }

    Please note, it's raw code. It works on the nested data structures I've tried it on, but I haven't developed any general test cases for it, and I'm not sure if I'd want to put it on CPAN and, if so, where. For now, I added it to my Types::Core module, as it's comparing typed data (a tenuous reason, but with it being so small, I wasn't sure where else I'd put it). If you decide to use it, PLEASE tell me about any bugs or problems so I can develop tests and upgrade the code; I only wrote it yesterday and don't even know if I want to publish it. There may be similar modules on CPAN, but I wanted something short and sweet, and this did exactly what I wanted.

    Takes up to 3 params: 1st two are refs to the data structures. If passed no refs, it will use '$a and $b' as starting points (global compare vars). Third param '$d' stands for debug and controls the printing of various progress messages as it goes along.

    Literally, I'm using it to test whether some routines internal to a program generate correct results. That is, the code generating the routines was complicated enough that I wanted to test it separately, calling the routines and having them generate various data structures. I needed a way to compare structures that should be equal.

    It sounded like you were wanting exactly the same thing I was doing, if not, sorry for the waste of bandwidth and misunderstanding what you wanted, but if it works for you, cool!

    Linda

      A few more things I forgot to mention about the return values: normally, if the structures can be compared, it will return -1, 0 or +1, meaning the struct on the left compares less than, equal to, or greater than the struct on the right.

      The ordering of structures is definitely *arbitrary* or *subjective* -- I compare keys of hashes, for example, but whether or not that ordering is relevant or pertinent is entirely arbitrary -- as mentioned before, I really wanted to know if the structures were equal or not, but I figured, I might as well try *some* ordering and get some side benefit of possibly being able to sort data-structs into some order.

      For hashes, after comparing the sorted keys, I pull up the values for those keys (in sorted-key order) and compare those.

      If you get back an 'undef', it means it couldn't compare it -- not that they were unequal(or equal).

      Even if the underlying data structures are the same -- if they are blessed data structs in different classes, they return undef.

      On return values, I use a 2 element array with the 2nd element being a number pointing at the test in the Cmp that failed. That helped me narrow down problems and such. They may or may not be useful.

Re: Data Object Verification Modules?
by 1nickt (Canon) on Jun 08, 2018 at 11:05 UTC

    An alternative to tobyink's Type::Tiny would be to use JSON::Schema (also maintained by tobyink (!)). For a full-scale API it's a bit of work to set up, but very easy to maintain and very powerful. If your Dancer2 API accepts only JSON params, it's the way to go in my experience.

    See the tutorial examples.

    (PS: There is an alternative, JSON::Validator, by one of the lead Mojo devs, but I have not used it.)

    Hope this helps!


    The way forward always starts with a minimal test.

      Urgh. Replied to this earlier, but I guess I only got as far as the preview and then closed the tab or something. SSL errors in Chrome are making this site very annoying to use.

      Anyway, I probably wouldn't recommend JSON::Schema at the moment. It's based on a pretty old version of the JSON schema specification, and I don't plan on updating it until the specification stabilises. And when that does happen, JSON::Schema is likely to become a wrapper around JSON::Schema::AsType which is a frickin' awesome idea, and how I'd be implementing JSON Schema now if someone hadn't beaten me to it.

      Right now, JSON::Schema::AsType is a little inefficient in how it uses Type::Tiny, and could be made quite a bit faster. I plan on contributing some improvements in this area once the specification is more stable. It may already be faster than JSON::Schema though — I haven't benchmarked it.