Re: polymorphic data-driven rendering?

by ELISHEVA (Prior)
on Mar 29, 2009 at 08:11 UTC


in reply to polymorphic data-driven rendering?

I just finished writing such a general purpose transformation tool (in Perl, currently being tested).

Although it is possible to write such a tool, it can't eliminate DRY violations unless your development environment completely separates data structure from output format. To do that one must have the following two components, in addition to the general purpose converter:

  • transformation routines. For each kind of object you want to dump, you will need to write a routine that converts it into some basic Perl data structure that can be dumped easily by all of the output formats (XML, JSON, YAML, SOAP, binary dumps, etc.) you want to use.
  • format usage rules. You will need to formulate policies for how you will use each serialization format, so that the standard structure above can be written to and read back from your chosen formats (XML, JSON, YAML, etc.) with no knowledge of the original (pre-transformed) data you are dumping. (A minimal sketch of a transformation routine follows this list.)
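
Here is a minimal sketch of the first component. The Point class, its fields, and the reserved "class" key are invented purely for illustration; any policy that reduces an object to plain hashes, arrays, and scalars will do:

    # Hypothetical transformation routine: reduce a Point object to plain
    # hashes/arrays/scalars that any dumper (XML, JSON, YAML, ...) can handle.
    package Point;
    sub new { my ($class, %args) = @_; return bless { %args }, $class }

    sub to_plain {
        my ($self) = @_;
        return {
            class => ref($self),                     # one possible policy:
            data  => { x => $self->{x},              # keep the class name in
                       y => $self->{y} },            # a reserved key
        };
    }

    package main;
    my $plain = Point->new( x => 1, y => 2 )->to_plain;
    # $plain is now safe to hand to JSON, YAML, an XML writer, etc.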

Both of the above are necessary if you want to completely separate the data structure from its output format. Dump formats vary widely in the complexity of data structures they can support. The CPAN modules that do the dumping vary even more. Data::Table and AnyData can dump to so many formats only because they limit themselves to shallow (unnested) data structures that can easily be translated to rows and columns. Even then, they adopt some conventions about how they will use each format. Although they give you a few configuration options for each output format, you are generally limited to a subset of the serialization format's usage patterns.

Designing transformation routines and format usage policies is non-trivial and the two tasks need to be done in tandem. The policy is also going to depend on the family of formats you wish to support. Suppose you want to dump a blessed object to XML, JSON, and YAML. YAML handles blessed objects gracefully and has specific syntax to support them. Neither XML nor JSON does.
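
For instance, YAML.pm will tag a blessed hash with its class name out of the box (the output shown is approximate; exact details vary by YAML module and version):

    use YAML qw(Dump);

    print Dump( bless { x => 1 }, 'Foo' );
    # Roughly:
    # --- !!perl/hash:Foo
    # x: 1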

If you want to dump a blessed object into XML, you'll need to set up some sort of convention - but there are lots of ways to do it:

  • (a) reserve a tag name or attribute to capture the class assignment (and make sure no real data uses that name)
  • (b) wrap each object belonging to class Foo in <foo> tags
  • (c) leave class information out of the dump file entirely and use a file naming convention to capture the class instead
  • (d) do nothing special in the XML; instead, have your transformation routine add each Foo object to a hash whose keys are class names and whose values are arrays of objects belonging to that class

And that is just a small sample of your options.
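
As a sketch of convention (a), with the attribute name "class" chosen arbitrarily and XML::Simple standing in for whichever XML writer you actually use:

    use XML::Simple qw(XMLout);

    my $obj   = bless { x => 1, y => 2 }, 'Foo';
    my $plain = { class => ref($obj), %$obj };    # reserve 'class' for the blessing

    print XMLout( $plain, RootName => 'object' );
    # Roughly: <object class="Foo" x="1" y="2" />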

JSON is even more complicated. Unlike XML, it doesn't have facilities to roll your own syntax. Worse yet, if left unconfigured, it only knows how to handle data wrapped in array and hash references. The dump routines of JSON will fail if you pass them blessed objects or pure scalars unless you specially configure the routines to accept such data. Even if you configure JSON to accept such data, you will still need to provide each class with a TO_JSON method. You can avoid hand-writing those TO_JSON routines by configuring JSON to use a universal dumper routine, but the default implementation of that routine strips out class information. If you need to load objects that contain lists and hashes with yet other objects, this can get rather messy unless you have written a lossless transformation rule that converts all blessed objects (however deeply nested) into unblessed objects.
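
A rough sketch of that configuration with the JSON module (allow_nonref, allow_blessed, and convert_blessed are documented JSON/JSON::XS options; the Foo class and its TO_JSON are invented for illustration):

    use JSON;

    my $json = JSON->new
                   ->allow_nonref       # accept bare scalars, not just refs
                   ->allow_blessed      # don't die on blessed data ...
                   ->convert_blessed;   # ... call TO_JSON on it instead

    package Foo;
    sub new     { return bless { x => 1 }, shift }
    sub TO_JSON { my ($self) = @_; return { %$self } }   # class name is lost here

    package main;
    print $json->encode( Foo->new );   # {"x":1} -- no trace of 'Foo'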

At the end of the day, all that a general purpose tool is really giving you is a framework for managing and coordinating your transformation routines and serialization policies. If done right, it may also be able to take some of the grunt work out of developing the transformation routines, especially for deeply nested structures. (That was the goal of the module I wrote).

A second issue concerns testing of such general purpose conversion engines. They don't require much code, but they are quite complex to test. There are some important "gotchas" that need to be considered when evaluating implementations:

  • implementations can end up in infinite loops unless they handle circular references properly (see the sketch after this list)
  • implementations can destroy the integrity of graphs unless the transformation routine preserves the fact that objects X1 and X2 both reference object Y somewhere deep in their data guts.
  • there are many, many paths through the code, necessitating not 20 or 30 but hundreds or even thousands of tests
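
The first two gotchas usually come down to remembering which references have already been converted, for example keyed by Scalar::Util::refaddr. A bare-bones sketch (this is not the module mentioned above, just an illustration of the idea):

    use Scalar::Util qw(refaddr reftype);

    sub to_plain {
        my ($data, $seen) = @_;
        $seen ||= {};
        return $data unless ref $data;

        my $addr = refaddr($data);
        # Reusing the already-built copy stops infinite recursion on cycles
        # and keeps shared references shared (X1 and X2 still point at the
        # same converted Y).
        return $seen->{$addr} if exists $seen->{$addr};

        my $type = reftype($data);
        if ( $type eq 'HASH' ) {
            my $copy = $seen->{$addr} = {};
            $copy->{$_} = to_plain( $data->{$_}, $seen ) for keys %$data;
            return $copy;
        }
        elsif ( $type eq 'ARRAY' ) {
            my $copy = $seen->{$addr} = [];
            push @$copy, to_plain( $_, $seen ) for @$data;
            return $copy;
        }
        return $data;   # scalar refs, code refs, etc. left as an exercise
    }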

Best, beth
