polymorphic data-driven rendering?

by metaperl (Curate)
on Mar 28, 2009 at 14:38 UTC ( [id://753856] )

metaperl has asked for the wisdom of the Perl Monks concerning the following question:

Has anyone written a general package to do what Data::Table does for tables?

That is, given a data structure, provide a fully object-oriented method (hence customizable) that renders the data structure to any number of common formats such as HTML, Markdown, Excel, etc.
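
For concreteness, this is roughly what Data::Table already gives you for the tabular case (constructor arguments and method names are from memory of its docs, so double-check them before relying on this):

    use Data::Table;

    # rows, column names, 0 = row-based data
    my $t = Data::Table->new(
        [ [ 1, 'alpha' ], [ 2, 'beta' ] ],
        [ 'id', 'name' ],
        0,
    );

    print $t->html;   # render as an HTML table
    print $t->csv;    # render as CSV
    print $t->tsv;    # render as tab-separated text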

The problem with all template solutions to date is that you must write the template code in each output format... a violation of the DRY principle.
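
To make the shape of what I'm after clearer, here is a minimal sketch (every package and method name below is hypothetical, not an existing module): the structure is walked exactly once, and the output format is supplied as a pluggable backend object.

    use strict;
    use warnings;

    package My::Render::HTML;            # hypothetical backend
    sub new        { bless {}, shift }
    sub begin_list { "<ul>\n" }
    sub item       { my ( $self, $text ) = @_; "  <li>$text</li>\n" }
    sub end_list   { "</ul>\n" }

    package My::Render::Markdown;        # hypothetical backend
    sub new        { bless {}, shift }
    sub begin_list { "" }
    sub item       { my ( $self, $text ) = @_; "* $text\n" }
    sub end_list   { "\n" }

    package main;

    # One generic walker (arrays only, to keep the sketch short);
    # the backend decides how each node is rendered.
    sub render {
        my ( $data, $backend ) = @_;
        return $backend->item($data) unless ref $data;
        my $out = $backend->begin_list;
        $out .= render( $_, $backend ) for @$data;
        $out .= $backend->end_list;
        return $out;
    }

    my $data = [ 'alpha', 'beta', [ 'nested', 'items' ] ];
    print render( $data, My::Render::HTML->new );
    print render( $data, My::Render::Markdown->new );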

Re: polymorphic data-driven rendering?
by Corion (Patriarch) on Mar 28, 2009 at 14:56 UTC

    It seems to me you're lost in a maze of abstract factory factory blueprints, all different.

    Unless you specify a bit more concretely what kind of data you have and how it should be represented, you cannot get any meaningful answer.

    A table has a traditionally well-defined structure. It consists of a grid layout, with the first row in the grid being called the "header", usually containing the names of the columns. Every subsequent row then contains values.

    For other data structures, like, say, a (directed) graph, you will likely also find modules that render the graph, in a more or less abstracted way, into a sensible output format. For example, it makes little sense to render image data into the Excel file format, or tabular data into text wrapped to 40-character columns.

    Looking for a generic way here will only bring you into the territory of XSLT or other meaninglessly abstract ways to manipulate data structures into other data structures. Most of these transformations amount to programming things yourself, which is why most people just program these transformations directly in Perl instead of looking for a crutch that limits their expressiveness.

Re: polymorphic data-driven rendering?
by BrowserUk (Patriarch) on Mar 29, 2009 at 10:14 UTC

    It'll probably come as no surprise that I am very skeptical of the benefits of trying to over-generalise in this way. I'll try to make my case for why.

    By analogy. It could be postulated that the steering wheels on cars; the handlebars on (motor)bikes; the control columns on commercial aircraft; the joysticks on military jets; the tillers on small boats; even the reins on horses and horse-drawn carriages; all serve the same purpose and could be replaced with a single control.

    It is probably technically feasible to attach servos to the bit in a horse's mouth and have the rider use a joystick to steer, but it is probably overkill.

    Conversely, I'm not sure I'd want to fly in a 777 if the pilot used a pair of leather straps or a big tiller to steer it.

    Whilst that analogy is jokey, don't take it for a joke. Perhaps the single hardest element of modern software design to get right is the abstraction. And by far the biggest mistake in recent years is over-abstraction. It is far too easy to get carried away with finding abstract similarities between things that have no business being conflated.

    By example. A few years ago, I worked on the periphery of an MIS system for a large retail chain. This system was heavily OO layered over an RDBMS. Within it, everybody--personnel, suppliers, customers et al.--was an instance of a Person class, which mapped to one large primary table in the DB with lots of specialisations (FKs) hanging off of it. The problems came when trying to control access to it. Which came to a head when a minor programming error led to sensitive information about customers being sent out to a supplier.

    Putting all your data in one place may sound like a great idea from the data warehousing/analysis perspective, but security-aware organisations use compartmentalisation for very good reasons. It may lead to apparent redundancies, but it also leads to "redundant" layers of security, which you'll be very glad of when one of the layers is breached.

    Do not let theoretical principles override pragmatism and practicality without serious consideration on a case by case basis.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I upvoted this node because, in the main, I think you make an excellent point about the positive value of redundancy. However, I think when you apply this argument to generalized data conversion, you are setting up a straw man.

      I've built and designed systems with intentional redundancy, usually for reasons of security and auditing, just as you discussed above. However, the redundancy is in the data, not the code that processes it. Perhaps we just read the OP differently, but I think OP was bothered by the redundancy in the template code, not the redundancy in the data it produced.

      Whether in templates or Perl code, there is a lot of coding redundancy when you navigate complex data structures to produce custom per-object dump routines for N different formats. Redundant code attracts bugs like honey does bees. Even if you fix a bug in one place, you still have to go out and find all the places where that bug is repeated. This search process is highly prone to human error.

      Looking at the specific example you gave (the Person class for everything), I would argue that lack of redundancy was not the problem, but rather a mismatch between the design and the code/architecture used to process the data. If you are going to fracture objects among multiple tables, you also need to ensure that you have a proper mechanism for reassembling the objects and applying security based on view/Person "subclass". Furthermore, the mechanism needs to be implemented via stored procedures, triggers and restricted views of the data. Robust transaction support is also essential.

      If your database does not support such a mechanism and you can't write an extension to the DBMS that does support it, then you are guaranteed to have security problems someday, somewhere. You have no choice but to implement the reassembly and security mechanism outside of the DBMS, and your security is totally dependent on the good behavior of the applications that access the database.

      Best, beth

        Have you looked inside Data::Table?

        Have you considered the complexity of table formatting options?

        As a starting point, consider only those options (currently available and not) for HTML output.

        Now consider trying to provide transparent mapping of all those options so that they can be applied to Excel output. And now try adding all the formatting possibilities that can be applied to Excel cells, groups, tables and graphs, and retrofitting them back into the existing API such that they can transparently be applied to the HTML output. And also to CSV/TSV/Formatted ASCII.

        And for bonus points, now try and do the same thing for all the formatting possibilities that can be applied to PDFs, and retrofitting those options into the module's interface such that they can be applied (even if they do nothing!) to the HTML/Excel/CSV/TSV/Formatted ASCII.

        And if you've got through those mental exercises, now consider adding support for RTF; and OpenOffice XML Document Format; and Doxygen; and Wiki; and ...

        And then imagine trying to perform comprehensive testing of all of those.

        And finally, imagine trying to write good user documentation for the behemoth that would result.

        And all that complexity arises for what is ostensibly a very simple and easily defined input format.

        Now imagine trying to do the same thing for generic data structures: hashes and arrays seem trivial, until you start nesting them. How about handling circular references? What about dealing with Moose objects, with all their possibilities for annotation? And derivation from any of: simple blessed hash-based classes; or blessed array-based classes; or blessed scalar-based classes; or any of a dozen different implementations of inside-out object-based classes?

        The problem with what the OP is asking for--"given a data structure, provide a fully object-oriented method ... that renders the data structure to any number of common formats such as HTML, Markdown, Excel, etc."--is that it creates a bottleneck. Or perhaps 'tourniquet' is a better word here.

        Many input formats (any given data structure) transmuted into many output formats (HTML, Markdown, Excel, etc.) through a single API. There are three ways to go when you try to do things like that:

        1. Lowest common denominator.

          Think POD.

          Supports rendition to every output format known to man.

          But only because it supports almost none of the facilities of any of them.

        2. Fully comprehensive, extensible, virtualised API.

          Think XML.

          Sets out to be able to encode anything, originating in any format, into a format that any compliant reader can read. And it achieves it.

          But only because it moves all the meta-data into another document (and format) entirely: namely the DTD. And that format is so specific and so rigid that no two are ever compatible. They basically dodged the problem by pushing it upstream, with the result that they achieved nothing.

        3. Minimum working subset with generalised escapes.

          The best common examples of this are filesystem APIs and the ioctl calls.

          The common subset works--mostly; don't look too closely at the filesystems on CDs, for example--in most places, but at the expense of restricting innovation to 'also-ran' status.

          Ask yourself: why do most *nix systems still default to using Ext2 when there are much better ones like ReiserFS available?

          Or why nobody makes use of the Streams facilities of NTFS?

          Once you define a minimal common subset of facilities, innovation is stifled to the point that it is extremely rare that that subset is ever extended. And that results in stagnation, or hokey proprietary extension mechanisms that never converge.

          Another good example of this is SQL. The movement in the standardised adoption of modern extensions to SQL is glacial.

          Such standards bring compatibility and interoperability; but they also bring uniformity and stagnation.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: polymorphic data-driven rendering?
by ELISHEVA (Prior) on Mar 29, 2009 at 08:11 UTC

    I just finished writing such a general purpose transformation tool (in Perl, currently being tested).

    Although it is possible to write such a tool, it can't eliminate DRY violations unless your development environment completely separates data structure from output format. To do that one must have the following two components, in addition to the general purpose converter:

    • transformation routines. For each object you want to dump, you will need to write a routine that converts it into some basic Perl data structure that can easily be dumped by all of the output formats (XML, JSON, YAML, SOAP, binary dumps, etc.) you want to use.
    • format usage rules. You will need to formulate policies for how you will use each serialization format, so that the standard structure above can be used to read and write your chosen formats (XML, JSON, YAML, etc.) with no knowledge of the original (pre-transformed) data you are dumping.

    Both of the above are necessary if you want to completely separate the data structure from its output format. Dump formats vary widely in the complexity of data structures they can support. The CPAN modules that do the dumping vary even more. Data::Table and AnyData can dump to so many formats only because they limit themselves to shallow (unnested) data structures that can easily be translated to rows and columns. Even then, they adopt some conventions about how they will use each format. Although they give you a few configuration options for each output format, you are generally limited to a subset of the serialization format's usage patterns.
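
    As a rough sketch of how those two pieces fit together (the Point class and point_to_plain() are invented for illustration): the transformation routine reduces the object to a plain hash, and the format usage rule is simply that every dump starts from such a plain structure, so the same value can be handed to any serializer unchanged.

        use strict;
        use warnings;
        use JSON qw(encode_json);
        use YAML qw(Dump);

        # (1) transformation routine: blessed object -> plain structure
        sub point_to_plain {
            my ($point) = @_;
            return { class => ref($point), x => $point->{x}, y => $point->{y} };
        }

        # (2) format usage rule: every dump is a plain hash/array tree,
        # so the same structure feeds any serializer unchanged.
        my $point = bless { x => 1, y => 2 }, 'Point';
        my $plain = point_to_plain($point);

        print encode_json($plain), "\n";   # JSON
        print Dump($plain);                # YAML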

    Designing transformation routines and format usage policies is non-trivial, and the two tasks need to be done in tandem. The policy is also going to be dependent on the family of formats you wish to support. Suppose you want to dump a blessed object to XML, JSON, and YAML. YAML handles blessed objects gracefully and has specific syntax to support them. Neither XML nor JSON does.

    If you want to dump a blessed object into XML, you'll need to set up some sort of convention - but there are lots of ways to do it:

    • (a) reserve a tag name or attribute to capture the class assignment (and make sure no real data uses that name)
    • (b) wrap each object belonging to class Foo in <foo> tags
    • (c) leave class information out of the dump file entirely; instead, use a file naming convention to capture the class
    • (d) do nothing special in the XML; instead, have your transformation routine add each Foo object to a hash whose keys are the class names and whose values are arrays of objects belonging to that class

    And that is just a small sample of your options.
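
    As an illustration of option (a) only, with made-up data and a reserved "class" key (XML::Simple is just one convenient way to emit the XML; the output shown is approximate):

        use strict;
        use warnings;
        use XML::Simple qw(XMLout);

        # Convention (a): reserve one key/attribute for the class name.
        # Real data must then be forbidden from using that key.
        my $plain = {
            class => 'Point',
            x     => 1,
            y     => 2,
        };

        # With XML::Simple's defaults, scalar values become attributes,
        # giving something like: <object class="Point" x="1" y="2" />
        print XMLout( $plain, RootName => 'object' );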

    JSON is even more complicated. Unlike XML, it doesn't have facilities to roll your own syntax. Worse yet, if left unconfigured, it only knows how to handle data wrapped in array and hash references. The dump routines of JSON will fail if you pass them blessed objects or pure scalars unless you specially configure the routines to accept such data. Even if you configure JSON to accept such data, you will still need to provide each class with a TO_JSON method. You can avoid hand-writing those TO_JSON routines by configuring JSON to use a universal dumper routine, but the default implementation of that routine strips out class information. If you need to load objects that contain lists and hashes with yet other objects, this can get rather messy unless you have written a loss-less transformation rule that converts all blessed objects (however deeply nested) into unblessed objects.
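
    For example, with the JSON module the configuration described above looks roughly like this (the Point class and its hand-written TO_JSON are illustrative, not a recommendation):

        use strict;
        use warnings;
        use JSON ();

        package Point;
        sub new { my ( $class, %args ) = @_; bless {%args}, $class }

        # convert_blessed makes JSON call this on blessed refs; note that
        # the class name is lost here unless we store it ourselves.
        sub TO_JSON { my ($self) = @_; return {%$self} }

        package main;

        my $json = JSON->new
                       ->convert_blessed(1)   # use TO_JSON where available
                       ->allow_blessed(1);    # otherwise emit null rather than die

        print $json->encode( Point->new( x => 1, y => 2 ) ), "\n";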

    At the end of the day, all that a general purpose tool is really giving you is a framework for managing and coordinating your transformation routines and serialization policies. If done right, it may also be able to take some of the grunt work out of developing the transformation routines, especially for deeply nested structures. (That was the goal of the module I wrote).

    A second issue concerns testing of such general purpose conversion engines. They don't require much code, but they are quite complex to test. There are some important "gotchas" that need to be considered when evaluating implementations:

    • implementations can end up in infinite loops unless they handle circular references properly (see the sketch after this list)
    • implementations can destroy the integrity of graphs unless the transformation routine preserves the fact that objects X1 and X2 both reference object Y somewhere deep in their data guts.
    • there are many, many paths through the code necessitating not 20 or 30 but hundreds or even thousands of tests
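
    To make the first two gotchas concrete, here is the kind of guard such a transform needs (all names invented; a real tool would record shared nodes properly rather than just stamping a marker):

        use strict;
        use warnings;
        use Scalar::Util qw(refaddr reftype);

        # Track every reference already visited, keyed on its address, so
        # that cycles (and shared sub-structures) stop the recursion.
        sub to_plain {
            my ( $data, $seen ) = @_;
            $seen ||= {};
            return $data unless ref $data;

            my $addr = refaddr $data;
            return '*SEEN*' if $seen->{$addr}++;

            my $type = reftype $data;
            return [ map { to_plain( $_, $seen ) } @$data ] if $type eq 'ARRAY';
            return { map { $_ => to_plain( $data->{$_}, $seen ) } keys %$data }
                if $type eq 'HASH';
            return "$data";    # everything else: stringify, for the sketch
        }

        my $node = { name => 'a' };
        $node->{self} = $node;          # circular structure
        my $plain = to_plain($node);    # terminates; {self} becomes '*SEEN*'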

    Best, beth

Re: polymorphic data-driven rendering?
by zentara (Archbishop) on Mar 28, 2009 at 17:11 UTC
