Re: polymorphic data-driven rendering?

by BrowserUk (Patriarch)
on Mar 29, 2009 at 10:14 UTC [id://753969]


in reply to polymorphic data-driven rendering?

It'll probably come as no surprise that I am very skeptical of the benefits of trying to over-generalise in this way. I'll try to make my case for why.

By analogy. It could be postulated that the steering wheels on cars; the handlebars on (motor)bikes; the control columns on commercial aircraft; the joysticks on military jets; the tillers on small boats; even the reins on horses and horse-drawn carriages; all serve the same purpose and could be replaced with a single control.

It is probably technically feasible to attach servos to the bit in a horse's mouth and have the rider use a joystick to steer, but it would be overkill.

Conversely, I'm not sure I'd want to fly in a 777 if the pilot used a pair of leather straps or a big tiller to steer it.

Whilst that analogy is jokey, don't take it for a joke. Perhaps the single hardest element of modern software design to get right is abstraction. And by far the biggest mistake in recent years is over-abstraction. It is far too easy to get carried away with finding abstract similarities between things that have no business being conflated.

By example. A few years ago, I worked on the periphery of an MIS system for a large retail chain. This system was heavily OO, layered over an RDBMS. Within it, everybody--personnel, suppliers, customers et al.--was an instance of a Person class, which mapped to one large primary table in the DB with lots of specialisations (FKs) hanging off it. The problems came when trying to control access to it, and came to a head when a minor programming error led to sensitive information about customers being sent out to a supplier.
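
To make that failure mode concrete, here is a minimal sketch of that shape of schema and the kind of slip described above. The table and column names are hypothetical, and the DBI calls (assuming DBD::SQLite for illustration) are only there to show the shape of the mistake:

    use strict;
    use warnings;
    use DBI;

    # Hypothetical schema: one big 'person' table, with role-specific
    # tables (customer, supplier, ...) hanging off it via foreign keys.
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=mis.db', '', '',
                            { RaiseError => 1 } );

    # The intended query: contact details for suppliers only.
    my $suppliers = $dbh->selectall_arrayref(
        'SELECT p.name, p.email
           FROM person p JOIN supplier s ON s.person_id = p.id',
        { Slice => {} },
    );

    # The "minor programming error": drop the JOIN, and every person --
    # customers included -- ends up in the supplier mail-out.
    my $everyone = $dbh->selectall_arrayref(
        'SELECT name, email FROM person',
        { Slice => {} },
    );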

Putting all your data in one place may sound like a great idea from the data warehousing/analysis perspective, but security-aware organisations use compartmentalisation for very good reasons. It may lead to apparent redundancies, but it also leads to "redundant" layers of security, which you'll be very glad of when one of the layers is breached.

Do not let theoretical principles override pragmatism and practicality without serious consideration on a case by case basis.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: polymorphic data-driven rendering?
by ELISHEVA (Prior) on Mar 30, 2009 at 09:18 UTC
    I upvoted this node because, in the main, I think you make an excellent point about the positive value of redundancy. However, I think when you apply this argument to generalized data conversion, you are setting up a straw man.

    I've built and designed systems with intentional redundancy, usually for reasons of security and auditing, just as you discussed above. However, the redundancy is in the data, not the code that processes it. Perhaps we just read the OP differently, but I think OP was bothered by the redundancy in the template code, not the redundancy in the data it produced.

    Whether in templates or Perl code, there is a lot of coding redundancy when you navigate complex data structures to produce custom per-object dump routines for N different formats. Redundant code attracts bugs like honey does bees. Even if you fix a bug in one place, you still have to go out and find all the places where that bug is repeated. This search process is highly prone to human error.
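
    To illustrate the kind of duplication I mean, here is a hypothetical pair of per-object dump routines. The traversal and the undef handling are repeated verbatim, so a bug fixed in one has to be hunted down and fixed again in the other (and in the Excel, Markdown, ... variants):

        use strict;
        use warnings;

        # Same structure, same walk, N nearly identical copies of it.
        sub dump_row_html {
            my ($row) = @_;
            return '<tr>'
                 . join( '',
                         map { '<td>' . ( $_ // '' ) . '</td>' }
                             @{ $row->{cells} } )
                 . "</tr>\n";
        }

        sub dump_row_csv {
            my ($row) = @_;
            return join( ',', map { $_ // '' } @{ $row->{cells} } ) . "\n";
        }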

    Looking at the specific example you gave (Person-everything) I would argue that lack of redundancy was not the problem, but rather a mismatch between the design and the code/architecture used to process the data. If you are going to fracture objects among multiple tables, you also need to ensure that you have a proper mechanism for reassembling the objects and applying security based on view/Person "subclass". Furthermore, the mechanism needs to be implemented via stored procedures, triggers and restricted views of the data. Robust transaction support is also essential.

    If your database does not support such a mechanism and you can't write an extension to the DBMS that does support it, then you are guaranteed to have security problems someday, somewhere. You have no choice but to implement the reassembly and security mechanism outside of the DBMS, and your security is totally dependent on the good behavior of the applications that access the database.
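
    For what it's worth, a minimal sketch of keeping that boundary inside the DBMS (all names here are hypothetical, and it assumes a PostgreSQL-style DBMS with roles and DBD::Pg): the application account is granted a restricted view and nothing else, so no "minor programming error" in application code can reach the base table.

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect( 'dbi:Pg:dbname=mis', 'dba', $ENV{DBA_PASS},
                                { RaiseError => 1 } );

        # Suppliers' contact details only -- no customer rows can leak
        # through this view.
        $dbh->do(
            'CREATE VIEW supplier_contacts AS
                 SELECT p.name, p.email
                 FROM   person   p
                 JOIN   supplier s ON s.person_id = p.id'
        );

        # The application role reads the view, never the base table.
        $dbh->do('GRANT SELECT ON supplier_contacts TO supplier_app');
        $dbh->do('REVOKE ALL ON person FROM supplier_app');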

    Best, beth

      Have you looked inside Data::Table?

      Have you considered the complexity of table formatting options?

      As a starting point, consider only those options (both currently available and not) for HTML output.

      Now consider trying to provide transparent mapping of all those options so that they can be applied to Excel output. And now try adding all the formatting possibilities that can be applied to Excel cells, groups, tables and graphs, and retrofitting them back into the existing API such that they can transparently be applied to the HTML output. And also to CSV/TSV/Formatted ASCII.

      And for bonus points, now try and do the same thing for all the formatting possibilities that can be applied to PDFs, retrofitting those options into the module's interface such that they can be applied (even if they do nothing!) to the HTML/Excel/CSV/TSV/Formatted ASCII output.

      And if you've got through those mental exercises, now consider adding support for RTF; and OpenOffice XML Document Format; and Doxygen; and Wiki; and ...

      And then imagine trying to perform comprehensive testing of all of those.

      And finally, imagine trying to write good user documentation for the behemoth that would result.

      And all that complexity arises for what is ostensibly a very simple and easily defined input format.
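
      To see how quickly the mapping question bites, here is the trivial starting point, sketched with Data::Table's stock constructor and renderers (nothing beyond new, html and csv is assumed). Every HTML-side presentation option you bolt on from here -- colours, CSS classes, alignment -- has to either grow a meaning in the CSV output or be silently dropped:

          use strict;
          use warnings;
          use Data::Table;

          my $t = Data::Table->new(
              [ [ 'widget', 12, 3.50 ],
                [ 'gadget',  7, 9.99 ] ],
              [ 'item', 'qty', 'price' ],
              0,                          # 0 == row-based data
          );

          print $t->html;   # presentation options could apply here ...
          print $t->csv;    # ... but can mean nothing here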

      Now imagine trying to do the same thing for generic data structures: hashes and arrays seem trivial, until you start nesting them. How about handling circular references? What about dealing with Moose objects, with all their possibilities for annotations? And derivation from any of: simple blessed hash-based classes; or blessed array-based classes; or blessed scalar-based classes; or any of a dozen different implementations of inside-out object-based classes?
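
      For a flavour of what 'generic' costs before a single byte of output is produced, here is a sketch of just the traversal such a renderer needs (Scalar::Util's reftype, blessed and refaddr do the heavy lifting; the rendering itself -- the hard part -- is reduced here to crude string glue):

          use strict;
          use warnings;
          use Scalar::Util qw( blessed reftype refaddr );

          sub walk {
              my ( $node, $seen ) = @_;
              $seen ||= {};

              # Circular (or shared) references: bail out on revisit.
              return '<cycle>' if ref $node && $seen->{ refaddr $node }++;

              my $type = reftype( $node ) // '';
              my $tag  = blessed( $node ) ? blessed( $node ) . '=' : '';

              return $tag . '{'
                   . join( ',', map { "$_=>" . walk( $node->{$_}, $seen ) }
                                sort keys %$node )
                   . '}'                    if $type eq 'HASH';
              return $tag . '['
                   . join( ',', map { walk( $_, $seen ) } @$node )
                   . ']'                    if $type eq 'ARRAY';
              return $tag . '\\' . walk( $$node, $seen )
                                            if $type eq 'SCALAR'
                                            or $type eq 'REF';
              return defined $node ? $node : 'undef';
          }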

      The problem with what the OP is asking for--"given a data structure, provide a fully object-oriented method ... that renders the data structure to any number of common formats such as HTML, Markdown, Excel, etc."--is that it creates a bottleneck. Or perhaps 'tourniquet' is a better word here.

      Many input formats (any given data structure), transmuting to many output formats (HTML, Markdown, Excel, etc.), all through a single API. There are three ways to go when you try to do things like that:

      1. Lowest common denominator.

        Think POD.

        Supports rendition to every output format known to man.

        But only because it supports almost none of the facilities of any of them.

      2. Fully comprehensive, extensible, virtualised API.

        Think XML.

        Sets out to be able to encode anything, originating in any format, into a format that any compliant reader can read. And it achieves it.

        But only because it moves all the meta-data into another document (and format) entirely: namely the DTD. And that format is so specific and so rigid that no two are ever compatible. They basically dodged the problem by pushing it upstream, with the result that they achieved nothing.

      3. Minimum working subset with generalised escapes (a sketch of this pattern follows the list).

        The best common examples of this are filesystem APIs and IOCTL calls.

        The common subset works--mostly; don't look too closely at the filesystems on CDs, for example--in most places, but at the expense of restricting innovation to 'also-ran' status.

        Ask yourself, why do most *nix systems still default to using Ext2 when there are much better ones like ReiserFS available?

        Or why nobody makes use of the Streams facilities of NTFS?

        Once you define a minimal common subset of facilities, innovation is stifled to the point that it is extremely rare that that subset is ever extended. And that results in stagnation; or hokey proprietary extension mechanisms that never converge.

        Another good example of this is SQL. The movement in the standardised adoption of modern extensions to SQL is glacial.

        Such standards bring compatibility and interoperability; but they also bring uniformity and stagnation.
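
        As a sketch of what that third route looks like in API terms (all names here are hypothetical), the idea is a tiny core of options every backend must honour, plus a per-backend escape hash that other backends silently ignore:

            package Render;
            use strict;
            use warnings;

            sub render {
                my ( $table, %opt ) = @_;
                my $fmt    = $opt{format} or die "format required\n";
                my $core   = $opt{core}         || {};  # guaranteed subset
                my $escape = $opt{escape}{$fmt} || {};  # backend extras

                my $backend = __PACKAGE__->can( "as_$fmt" )
                    or die "unknown format: $fmt\n";
                return $backend->( $table, $core, $escape );
            }

            # One toy backend: honours the core 'sep' option, ignores
            # any escape it does not understand.
            sub as_csv {
                my ( $table, $core, $escape ) = @_;
                my $sep = $core->{sep} // ',';
                return join '', map { join( $sep, @$_ ) . "\n" } @$table;
            }

            package main;
            print Render::render(
                [ [ 'item', 'qty' ], [ 'widget', 12 ] ],
                format => 'csv',
                core   => { sep => "\t" },
                escape => { excel => { freeze_panes => 1 } },  # ignored
            );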


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        I guess you and I are reading the OP differently. You seem to think that the OP wants some kind of magical anything to anything converter. I think the OP just wants something better than what he or she already has.

        I'm rather puzzled that you think so negatively of POD, XML, and SQL. Are you really saying the world would be better off without them? That they are pointless because they don't eliminate the need to think about markup, serialization, or data relationships and integrity constraints? Marketing hype aside, I rather thought the purpose of all of these was to set us free (or at least make us more free) to focus on essentials - the things that make a difference, the little bits that really, really, need custom treatment.

        Or are you being rhetorical? You do mention at the end that they brought compatibility and interoperability. Even if you are being rhetorical, I think we need to balance the picture a bit.

        Ok, POD only provides a framework, but why is that a failure? And how is that a straitjacket? With POD at least you have options to use many different markups and you have a framework that can be used to add new ones as you see fit.

        XML has its limitations, but it opened up a generation of machine independent serialization - YAML, JSON, SOAP, and many others to come. We may still have to write custom interpreters for each and every XML schema, but at least we don't need to write them for each and every schema and machine architecture. And we only need to write the actual interpretation of the schema - we don't have to hand parse the file itself. Do you really want to go back to the days where everyone believed that "real data" gets transferred in hand-crafted binary formats? Where comparing your dumped data to expectations almost always involved squinting at control characters?

        As for SQL - the standardization process is a mess. But SQL, dialects and all, gave us a common core language for expressing data relationships. I can still remember projects back in the 1980s where we had to hand-craft queries in C to navigate linked lists because the DBMS didn't support SQL. It wasn't fun, and it meant that anything you did was not 20% or 30% locked into a particular DBMS but 98% locked in. Would the world really be a better place without the (mostly) common DBMS interface we have today?

        Best, beth

        Update: acknowledged that BrowserUk may be taking a rhetorical position.
