Re^3: polymorphic data-driven rendering?

by BrowserUk (Patriarch)
on Mar 30, 2009 at 14:25 UTC [id://754157]


in reply to Re^2: polymorphic data-driven rendering?
in thread polymorphic data-driven rendering?

Have you looked inside Data::Table?

Have you considered the complexity of table formatting options?

As a starting point, consider only those options (currently available and not) for HTML output.

Now consider trying to provide transparent mapping of all those options so that they can be applied to Excel output. And now try adding all the formatting possibilities that can be applied to Excel cells, groups, tables and graphs, and retrofitting them back into the existing API such that they can transparently be applied to the HTML output. And also to CSV/TSV/formatted ASCII.

And for bonus points, now try to do the same thing for all the formatting possibilities that can be applied to PDFs, retrofitting those options into the module's interface such that they can be applied (even if they do nothing!) to the HTML/Excel/CSV/TSV/formatted ASCII output.

And if you've got through those mental exercises, now consider adding support for RTF; and OpenOffice XML Document Format; and Doxygen; and Wiki; and ...

And then imagine trying to perform comprehensive testing of all of those.

And finally, imagine trying to write good user documentation for the behemoth that would result.

And all that complexity arises for what is ostensibly a very simple and easily defined input format.
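
To make the HTML-versus-Excel mismatch concrete, here is a minimal sketch. It assumes Data::Table's html/csv methods and Spreadsheet::WriteExcel's format objects; the data and the output filename are only placeholders. The same two rows come out of Data::Table as HTML or CSV in one call, but the bold, yellow header below exists only in the Excel world--there is nothing in the Data::Table API to map it to, or from.

    use Data::Table;
    use Spreadsheet::WriteExcel;

    my $header = [ 'Name', 'Qty' ];
    my $rows   = [ [ 'widget', 3 ], [ 'gadget', 7 ] ];

    # One call per text-ish format...
    my $t = Data::Table->new( $rows, $header, 0 );    # 0 == row-based data
    print $t->html;                                   # plain HTML table
    print $t->csv;                                    # plain CSV

    # ...but Excel formatting lives in a different API entirely.
    my $wb  = Spreadsheet::WriteExcel->new( 'out.xls' );
    my $ws  = $wb->add_worksheet;
    my $fmt = $wb->add_format( bold => 1, bg_color => 'yellow' );
    $ws->write( 0, 0, $header, $fmt );                # formatted header row
    $ws->write( 1, 0, $rows->[0] );
    $ws->write( 2, 0, $rows->[1] );
    $wb->close;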

Now imagine trying to do the same thing for generic data structures: hashes and arrays seem trivial, until you start nesting them. How about handling circular references? What about dealing with Moose objects, with all their possibilities for annotations? And derivation from any of: simple blessed hash-based classes; or blessed array-based classes; or blessed scalar-based classes; or any of a dozen different implementations of inside-out object-based classes?
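
Even a first pass at walking such structures has to reach for Scalar::Util just to tell those cases apart. A minimal sketch (the function name is mine, purely for illustration):

    use Scalar::Util qw( blessed reftype refaddr );

    sub describe {
        my( $thing, $seen ) = @_;
        $seen ||= {};

        return 'plain scalar'       unless ref $thing;
        return 'circular reference' if $seen->{ refaddr $thing }++;

        my $class = blessed( $thing );
        my $type  = reftype( $thing );

        # Blessed things may be hash-, array- or scalar-based; inside-out
        # objects defeat this entirely -- their state isn't reachable from here.
        return "object of $class (backed by $type)" if $class;

        if( $type eq 'HASH' )  { describe( $_, $seen ) for values %$thing; return 'hash'  }
        if( $type eq 'ARRAY' ) { describe( $_, $seen ) for @$thing;        return 'array' }
        return lc $type;    # CODE, GLOB, SCALAR refs, ...
    }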

The problem with what the OP is asking for--"given a data structure, provide a fully object-oriented method ... that renders the data structure to any number of common formats such as HTML, Markdown, Excel, etc."--is that it creates a bottleneck. Or perhaps 'tourniquet' is a better word here.

Many input formats (given a data structure), transmuted into many output formats (HTML, Markdown, Excel, etc.), all through a single API. There are three ways to go when you try to do things like that:

  1. Lowest common denominator.

    Think POD.

    Supports rendition to every output format known to man.

    But only because it supports almost none of the facilities of any of them.

  2. Fully comprehensive, extensible, virtualised API.

    Think XML.

    Sets out to be able to encode anything, originating in any format, into a format that any compliant reader can read. And it achieves it.

    But only because it moves all the meta-data into another document (and format) entirely: namely the DTD. And that format is so specific and so rigid, that no two are ever compatible. They basically dodged the problem by pushing it upstream, with the result that they achieved nothing.

  3. Minimum working subset with generalised escapes.

    The best common examples of this are filesystem APIs and ioctl calls.

    The common subset works--mostly; don't look too closely at the filesystems on CDs, for example--in most places, but at the expense of restricting innovation to 'also-ran' status.

    Ask yourself, why do most *nix systems still default to using Ext2 when there are much better ones like ReiserFS available?

    Or why nobody makes use of the Streams facilities of NTFS?

    Once you define a minimal common subset of facilities, innovation is stifled to the point that it is extremely rare that that subset is ever extended. And that results in stagnation; or hokey proprietary extension mechanisms that never converge.

    Another good example of this is SQL. The movement in the standardised adoption of modern extensions to SQL is glacial.

    Such standards bring compatibility and interoperability; but they also bring uniformity and stagnation.
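
    To make route 3 concrete in Perl terms, here is a minimal, runnable sketch; every package and method name in it is invented for illustration. The wrapper covers the common subset, and anything format-specific goes through the escape hatch--at which point the application code knows, and depends upon, which backend it got, which is exactly what the wrapper set out to hide.

      use strict;
      use warnings;

      # Two toy backends, with one deliberately asymmetric feature.
      package Render::Backend::HTML;
      sub new   { bless {}, shift }
      sub table {
          my( $self, $rows ) = @_;
          return join '', "<table>\n",
              ( map { '<tr>' . join( '', map { "<td>$_</td>" } @$_ ) . "</tr>\n" } @$rows ),
              "</table>\n";
      }

      package Render::Backend::Text;
      sub new        { bless {}, shift }
      sub table      { my( $self, $rows ) = @_; join '', map { join( "\t", @$_ ) . "\n" } @$rows }
      sub page_width { $_[0]{width} = $_[1] }    # Text-only knob; no HTML equivalent

      # The route-3 wrapper: minimal common subset plus a generalised escape.
      package Render::Any;
      sub new          { my( $class, $fmt ) = @_; bless { backend => "Render::Backend::$fmt"->new }, $class }
      sub render_table { $_[0]{backend}->table( $_[1] ) }    # the common subset
      sub backend      { $_[0]{backend} }                    # the escape hatch

      package main;
      my @rows = ( [ 'Name', 'Qty' ], [ 'widget', 3 ] );

      my $r = Render::Any->new( 'Text' );
      print $r->render_table( \@rows );     # portable...
      $r->backend->page_width( 72 );        # ...right up until the first escape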


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: polymorphic data-driven rendering?
by ELISHEVA (Prior) on Mar 30, 2009 at 15:46 UTC

    I guess you and I are reading the OP differently. You seem to think that the OP wants some kind of magical anything to anything converter. I think the OP just wants something better than what he or she already has.

    I'm rather puzzled that you think so negatively of POD, XML, and SQL. Are you really saying the world would be better off without them? That they are pointless because they don't eliminate the need to think about markup, serialization, or data relationships and integrity constraints? Marketing hype aside, I rather thought the purpose of all of these was to set us free (or at least make us more free) to focus on essentials - the things that make a difference, the little bits that really, really, need custom treatment.

    Or are you being rhetorical? You do mention at the end that they brought compatibility and interoperability. Even if you are being rhetorical, I think we need to balance the picture a bit.

    Ok, POD only provides a framework, but why is that a failure? And how is that a straitjacket? With POD at least you have options to use many different markups and you have a framework that can be used to add new ones as you see fit.

    XML has its limitations, but it opened up a generation of machine independent serialization - YAML, JSON, SOAP, and many others to come. We may still have to write custom interpreters for each and every XML schema, but at least we don't need to write them for each and every schema and machine architecture. And we only need to write the actual interpretation of the schema - we don't have to hand parse the file itself. Do you really want to go back to the days where everyone believed that "real data" gets transferred in hand-crafted binary formats? Where comparing your dumped data to expectations almost always involved squinting at control characters?
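
    For instance (a minimal sketch using XML::Simple, one of many CPAN options; the order/item schema is made up for illustration), the generic parser hands back a plain Perl structure, and the only code left to write is the interpretation of that particular schema:

      use XML::Simple qw( XMLin );

      my $xml   = '<order id="42"><item sku="A1" qty="3"/><item sku="B2" qty="1"/></order>';
      my $order = XMLin( $xml, ForceArray => [ 'item' ] );    # no hand-written parser

      for my $item ( @{ $order->{item} } ) {
          printf "sku %s x %d\n", $item->{sku}, $item->{qty};
      }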

    As for SQL - the standardization process is a mess. But SQL, dialects and all, gave us a common core language for expressing data relationships. I can still remember projects back in the 1980's where we had to hand craft queries in C to navigate linked lists because the DBMS didn't support SQL. It wasn't fun and it meant that anything you did was not 20% or 30% locked into a particular DBMS but 98% locked in. Would the world really be a better place without the (mostly) common DBMS interface we have today?

    Best, beth

    Update: acknowledged that BrowserUk may be taking a rhetorical position.

      I guess you and I are reading the OP differently. You seem to think that the OP wants some kind of magical anything to anything converter.

      Actually, on the basis of both the wording of the OP, and of my knowledge of previous questions he has asked here, that is exactly what I think the OP was asking for.

      I think the OP just wants something better than what he or she already has.

      Were that the case, I would have anticipated rather more discussion in the OP by way of outlining the specific limitations of what he currently has, and/or the specific situations in his application that what he currently has fails to address.

      I'm rather puzzled that you think so negatively of POD, XML, and SQL. Are you really saying the world would be better off without them?

      Without them? Probably not. Without their pervasive dominance and exclusion of other technologies? Absolutely.

      That they are pointless because they don't eliminate the need to think about markup, serialization, or data relationships and integrity constraints?

      Talking of straw men. I never said any of that.

      Marketing hype aside, I rather thought the purpose of all of these was to set us free (or at least make us more free) to focus on essentials - the things that make a difference, the little bits that really, really, need custom treatment.

      Do you believe that any one of them has achieved that?

      Ok, POD only provides a framework, but why is that a failure? And how is that a straight-jacket? With POD at least you have options to use many different markups and you have a framework that can be used to add new ones as you see fit.

      If POD is so flexible, how come PerlGuts Illustrated still isn't an integral part of the Perl documentation suite?

      XML has its limitations, but it opened up a generation of machine independent serialization - YAML, JSON, SOAP, and many others to come.

      If XML is so good, why have all those others evolved since?

      We may still have to write custom interpreters for each and every XML schema, but at least we don't need to write them for each and every schema and machine architecture. And we only need to write the actual interpretation of the schema - we don't have to hand parse the file itself.

      Firstly, you seem to imply that you think that writing a parser for the XML representation of the data is somehow easier than writing the parser for the actual data. I refute that implication. Look around. See how many XML parsers there are. And how many XML toolkits there are. And how many mechanisms have evolved to try and tame the XML behemoth. Theory says that you should only need one parser on each platform. And one toolkit per platform. But instead we have a whole sub-industry that has evolved to try and tame this beast--and so far, none of them have succeeded. And I'll stick my head on the block and say that none of them ever will.

      The problem is that whenever you try to divorce the semantic and syntactic elements of information, you reduce it to data. And once you have data out of its context, it is almost meaningless. Little more than noise.

      XML has created far more problems than it has fixed.

      Do you really want to go back to the days where everyone believed that "real data" gets transferred in hand-crafted binary formats? Where comparing your dumped data to expectations almost always involved squinting at control characters?

      You speak as if binary formats had disappeared with the advent of XML. Think again. Look around you at all the most pervasive and successful technologies in use today!

      GIF, JPEG, PNG, TIFF, MP3, MP4, ELF, OMF, ZIP, GZIP... I'll stop there, but there are hundreds, if not thousands, of binary formats in use--and more being invented every day--silently working away cross platform, on everything from the biggest iron to the smallest embedded device. They just do what they need to do; no more and no less. And none of them have been supplanted by an XML equivalent.

      Why? Because it is far easier to port or write a dedicated binary reader for any of them than to try and write an XML parser and DTD that successfully reconstructs the semantics and context of the source information.
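
      To put a number on 'far easier': pulling the dimensions out of a GIF, say, is one read() and one unpack template (the filename below is just a placeholder):

        open my $fh, '<:raw', 'image.gif' or die $!;
        read( $fh, my $header, 10 ) == 10 or die 'short read';

        my( $sig, $width, $height ) = unpack 'A6 v v', $header;    # signature, then two LE shorts
        die 'not a GIF' unless $sig =~ /^GIF8[79]a$/;
        print "$width x $height\n";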

      As for SQL - the standardization process is a mess. But SQL, dialects and all, gave us a common core language for expressing data relationships.

      Unfortunately, that is all it allows. And then, only those relationships that can be represented by viewing the data as a flat table. And note the transition from information to data as you go through the normalisation process required!

      Look around you at the world--both the wider man-made world; and that of Nature; as well as the world of IT--almost everything we deal with is hierarchical in structure; or networked. (The mathematicians would say 'graph' here, but a) I'm not a mathematician; b) they have generalised the term and the operations applicable to the point where both have become almost impossible to translate back into real-world use.)

      Yes, there are techniques for representing hierarchical and network data in relational representations, but they are so complicated, opaque and slow, that they render such data representations almost impossible to use.
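
      A minimal sketch of that impedance (the table and column names are invented for illustration): a threaded discussion stored relationally comes back as flat (id, parent_id, title) rows, and every application has to re-grow the tree before it can use it--and re-flatten it to write it back:

        my @rows = (            # what SELECT id, parent_id, title FROM posts hands back
            [ 1, undef, 'root post'      ],
            [ 2, 1,     'first reply'    ],
            [ 3, 2,     'reply to reply' ],
            [ 4, 1,     'second reply'   ],
        );

        my %node = map { $_->[0] => { title => $_->[2], replies => [] } } @rows;
        my @roots;
        for my $r ( @rows ) {
            my( $id, $parent ) = @{$r}[ 0, 1 ];
            if( defined $parent ) { push @{ $node{$parent}{replies} }, $node{$id} }
            else                  { push @roots, $node{$id} }
        }
        # @roots is now the 'natural' nested form the application wanted all along.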

      Rather than simplifying applications, and allowing the application programmer to concentrate on the application, they force each application programmer to fight with converting between the 'natural' (usable & required) representation of his data, and the unnatural use of its normalised relational form.

      For many, maybe even most, applications, hierarchical filesystems provide far less impedance to the persistence of application data than relational tables.

      I can still remember projects back in the 1980's where we had to hand craft queries in C to navigate linked lists because the DBMS didn't support SQL. It wasn't fun and it meant that anything you did was not 20% or 30% locked into a particular DBMS but 98% locked in.

      Since your memory goes back that far, you may also remember that around the time that Codd's opus began to gain mindshare, there were many types of DB other than relational? If you ever had the pleasure of using a hierarchical database (IMS for example) for inherently tree-structured data--eg. threaded discussions; MIS; stock control; parts lists; software development; many, many more--then you might realise how sorely they are missed since the advent of SQL.

      The main problem with big international standards like SQL--even more so than their glacial rates of evolution--is that they become one-size-fits-all must-haves. To the exclusion of all else.

      Would the world really be a better place without the (mostly) common DBMS interface we have today?

      Maybe. We'll never know for sure now. The damage is done.

      But just maybe if SQL hadn't gained the mindshare it did at the time it did, some other, more flexible, more useful data language would have evolved (or rather, not been stifled) to a point of sufficient prominence that we'd be looking back on SQL with relieved nostalgia?

      Finally, coming back to your update and this:

      Or are you being rhetorical? You do mention at the end that they brought compatibility and interoperability. Even if you are being rhetorical, I think we need to balance the picture a bit.

      Was I being rhetorical? I guess I was, inasmuch as I was asking the OP to consider whether what he was asking for was actually going to solve the problem he was trying to solve. The first step on that route would be for him to actually define that problem, in terms specific to his application requirements, rather than "I shouldn't have to write my own code to do this".

      And later, I was asking you to consider whether anything would actually be achieved by the creation of such a module. Which on the basis of this reply, you have chosen not to do. And so we come full circle.

      I was asking you to consider whether if you start with two (or more, but let's stick with two), complicated APIs that have some minimal level of overlap, is it possible to simplify the code that needs to call those APIs, by wrapping them over in a third API?

      And I attempted to explain why I think it is not. At least not without going one of the three routes I outlined. And why I think that any of those three routes is bad.

      1. If you go route 1, the resultant application code is barely simpler than if you had used one of the original APIs directly.

        But along the way, it discards most of the utility of (either) wrapped API, in the wrapping.

      2. Route 2 achieves nothing. It simply moves the problem from the data format, to the data format description.

        And it is far harder to write the code to re-create the information from a generic data format and an application specific data description, than it is to read and write application specific data file formats.

        Compare the simplicity (and performance) of pack/unpack with (any of) the XML APIs.

      3. Route 3: unless the overlap between the wrapped APIs is substantial, most of the work will have to be done through escapes to the underlying APIs anyway.

        And once you start splitting the work between a few generic APIs, and many more escapes to the underlying APIs, the application code gets more complicated rather than less so.

        Far better to make a (load-time or runtime) decision as to which API is required for this run, and then load a specific module for that, than to try and wrap multiple, complicated APIs in a generic one.
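
        A minimal sketch of that load-time decision (the module and method names here are purely illustrative):

          sub render_report {
              my( $format, $data ) = @_;

              my %renderer_for = (
                  html  => 'My::Render::HTML',
                  excel => 'My::Render::Excel',
                  csv   => 'My::Render::CSV',
              );
              my $class = $renderer_for{ $format }
                  or die "No renderer for '$format'";

              ( my $file = "$class.pm" ) =~ s{::}{/}g;
              require $file;                           # one backend loaded, at the point of decision

              return $class->new->render( $data );     # the full API of that one backend, undiluted
          }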

      If your purpose is to simplify the application code, then I seriously counsel you to consider whether initiatives like this will ever be able to achieve that goal.

      I see you've already encountered a couple of the concerns I tried to highlight in my first reply to you above.

      • That of the combinational explosion of testing complexity (a sketch of the resulting test matrix follows this list):

        5 different serialization modules: Storable, Data::Dumper, YAML, JSON, and Data::Serializer

        14 different data samples, including strings, numbers, arrays, hashes, blessed objects backed by both arrays and hashes, circular references, multiple references to the same array, hash, or object, and deeply nested data structures containing a mix of hashes, arrays, objects, and pure scalars; at least one example of each supported rule type

        Combinations of module, data sample, and rule have been tested to verify that (a) the dump string matches an expected value, (b) loading the expected dump string generates the original internal representation, and (c) the process of dumping does not modify the original data.

      • And documentation:
        I feel especially concerned about the documentation. Without good documentation, this module is nearly useless.
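
      To give a feel for how quickly that first matrix grows, here is the sketch promised above. dump_with() and load_with() stand in for whatever per-module adapters the real test suite uses; they are hypothetical.

        use Test::More;
        use Storable ();    # dclone(), to snapshot the input before dumping

        # dump_with()/load_with() are hypothetical per-module adapters.
        my @modules = qw( Storable Data::Dumper YAML JSON Data::Serializer );
        my @samples = ( \'a string', [ 1, 2 ], { a => 1 } );    # ...plus the other eleven

        for my $module ( @modules ) {
            for my $sample ( @samples ) {
                my $before = Storable::dclone( $sample );
                my $dump   = dump_with( $module, $sample );                  # (a) dump string as expected
                is_deeply( load_with( $module, $dump ), $sample,
                           "$module: round-trip" );                          # (b) load recreates the original
                is_deeply( $sample, $before, "$module: input untouched" );   # (c) dumping had no side effects
            }
        }
        done_testing();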

      Of course, I had no way of knowing that you had already expended what I assume must have been considerable effort. But still I counsel you to step back and consider whether the code the application programmer will end up writing to use your module will be so much simpler than the code he would have to write if he used the underlying modules directly.

      As far as I am aware, most of those modules serialise and deserialise everything they support serialisation of--and one assumes you have no intention of extending what they support--via a single call. So, ostensibly, coding an application to use any one of the five becomes something like:

      sub serialise {
          my( $self, $format ) = @_;

          # Load only the requested backend, then hand the whole job to it.
          if( $format eq 'Storable' ) {
              require Storable;
              return Storable::freeze( $self );
          }
          elsif( $format eq 'Data::Dumper' ) {
              require Data::Dumper;
              return Data::Dumper::Dumper( $self );
          }
          elsif( ... ) {    # and likewise for YAML, JSON, Data::Serializer
              ...
          }
          else {
              die "Unknown serialisation format: $format";
          }
      }
      ## Equivalent code for deserialisation.
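
      A call site then needs only something like (using the hypothetical method above):

          my $frozen = $obj->serialise( 'Storable' );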

      Of course, that is a simplification.

      But, I guess whether your module will

      • find sufficient applications that require the ability to serialise to multiple formats;
      • and, for those few applications that do need it, will reduce application programmer effort required;
      • and, whether you will get enough take up and sustain sufficient interest to develop and maintain the module such that it becomes the preferred interface to those underlying modules;

      are questions that we will have to review at some point in the future. Say, 3 years from now?
