in reply to Datastructures to XML

To my way of thinking, “references” are not a concept that an external data-structure really knows about.

To see a shining example of what does work, we need look no further than the most common external data structure of all: an SQL database.

A database, of course, consists of a collection of tables. But the tables do not contain “pointers to” one another... as their hierarchical, pointer-based ancestors did. The overall data relationship is described using a set of distinct tables, and where the information in one table is semantically related to the information in another, the relationship is expressed by key values that the two tables have in common.

You can quite easily do the same thing here. Store your information, not as one XML tree, but as several. Attach to each node a “primary key” of some kind... it could be a Data::GUID or some derivative thereof. Then, when one tree needs to refer to a node in another, it does so by mentioning that node's key as a “foreign key.”
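As a minimal sketch of that idea, the following keeps two collections as two separate XML trees and links them only through key values. The element and attribute names (`authors`, `books`, `id`, `author_ref`) and the short string ids are assumptions made for illustration; a GUID would serve equally well as the key.

```perl
use strict;
use warnings;

# Hypothetical data: two separate collections, related only by key values.
my @authors = (
    { id => 'a1', name => 'Alice' },
    { id => 'a2', name => 'Bob & Co.' },
);
my @books = (
    { id => 'b1', title => 'First Book',  author_ref => 'a1' },
    { id => 'b2', title => 'Second Book', author_ref => 'a2' },
);

# Escape the five XML special characters in attribute values.
sub xml_escape {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                  '"' => '&quot;', "'" => '&apos;');
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Serialize one collection as its own XML tree, one element per record.
sub to_xml {
    my ($root, $element, $rows) = @_;
    my $xml = "<$root>\n";
    for my $row (@$rows) {
        my $attrs = join ' ',
            map { sprintf '%s="%s"', $_, xml_escape($row->{$_}) }
            sort keys %$row;
        $xml .= "  <$element $attrs/>\n";
    }
    return $xml . "</$root>\n";
}

my $authors_xml = to_xml('authors', 'author', \@authors);
my $books_xml   = to_xml('books',   'book',   \@books);
print $authors_xml, $books_xml;
```

Neither tree nests the other; a book points at its author purely through the value of `author_ref`.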

The software that works with these trees could, if necessary, use a Perl hash to look nodes up by key ... or, if the amount of data is known to be manageable, it could just use XPath queries. You can construct references between your Perl nodes, as long as you take care to make sure that each one is appropriately weak or strong. In either case, the entire “references problem” simply goes away with respect to your external data representation.
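One way this could look in Perl, sketched with core modules only: a hash indexes nodes by key, foreign keys are resolved through it, and the back-reference is weakened so the resulting cycle does not leak. The node layout and field names (`books`, `author_ref`, `author`) are hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Hypothetical nodes, as they might look after parsing two separate trees.
my %author_by_id = (
    a1 => { id => 'a1', name => 'Alice', books => [] },
);
my @books = (
    { id => 'b1', title => 'First',  author_ref => 'a1' },
    { id => 'b2', title => 'Second', author_ref => 'a1' },
);

# Resolve each foreign key through the hash index.
for my $book (@books) {
    my $author = $author_by_id{ $book->{author_ref} }
        or die "dangling key $book->{author_ref}";

    # Strong reference downward: the author holds on to its books.
    push @{ $author->{books} }, $book;

    # Weak back-reference upward, so the author <-> book cycle does not
    # stop Perl's reference counting from freeing the nodes later.
    $book->{author} = $author;
    weaken( $book->{author} );
}

print "$_->{title} by $_->{author}{name}\n" for @books;
```

The in-memory references exist only for convenience; when the data is written back out, only `author_ref` needs to be stored.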

If your data representation is XML, then it is highly desirable to take the time to make your XML valid against a schema, not merely well-formed. You gain a lot of benefits by describing a formal schema and sticking to it. The biggest of these benefits is that both the incoming and the outgoing data are “known good.” If the incoming data validates against its schema, you can rely upon that validation. If you are building a data structure that's supposed to conform to a schema, exceptions can be thrown if it doesn't ... ergo, if exceptions do not occur, the code that builds the data structure must indeed be building a conformant structure. (Anytime you are looking for bugs in complex code such as this, it helps immensely to know where the bugs are (probably...) not.)

Re^2: Datastructures to XML
by Jenda (Abbot) on Mar 18, 2009 at 16:45 UTC

    Re the database example: I would consider this an implementation detail. It doesn't really make a difference whether you know the address of another object (in the general sense) in memory or its ID within some collection. Or (in case of N-N relations) a list of pairs of IDs of objects.

    If I could store the data in any format I wished, it would be much easier, but sometimes I do not control the format. And sometimes, even if I do, it doesn't map directly to the data structure that best suits the needs of the task at hand.

    I think I should have explained better what I am really after. I'd like to have a "reverse of XML::Rules". That is, with XML::Rules I can tweak the tree structure of the data from an XML document so that I can work more easily with the resulting structure, where the original structure may have been designed with a different task in mind or may just be more general. Then I would like to have some reasonably simple way to "convert" the data structure back to the original format. Or, for that matter, to a different format, but quite often one that was not designed for this particular task.

      It may be an implementation detail, but it is a very important one that can have a significant impact on a general purpose "data structure to XML" converter.

      Sometimes data structures constructed in memory use pointers in place of ids. When this is converted to a persistent form (XML or otherwise), one must create some sort of id that corresponds to the pointer or reference. One will also need to decide on a name for the tag or attribute that holds the generated id since there is no corresponding array or hash element to "foreach". Otherwise there will be information loss.

      In some cases one can just assign sequential ids. Other formats might require GUID generation. Others might want a registered URI. Still other XML formats require that the id match something in a database or flat file. To get the right id one might need to do a look up on a "soft" id - for example a person's first and last name or their social security or passport number. Or one might need to add a new record to the database and capture the id assigned by the database.
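      The simplest of these cases, sequential ids generated for in-memory references, might be sketched as follows. The tag and attribute names (`task`, `person`, `id`, `owner_ref`) and the `p` prefix are assumptions for illustration; the other id schemes would replace the body of `id_for`.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical in-memory structure: two tasks share one owner by reference.
my $owner = { name => 'alice' };
my @tasks = (
    { title => 'write POD', owner => $owner },
    { title => 'add tests', owner => $owner },
);

my %id_of_addr;    # memory address of a reference => generated id
my $next_id = 1;

# Return the id already assigned to this reference, or assign the next one.
sub id_for {
    my ($ref) = @_;
    return $id_of_addr{ refaddr($ref) } //= 'p' . $next_id++;
}

# Values here contain no XML special characters, so escaping is omitted.
my @lines;
for my $task (@tasks) {
    push @lines, sprintf '<task title="%s" owner_ref="%s"/>',
        $task->{title}, id_for( $task->{owner} );
}
push @lines, sprintf '<person id="%s" name="%s"/>',
    id_for($owner), $owner->{name};

print "$_\n" for @lines;
```

      Both tasks end up with the same `owner_ref` because they hold the same reference, so the sharing survives serialization instead of turning into two copies.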

      A second issue that I think sundialsvc4 was getting at was placement of XML elements. Both your template spec (and my functional alternative) assume a part-container model: elements nested within elements.

      But sundialsvc4 is reminding us of an extremely important and common alternative: the relational model. In the relational model, big ugly objects aren't nested. They are replaced by foreign key fields. The XML for the big-ugly-object is defined elsewhere, perhaps even in a different file. The two may be connected either by matching field values (a la a relational DBMS) or by "references" - the value assigned to the id attribute of the big-ugly-object-in-another-file.

      Because part-container models are easier to conceptualize, XML schemas often start life using a part-container model and then migrate over time to one that supports more of a relational model (less duplication of big-ugly-objects). For a readily available open source example, study the history of the XML format used with the ant build tool. Incidentally, the history of DBMS implementation also follows this progression (anybody remember CISC-ISAM databases?)

      Any general purpose tool would be wise to support both (or clearly explain its limits in the CAVEATS section of its POD). Otherwise a company using Mondo::Wonderous::Data::XML might find that they have to throw out, rather than modify, their XML generation code as their XML schemas mature.

      Best, beth

      It is not, strictly speaking, “an implementation detail.” When you are dealing with external data collections (be they SQL tables, or XML files or whatever), the notion of “addresses” (hence: references) does not exist. The notion of “keys,” of whatever format you wish, does.

      If you have ever had the unfortunate experience of dealing with an IMAGE or an IDMS database in any past-life you'd much rather forget, then you will know exactly what I am talking about...   :-D

      (You do not, of course, have to answer that. Many I.S. memories are much better left buried in the past.)