XML and DTD and Twig...

by lee_crites (Scribe)
on Sep 30, 2004 at 00:39 UTC ( [id://395194]=perlquestion )

lee_crites has asked for the wisdom of the Perl Monks concerning the following question:

A philosophical question

Y'all;

I am building a database of sorts using XML as the layout. Each record is a separate XML file. There is a single perl script (being called as cgi) that will access the record(s) in the database. On my linux system I am making links to this one script, and so the name is the "key" to the record(s) selected.

The "real" script (myxmldb.cgi) will show ALL records. But all of the "other" scripts will only show "their" records. So one of them (goober.cgi -> myxmldb.cgi) will show the ones with "goober" in the predefined "process-created" tag. Adding more stuff to the mixture, there is a "goober.xml" file that is an ini-file in xml format. Each database, then, can have its own unique layout, all being accessed by the same script.
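The link-name-as-key idea can be sketched roughly as below. This is only an illustration: the helper name db_key_from_script is made up, and it assumes every link ends in ".cgi" and that the main script is named myxmldb.cgi as described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: derive the database key from the name the script
# was invoked under (e.g. goober.cgi -> "goober").
sub db_key_from_script {
    my ($script_path, $main_name) = @_;
    # fileparse returns (name, directory, suffix); only the name is needed.
    my ($name) = fileparse($script_path, qr/\.cgi/);
    # The "real" script shows every record, so it has no key.
    return $name eq $main_name ? undef : $name;
}

my $key = db_key_from_script($0, 'myxmldb');
print defined $key
    ? "showing records for '$key'\n"
    : "showing ALL records\n";
```

The same key could then also name the per-database ini file ("$key.xml").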

My plan was that when I wrote out the database record, I was going to update the built-in DTD information to match its layout. But the XML::Twig process doesn't handle the DTD well enough. So I created a few more predefined tags to hold the list of fields, etc. All of this was so if I called the "main" script (myxmldb.cgi) that it could (easily) show ALL of the records, irrespective of the layout.

So with that (obtuse) description of where I am and what I'm trying to do, let me ask this:

Is it better to keep the DTD type information internal to the XML file, or as a separate file? I'm interested in how hard it is to keep them up-to-date if they are separate as opposed to how easy it is when they're together. And then how that works with the fact that XML::Twig doesn't handle the DTD.

In other words, how do y'all handle this?

I am working on my first XML project. The reference project I'm using as a base had the DTD information included in the XML files, so that's how I started off. I've got 27 years of programming experience, so I'm not afraid to tackle something bad|ugly to take care of things. I'm just interested in the feelings from this forum in this area. I have the luxury of going either way at this point.

Thanks to the perlmonks group. I stumbled on this site via some of the on-line docu for XML::Twig -- which, btw, I am finding quite usable! (thanks!!) From what I've seen so far, this is a great site!

Lee

Replies are listed 'Best First'.
Re: XML and DTD and Twig...
by Velaki (Chaplain) on Sep 30, 2004 at 14:32 UTC

    Call it a personal choice, but I prefer to keep the DTD separate from the *ML files. This goes back to my SGML days, when I would write a DTD and distribute it to people who would write compliant SGML.

    HTML, sadly and frequently, breaks from adherence to the DTD, but XML at least makes an attempt to return to it.

    Since one DTD may engender many XML documents, I prefer to keep it separate, but as I said in the beginning, it's a personal choice.

    Thoughts,
    -v
    "Perl. There is no substitute."
Re: XML and DTD and Twig...
by mirod (Canon) on Sep 30, 2004 at 11:58 UTC

    What is missing from XML::Twig to make it easier for you to write your code? If you need fixed/default attributes to be filled, I think I could add it.

      Sorry for the delay in responding -- I'm the only IT person in my company, and a few days after posting this question I got snowballed -- this is the first chance I've had to get back to my programming.

      I'm not sure I can say something is "missing" from Twig. It is currently working just fine -- in my initial testing and such. The only "problem" I am having is a design issue.

      The way version-2 of my system works, each data record is in a unique file; each file was plain ascii; each file had a data layout like this:

      fieldname|whatever data is in the field\n

      Everything worked fine until they started doing the cut/paste thing and dumping in full email text and/or web pages and/or resumes, etc. So I started tinkering about with encoding and such, with limited success.

      Then I stumble on this whole XML concept, and realize that this is just what I've been needing to overcome some of the issues I was working with -- so now I'm moving my data to XML (and now it is v3.1), and the layout looks like:

      <fieldname>whatever data is in the field</fieldname>
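A minimal sketch of that conversion, for illustration only. The helper names xml_escape and line_to_element are hypothetical, and the sketch assumes every old field name is already a valid XML element name. The escaping step is what lets pasted email or web-page text survive inside an element.

```perl
use strict;
use warnings;

# Escape the three characters that are unsafe in XML element content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must come first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Turn one old-style "fieldname|data" line into "<fieldname>data</fieldname>".
sub line_to_element {
    my ($line) = @_;
    chomp $line;
    # Limit of 2: only the first '|' separates name from data.
    my ($field, $data) = split /\|/, $line, 2;
    $data = '' unless defined $data;
    return sprintf '<%s>%s</%s>', $field, xml_escape($data), $field;
}

print line_to_element("notes|see <b>resume</b> & email\n"), "\n";
```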

      I had a "VOC" file (a carryover from my old PickBASIC days) that had the layout of each "fieldname" -- and converting that to XML (esp using XML::Twig) was a beautiful thing. It now works like a champ, and I love it! There is one XML file (voc.xml) that is easy to read and modify and maintain; reading it into the program is almost too trivial; extracting the layout info is a piece of cake!

      Now, I could do as some suggested, and that is have one file per database, and have the multiple records in it, and that'd be okay. I can see how that might be good (and how Twig could be exploited to make that work). But my one-record-per-file concept makes things work well for the web-based projects I am working on (limited record locking issues), so I don't want to walk away from it just yet.

      The point that I was looking to get around was that I wanted to make a generic enough data entry and data display routine that would grab any of the records (no matter what the defined layout was) and edit it or display it. To do that, my thoughts were that I had to either: 1) read in and process the DTD, or 2) have Twig automagically read in and process the DTD. But XML::Twig doesn't deal with the DTD, so that was my problem.

      So that was the basis of my comments and question. If XML::Twig processed the DTD and could return to me the "complete list" of fields to be expected, then nobody would have heard from me. I would have stuck it into each item, and that would have been that. As it is, I am now getting back to trying to figure out how to grab that info from either the standalone DTD file or the internal DTD info in the XML file.

      And since I spent the time since I posted this original message putting out fires instead of working on the project, I am just today starting to look back at that question again.

      I guess the bottom line question is: how do folks keep up with the full list of what fields are supposed to be in the XML file?

      Lee Crites
      lee@critesclan.com

        You can wrestle some information out of XML::Twig: you need to use the load_DTD option to read the external DTD, then get the list of elements in the dtd and their models using the model method. It is not perfect, and you then have to do some (simple) post-processing to get just the sub element names.

        For example:

        #!/usr/bin/perl -w
        use strict;
        use XML::Twig;

        my $t= XML::Twig->new(load_DTD => 1, error_context => 1)->parsefile( 'test.xml');

        my @elts= $t->model;
        print "all elements: ", join( ', ', @elts), "\n";

        foreach my $elt (@elts)
          { my $model= $t->model( $elt);
            print "model of $elt: $model\n";
            my @subelts= grep { $_ && $_ ne 'PCDATA'} split( /[^\w:-]/, $model); # keep only element names
            # if you want an array of unique element names you then need this
            my %subelts= map { $_ => 1 } @subelts;
            @subelts= sort keys %subelts;
            print "elements in $elt: ", join( ', ', @subelts), "\n";
          }

        Does this help?

Re: XML and DTD and Twig...
by pg (Canon) on Sep 30, 2004 at 04:09 UTC
    "Each record is a separate XML file."

    Why? Does it not make more sense to say each table is a separate XML file? In reality, each logical table could indeed map to multiple physical files.

Re: XML and DTD and Twig...
by graff (Chancellor) on Sep 30, 2004 at 12:57 UTC
    So, if there are groups like "goober" and "blurch" and "frang", each having a separate "alias" to the "real xmldb.cgi" script, and each having a distinct xml structure, is it the case that all "goober" records have the same structure, and all "blurch" records share some other structure, etc?

    If that's the case, then I think a single DTD file per group would be best. And when you use your main script to access ALL groups, I would expect that you just need to create a new Twig object for each group, and access the DTD for that group to know about its structure. (You shouldn't need to add structure to the XML data in order to describe the structure, though perhaps this might not be a bad idea -- assuming it's done consistently across all groups/records so that they all have something useful in common, regardless of anything else.)

    The only way it would make sense to have the DTD included with every record is if every record could potentially have a different structure, regardless of what group it's in. (For that matter, I wonder about needing DTD's at all -- I thought one of the design goals of XML was to make the markup parseable without requiring any "ad hoc" DTD a priori -- in contrast to the more cumbersome SGML.)

      In my case, each file will be one record. That record could belong to any "database" I have. I have defined (for myself) certain fields (e.g. 'name' or 'city') that have predefined meaning and a common layout. I put that layout into a global INI type file that all scripts will access. When a new (and unique) field is added to any database, I stick that into the global file as well. So there is one file (voc.xml) that has every field name and all of its particular layout definitions.

      What I am doing is to write a single display and edit CGI script. It will read in the voc.xml file, and then somehow (automagically) know what to do with the record. This worked just fine before the migration to XML, so I know the process works, I'm just trying to find the "best practice" for the XML world in duplicating it.

      It seemed to me that the perfect place to do this was in the DTD. Each database would have a DTD file that contained a list of all of the fields it used, etc. But how do I tie the record I am processing to the DTD that relates to it? The internal DTD info seemed like the perfect location for that as well. But XML::Twig didn't process it correctly, and when I looked, I found out that this was a known problem.

      But my philosophical question remains: what's the best practice for linking the data file to the DTD file? I can read the documentation, but so far I haven't really gotten a good feel for what is really being done by those folks who are actually using this.
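For reference, the standard XML mechanism for tying a document to a separate DTD file is a DOCTYPE declaration with a SYSTEM identifier. The sketch below shows one way a record could be written out with such a declaration. It is an illustration under assumptions, not an established practice from this thread: the helper name record_with_doctype, the root element "record" and the file name "goober.dtd" are all hypothetical, and the field elements are assumed to be already serialized and escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap already-serialized field elements in a root
# element and point the document at an external DTD via a SYSTEM identifier.
sub record_with_doctype {
    my ($root, $dtd_file, @field_xml) = @_;
    return join "\n",
        '<?xml version="1.0"?>',
        qq{<!DOCTYPE $root SYSTEM "$dtd_file">},
        "<$root>",
        (map { "  $_" } @field_xml),   # indent each field by two spaces
        "</$root>",
        '';                            # trailing newline
}

print record_with_doctype('record', 'goober.dtd',
                          '<name>Lee</name>', '<city>somewhere</city>');
```

With one DTD file per database, every record in that database carries the same DOCTYPE line, so only the DTD file has to change when the layout changes.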

      Lee Crites
      lee@critesclan.com
Re: XML and DTD and Twig...
by Maxim (Sexton) on Sep 30, 2004 at 10:45 UTC
    I suggest using XML::Simple. It is quite good and easy to use. I am working on this kind of project myself. See ya

Node Type: perlquestion [id://395194]
Approved by kvale
Front-paged by Arunbear