Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I wish to write an MSc project proposal for my supervisor to use, and I was wondering whether people here could offer advice. I am developing an XML structure that represents genomic features; attribute values include application commands and directory locations. For those with a genomics/bioinformatics background: I am developing an XML database that uses EMBOSS database features. If anyone can show examples of something similar, I would be really interested to see them.

I have not yet designed an XML DTD or XML Schema. XML Schema seems, at present, to be available only for use with Java modules. Does anyone know whether it is likely to be supported by Perl soon? I have heard that XML Schema offers many advantages over DTDs.

What I want to know is whether people believe it is appropriate to propose a 6-month MSc bioinformatics project that first weighs up the pros and cons of DTDs versus XML Schemas for applications in genomics and then, depending on the results of that initial evaluation and the skills of the student, sets out to develop Perl modules for the support of XML Schema. Am I being totally unrealistic here? Maybe I could propose two projects to run in parallel, although at this stage I am not sure how I would split the work.

Replies are listed 'Best First'.
Re: MSc XML Project
by Ionizor (Pilgrim) on Jan 04, 2003 at 00:32 UTC

    You only really need a DTD or Schema if you (or anyone you will be distributing your XML files to) need to validate against one. If only a small number of parties are involved, everyone can agree on a common format (or agree to follow yours), and you can be sure that every script or program will produce correct output, there isn't really a need for a formal DTD or Schema.

    If any of those conditions isn't met, you will need a DTD or Schema. To decide which is appropriate, first ask "Will a DTD do the job?" Only if the answer is no should you use a Schema.
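For cases where validation is needed, the following is a minimal sketch of checking a document against a DTD from Perl. It assumes the XML::LibXML module from CPAN is installed. The DTD, the element names (features, feature, command, directory) and the helper is_valid_against_dtd are hypothetical, invented for illustration only; they are not the original poster's format and are not taken from EMBOSS.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical DTD for a list of genomic features; the element and
# attribute names are assumptions made for this sketch only.
my $dtd_text = join "\n",
    '<!ELEMENT features (feature+)>',
    '<!ELEMENT feature (command, directory)>',
    '<!ATTLIST feature name CDATA #REQUIRED>',
    '<!ELEMENT command (#PCDATA)>',
    '<!ELEMENT directory (#PCDATA)>';

my $dtd = XML::LibXML::Dtd->parse_string($dtd_text);

# Returns 1 if the XML text is well-formed and valid against the DTD,
# 0 otherwise (including when the text does not parse at all).
sub is_valid_against_dtd {
    my ($xml_text) = @_;
    my $doc = eval { XML::LibXML->load_xml(string => $xml_text) };
    return 0 unless $doc;
    return $doc->is_valid($dtd) ? 1 : 0;
}

# A document that follows the hypothetical DTD.
my $good = '<features><feature name="orf1">'
         . '<command>run_tool</command><directory>/data/orf1</directory>'
         . '</feature></features>';

# A document missing the required name attribute and directory element.
my $bad = '<features><feature><command>run_tool</command></feature></features>';

print "good: ", is_valid_against_dtd($good), "\n";
print "bad: ",  is_valid_against_dtd($bad),  "\n";
```

If a DTD this small already expresses every constraint the format needs, that is a reasonable sign that a DTD will do the job.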

    My latest XML project was part of a messaging system for a distribution chain. My (server) script had to talk to their (client) script and that was all, so we decided that developing a DTD was unnecessary.

      One comment I would make is that any well-designed software defines its interfaces. XML is effectively an interface, so its format really should be documented. To this end, XML Schema and DTDs are very effective and, I would say, very worthwhile.

      Validating source documents against your XSD/DTD at runtime could be considered overkill (and could throw any performance you were hoping for out of the window).

      My opinions are purely that, and are based on working on a very large multi-component XML system whose design I've been involved with for the last year or so.

      --
      RatArsed

XML Schemas and DTDs (Re: MSc XML Project)
by tomhukins (Curate) on Jan 04, 2003 at 16:21 UTC
    You might like to read darobin's recent comments about XML Schema on use.perl. Whilst thinking about XML Schema and DTDs, take a look at Relax NG, an interesting schema language developed outside the W3C that sits somewhere between the simplicity of DTDs and the power of schemas.
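To give a feel for Relax NG, here is a minimal sketch of a schema in its XML syntax and a validation call from Perl. It assumes the XML::LibXML module from CPAN, built against a libxml2 with Relax NG support. The schema, its element names and the helper is_valid_against_rng are hypothetical examples and do not come from the original poster's format.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical Relax NG schema: one or more <feature> elements, each with
# a name attribute, a <command> and a <directory>. Names are assumptions.
my $rng_text = join "\n",
    '<element name="features" xmlns="http://relaxng.org/ns/structure/1.0">',
    '  <oneOrMore>',
    '    <element name="feature">',
    '      <attribute name="name"/>',
    '      <element name="command"><text/></element>',
    '      <element name="directory"><text/></element>',
    '    </element>',
    '  </oneOrMore>',
    '</element>';

my $rng = XML::LibXML::RelaxNG->new(string => $rng_text);

# Returns 1 if the XML text parses and matches the schema, 0 otherwise.
sub is_valid_against_rng {
    my ($xml_text) = @_;
    my $doc = eval { XML::LibXML->load_xml(string => $xml_text) };
    return 0 unless $doc;
    # validate() dies when the document does not match the schema,
    # so the eval block only yields 1 when validation succeeded.
    return eval { $rng->validate($doc); 1 } ? 1 : 0;
}

my $good = '<features><feature name="orf1">'
         . '<command>run_tool</command><directory>/data/orf1</directory>'
         . '</feature></features>';

# Missing the name attribute and the <directory> element.
my $bad = '<features><feature><command>run_tool</command></feature></features>';

print "good: ", is_valid_against_rng($good), "\n";
print "bad: ",  is_valid_against_rng($bad),  "\n";
```

The schema reads much like the document it describes, which is a large part of Relax NG's appeal compared with DTD syntax.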