kilinrax has asked for the wisdom of the Perl Monks concerning the following question:

One of my company's 'targeted ads' systems requires clients to give us information on their products in PML, a proprietary XML-based format. Unfortunately, transfer of said information is handled by salespeople on both ends, without, seemingly, anyone technical being involved.
As a result, I am occasionally plagued with requests to sort out problems for people who could only tell valid XML from their own arse with the aid of a map. Fortunately, the one guy with a clue in their whole department has managed to arm himself with an XML validator, but this still doesn't catch all the faulty PML files they receive.
So, that got me wondering: does a Perl module exist that can take in a DTD and an XML file, and check not only that the file is well-formed XML, but also that it conforms to its document type declaration?
I have tried several searches on CPAN, Google, and WebTop, but thus far to no avail :-(

(p.s. apologies for any excess vitriol in this post - it's been a looong week)

RE: Doctype specific XML Validator
by KM (Priest) on Sep 08, 2000 at 23:30 UTC
    XML::Parser should do this for you. But, you may also be interested in this article discussing validation.

    Cheers,
    KM

Re: Doctype specific XML Validator
by Fastolfe (Vicar) on Sep 08, 2000 at 23:25 UTC
    It was my understanding that the stock XML::Parser modules were validating parsers, so, with the presence of a <!DOCTYPE> header (or the explicit parsing of a separate DTD in your code), the XML would be validated as well as parsed.
Re: Doctype specific XML Validator
by kilinrax (Deacon) on Sep 09, 2000 at 18:02 UTC
    merlyn, mirod: thanks a lot, XML::Checker is just what I was looking for ;-)
    And PotPieMan, that was exactly what I meant. Sorry my post wasn't too coherent, but like I said, it was the end of a very long week :-/
Re: Doctype specific XML Validator
by mirod (Canon) on Sep 09, 2000 at 17:17 UTC

    Definitely use XML::Checker from Perl, or use an external parser as KM suggested.

    Once and for all: expat, the underlying parser for XML::Parser, is _NOT_ a validating parser. It only checks for well-formedness (whether the document is tagged properly; no DTD involved).
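To see the distinction mirod draws, here is a minimal sketch, assuming XML::Parser and XML::Checker are both installed from CPAN. The `product` DTD is hypothetical (it is not the real PML DTD), and the fail handler is just one possible setup using XML::Checker's `$XML::Checker::FAIL` hook, where codes below 200 are errors and higher codes are warnings or info. The sample document is well-formed but breaks its own content model, so expat should accept it while XML::Checker::Parser rejects it.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::Checker::Parser;

# Hypothetical stand-in for a PML file: the internal DTD says a <product>
# must contain exactly one <name>, but this document supplies a <price>.
my $doc = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE product [
  <!ELEMENT product (name)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
]>
<product><price>9.99</price></product>
XML

# Well-formedness only: expat dies on bad tagging but ignores content models.
sub is_well_formed {
    my ($xml) = @_;
    return eval { XML::Parser->new->parse($xml); 1 } ? 1 : 0;
}

# Validation: XML::Checker reports problems through $XML::Checker::FAIL.
# Here errors (codes below 200) abort the parse; warnings/info are ignored.
sub is_valid {
    my ($xml) = @_;
    my $ok = eval {
        local $XML::Checker::FAIL = sub {
            my $code = shift;
            die "XML::Checker error $code\n" if $code < 200;
        };
        XML::Checker::Parser->new->parse($xml);
        1;
    };
    return $ok ? 1 : 0;
}

print "well-formed: ", is_well_formed($doc), "\n";
print "valid:       ", is_valid($doc), "\n";
```

In a real setup the fail handler would probably collect or print the error details rather than just the code, so the Sales folks can be told what is actually wrong.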

RE: Doctype specific XML Validator
by BastardOperator (Monk) on Sep 09, 2000 at 00:21 UTC
    Definitely XML::Parser, that's obvious from the other replies. More than just chiming in to get an XP ;), I was very curious about "proprietary XML". First off, the major strength of XML is in the fact that it's open, that and the fact that it's relatively simple. But what does a company do to XML to make it proprietary? I'm just curious. I can't imagine with XML being as flexible as it is, what a company would need/want to do to "make it their own".
      Given that the whole point of XML is to leave room for expansion, my impression is that an individual or company can create a "proprietary" DTD (a document type definition) that describes their own XML format. Each format can include different types of objects. For an example, go to Moreover.com, click on the developers link, and compare the different formats that are based on XML. For instance, there's Netscape's definition called RSS, and there's Moreover.com's own format. You may also want to take a look at the W3C Web site for some of the actual specs on XML and other related technologies like XPath and XSL (both of which are really cool).

      Because of the structure of XML itself, pretty much any format is easy to parse. It's just a matter of picking a way to parse it in Perl and then accessing the data. Most often, a person wishing to parse an XML document would use one of the styles in XML::Parser. For instance, one can take a stream-oriented approach by setting handlers, or use the Objects style.
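As a rough illustration of the handler-based approach, here is a sketch that collects the text of every `<name>` element, assuming XML::Parser is installed. The element names and sample document are made up for the example. Note that expat may deliver one run of text through several `Char` calls (for instance around `&amp;`), so the text is appended rather than assigned.

```perl
use strict;
use warnings;
use XML::Parser;

# Collect the text of every <name> element using expat callback handlers.
sub product_names {
    my ($xml) = @_;
    my @names;
    my $in_name = 0;    # true while the parser is inside a <name> element

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $elem, %attrs) = @_;
                if ($elem eq 'name') {
                    push @names, '';    # start a fresh text buffer
                    $in_name = 1;
                }
            },
            End => sub {
                my ($expat, $elem) = @_;
                $in_name = 0 if $elem eq 'name';
            },
            # expat may split one text node across several Char calls,
            # so append to the current buffer rather than overwrite it.
            Char => sub {
                my ($expat, $text) = @_;
                $names[-1] .= $text if $in_name;
            },
        },
    );
    $parser->parse($xml);
    return @names;
}

my @found = product_names(
    '<catalog><product><name>Widget</name></product>'
  . '<product><name>Gadget &amp; Co</name></product></catalog>'
);
print join(', ', @found), "\n";    # prints "Widget, Gadget & Co"
```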

      So, it's not really a matter of the individual or company wanting to make XML "their own." RSS and other definitions have come about because of the desire to have a common format so that headlines can be passed between sites on the Web (e.g. Slashdot and Freshmeat). XML's intrinsic flexibility allows individuals and companies to roll their own.

      -ppm

        O.k. I'm with you, you're talking about the DTDs. I just wouldn't call that proprietary because... well... that's the point of DTDs: they're supposed to be customized. My impression from your post was that the XML specification (or a related component, such as DTDs) had been changed to suit that company. Creating a DTD to specify what data is valid and what isn't is, of course, why XML is as good as it is.

        NOTE:
        For those who have no idea about XML, a DTD is a "Document Type Definition" which specifies how your data will be structured, what's valid, what's required, etc. XML::Parser and other _validating_ parsers will check an XML document against the specified DTD to ensure that it meets that specification, not all parsers do that (i.e. non-validating parsers). This is very well suited to many things, and is largely becoming an excellent way for different businesses to exchange information.
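To make "what's required" concrete, here is a sketch with a made-up DTD that requires a `<name>` inside `<product>` and marks its `sku` attribute as `#REQUIRED`. It assumes XML::Parser is installed and uses its `Element` and `Attlist` declaration handlers to list what the DTD declares. Per mirod's note above, expat only reports these declarations and does not enforce them, so the sample document below, which omits `sku`, still parses without complaint.

```perl
use strict;
use warnings;
use XML::Parser;

# Made-up example DTD: every <product> needs a <name> and an sku attribute.
# The document itself leaves out sku, yet expat will not object.
my $doc = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE product [
  <!ELEMENT product (name)>
  <!ATTLIST product sku CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
]>
<product><name>Widget</name></product>
XML

# Gather element content models and #REQUIRED attributes from the DTD.
sub dtd_summary {
    my ($xml) = @_;
    my (%models, @required);
    XML::Parser->new(
        Handlers => {
            # $model is an XML::Parser::ContentModel, which stringifies.
            Element => sub {
                my ($expat, $name, $model) = @_;
                $models{$name} = "$model";
            },
            # $default is '#REQUIRED', '#IMPLIED', or a quoted default value.
            Attlist => sub {
                my ($expat, $elem, $attr, $type, $default) = @_;
                push @required, "$elem/$attr" if $default eq '#REQUIRED';
            },
        },
    )->parse($xml);
    return (\%models, \@required);
}

my ($models, $required) = dtd_summary($doc);
print "$_: $models->{$_}\n" for sort keys %$models;
print "required attributes: @$required\n";
```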

        NOTE2:
        We're all so darned geeky here that I'm sure nobody got anything out of that ;). Just lookin' out for the potential new guy.