ron800 has asked for the wisdom of the Perl Monks concerning the following question:

Hello. I'm neither a Perl nor an XML expert. I was wondering if anyone could help me with a bit of code. Is it possible to create a function where you just provide the name of an XML file and it returns true/false depending on whether there is an associated DTD/schema for the XML file? I don't want to parse or validate the XML, just check whether a DTD or schema exists. Can someone point me in the direction of where to begin with this?

Re: Quick check for XML DTD or Schema
by boftx (Deacon) on Nov 06, 2014 at 00:16 UTC

    Well, my gut reaction would be to open the file and read in the first few lines and see if any of them match a regex. But that assumes that what you are looking for would always be near the start of the file.

    On a more general note, anytime I hear a question that begins with "Is it possible to do ..." my immediate response is always "Yes! Just give me enough time and money." If I can't do it myself, I can hire the people who can. Do you want faster than light travel? Do you want anti-gravity? Just give me enough time and money.

    Update: Here is a very crude approach that might work. (Please note that this is basic logic, NOT working code!)

    open file for reading;
    $found_dtd = 0;
    while not end of file {
        last if line has a pattern known to come after the DTD or schema;
        if line matches DTD or schema regex {
            $found_dtd = 1;
            last;
        }
    }
    return $found_dtd;
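    A runnable Perl version of the crude approach above might look like the following. It is only a sketch under stated assumptions: the helper name has_dtd_or_schema is made up, the schema test assumes the conventional xsi prefix, and the loop treats the first closing tag (</) as the point after which neither a DOCTYPE nor a root-element attribute can appear. The DTD/schema match is tested before that stop condition so that a root element written on a single line is still checked.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if a DOCTYPE declaration or an
# xsi:schemaLocation / xsi:noNamespaceSchemaLocation attribute shows up
# before the first closing tag of the document, otherwise 0.
sub has_dtd_or_schema {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $found_dtd = 0;
    while (my $line = <$fh>) {
        # DTD or schema reference on this line? (assumes the xsi prefix)
        if ($line =~ /<!DOCTYPE\b|\bxsi:(?:schemaLocation|noNamespaceSchemaLocation)\b/) {
            $found_dtd = 1;
            last;
        }
        # A closing tag means the root start tag is behind us, so a
        # DOCTYPE or root-element attribute cannot appear any more.
        last if $line =~ m{</};
    }
    close $fh;
    return $found_dtd;
}

# Usage: perl this_script.pl file.xml
if (@ARGV) {
    my $msg = has_dtd_or_schema($ARGV[0]) ? 'found' : 'not found';
    print "$msg\n";
}
```

    Because it stops at the first closing tag it stays cheap on large files, but it is still only a text match: a DOCTYPE mentioned inside a comment in the prolog would be reported as found.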

    You must always remember that the primary goal is to drain the swamp even when you are hip-deep in alligators.
      Makes sense. Assuming the declaration is at the top, is it sufficient, in the case of a DTD, to check for '<!DOCTYPE root-element'? Or, for a schema, to check that the path of the XML schema is in a schemaLocation attribute on the root element of the XML file, i.e. check for 'xsi:schemaLocation' or 'xsi:noNamespaceSchemaLocation'?

        I would think so. Try it. :)

        Using one of the modules named below would work, too, but I got the sense from your original post that you don't want to do that; you just want a real quick peek at the file without the overhead of doing any work with it.
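    To make the checks from the question concrete (a <!DOCTYPE before the root element for a DTD, an xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute on the root element for a schema), here is a regex-only sketch that classifies the start of a document without using any XML module. The function name grammar_type, the 4 KB read limit and the assumption that the schema-instance namespace is bound to the xsi prefix are illustrative choices; an attribute value containing '>' would also confuse it.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the start of an XML document (a string).
# Returns 'dtd', 'schema' or 'none'. Assumes the xsi prefix is used.
sub grammar_type {
    my ($xml) = @_;
    # Drop comments and processing instructions (including <?xml ...?>)
    # so they cannot fool the regexes below.
    (my $head = $xml) =~ s/<!--.*?-->//gs;
    $head =~ s/<\?.*?\?>//gs;
    # A DOCTYPE declaration has to come before the root element.
    return 'dtd' if $head =~ /^\s*<!DOCTYPE\s/;
    # The first start tag is the root element; capture its attributes,
    # which may be spread over several lines.
    if ($head =~ m{<[A-Za-z_][^\s>/]*([^>]*)>}) {
        my $attrs = $1;
        return 'schema'
            if $attrs =~ /\bxsi:(?:schemaLocation|noNamespaceSchemaLocation)\s*=/;
    }
    return 'none';
}

# Usage: perl this_script.pl file.xml
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    read $fh, my $buf, 4096;    # assumption: the prolog and root tag fit in 4 KB
    close $fh;
    print grammar_type($buf), "\n";
}
```

    Reading only a fixed-size head keeps the "quick peek" cheap; if the root start tag is cut off by the limit, the sketch reports 'none' rather than guessing.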

Re: Quick check for XML DTD or Schema.. XML::Twig solution
by Discipulus (Canon) on Nov 06, 2014 at 16:12 UTC
    Hi and welcome ron800. I'm not an XML expert, but when I need to deal with such a shaggy thing I definitely use XML::Twig. It can do many many many things for you.

    For example, XML::Twig has an option (I had never seen it before your question..), LoadDTD, and no, it is not for killing bugs on the twig.. ;=)

    It is used to load the DTD (internal OR external.. be aware!).
    You can also use a DTDHandler to handle the DTD.
    See the XML::Twig docs (the DTDHandler option of new) and also the dedicated DTD section.

    After reading that section you (me too..) discover that the DTDHandler receives two params: the twig itself and the DTD. So you can check $_[1], i.e. the second element of @_, the array passed to the subroutine.
    cat with_dtd.xml
    <?xml version="1.0"?>
    <!DOCTYPE p [ <!ELEMENT p ANY> ]>
    <p>Hello world!</p>

    perl -MXML::Twig -e "$twig= new XML::Twig(LoadDTD=>1,DTDHandler=>sub{print qq(FOUND\n) if $_[1]});$twig->parsefile($ARGV[0])" with_dtd.xml
    #output..
    FOUND

    cat withOUT_dtd.xml
    <?xml version="1.0"?>
    <p>Hello world!</p>

    perl -MXML::Twig -e "my $twig= new XML::Twig(LoadDTD=>1,DTDHandler=>sub{print qq(FOUND\n) if $_[1]});$twig->parsefile($ARGV[0])" withOUT_dtd.xml
    #output.. (nothing)

    HtH
    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; maybe one day you reinvent one of THE WHEELS.
      Thanks. I think I can now build the solution using XML::Twig:

      use strict;
      use warnings;
      use XML::Twig;

      my $xmlfile = shift @ARGV;

      my $t = XML::Twig->new(load_DTD => 1, DTDHandler => \&checkit)->parsefile($xmlfile);

      sub checkit {
          my ($twig, $dtd) = @_;
          my $root = $twig->root;
          my @text = $root->att_names;
          # /i so that xsi:noNamespaceSchemaLocation matches as well
          if (grep /schemaLocation/i, @text) {
              print "found schema\n";
          }
          elsif ($twig->{twig_doctype}) {
              print "found dtd\n";
          }
          else {
              print "no dtd/schema defined\n";
          }
      }
        Bravo!
        L*