in reply to Validating an XML file with multiple schemas

If XML::LibXML::Schema is not the suitable module for this purpose, what's the success rate with the alternatives?

XML::LibXML is an interface to libxml2, which is a very powerful library. I'd be surprised if it couldn't handle it, so I'd try going with it first. Sometimes it takes a bit of tweaking, for example here I showed how to write custom code to cache external resources during DTD validation. If you have trouble with it, feel free to report back here with an SSCCE that reproduces the issue.

Re^2: Validating an XML file with multiple schemas
by mart0000 (Initiate) on Jan 04, 2019 at 22:31 UTC

    Thank you for your quick response! Please be aware that my Perl skills are in need of severe repair, so my responses will reflect that. From looking at your example, I'm trying to figure out how to adapt the idea to cache a list of schema files, similar to caching the DTD (or DTDs) from external resources referenced by the URIs within an HTML, XHTML, XML, ... document. After reading the Schema.pod packaged with XML-LibXML-2.0132, it sounded like libxml2's support for W3C Schema may not be as mature as its support for DTDs. I hope I'm very much mistaken.

    Let me give you a brief example of the single schema usage (minus the error handling and debug) :

    package example;
    use XML::LibXML;
    use strict;
    use warnings;

    my $xmlFilePath = <local file path>;
    my $xsdFilePath = <local file path>;

    my $document = XML::LibXML->load_xml( location => $xmlFilePath );
    my $schema   = XML::LibXML::Schema->new( location => $xsdFilePath );
    $schema->validate( $document );

    Simple & short. To extend the above for multiple schemas, and keep the same feel, I'll want to build something that allows the following usage:

    :
    :
    my $schema = XML::LibXML::Schema->new( location => $xsdFile1Path );
    $schema->add( location => $xsdFile2Path );
    $schema->add( location => $xsdFile3Path );
    :
    :

    Or something similar and grammatically accurate. As long as the library internally has the mechanism to support the dependencies between the schemas themselves, it shouldn't be too complicated to extend XML::LibXML::Schema and take advantage. However, if it were so, I'd imagine the author would have already made an attempt.

    Do you still think I'll be successful in reusing your idea to achieve the above ?

      It's unclear to me whether by "multiple schemas" you mean validating one XML file against multiple different schemas, or whether it's one Schema file that includes other Schema files. Could you show a short, complete example, with simple XSD files that represent what you're trying to do? Please see Short, Self-Contained, Correct Example.

      The following works for me.

        Here's an exaggerated example where a Personal Information schema (captured as personal.xsd) uses a flexible Contact schema. A contact can be an address, an email, a specific online id, a phone number, etc. I've provided a sample address.xsd and email.xsd. The Contact section of the Personal Information schema allows such content extension through the broader "any" element, but deliberately keeps the validation strict.

        To test, create a temporary folder for the 5 files (2 .xml and 3 .xsd) I've provided below. I used C:\temp1 in my example, so alter the attached Perl code to point to your path.

        personal.xsd

        <?xml version="1.0" encoding="UTF-8"?>
        <schema xmlns="http://www.w3.org/2001/XMLSchema"
                xmlns:per="urn:tempuri:Personal"
                targetNamespace="urn:tempuri:Personal"
                elementFormDefault="unqualified">
          <element name="PersonalInfo">
            <complexType>
              <sequence>
                <element name="FirstName" type="string"/>
                <element name="LastName" type="string"/>
                <element name="Contact" type="per:ContactType"/>
              </sequence>
            </complexType>
          </element>
          <complexType name="ContactType">
            <sequence>
              <any namespace="##other" processContents="strict" maxOccurs="unbounded"/>
            </sequence>
          </complexType>
        </schema>

        address.xsd

        <?xml version="1.0" encoding="utf-8"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                   xmlns:Contact="urn:tempuri:Contact"
                   targetNamespace="urn:tempuri:Contact"
                   elementFormDefault="unqualified">
          <xs:element name="Address">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="Street" type="xs:string"/>
                <xs:element name="City" type="xs:string"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:schema>

        email.xsd

        <?xml version="1.0" encoding="utf-8"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                   xmlns:Contact="urn:tempuri:Contact"
                   targetNamespace="urn:tempuri:Contact"
                   elementFormDefault="unqualified">
          <xs:element name="Email">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="EmailAddress" type="xs:string"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:schema>

        example1.xml

        <?xml version="1.0" encoding="UTF-8"?>
        <pinfo:PersonalInfo xmlns:pinfo="urn:tempuri:Personal"
                            xmlns:cinfo="urn:tempuri:Contact"
                            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                            xsi:schemaLocation="urn:tempuri:Personal personal.xsd">
          <FirstName>First Name</FirstName>
          <LastName>Last Name</LastName>
          <Contact>
            <cinfo:Address xsi:schemaLocation="urn:tempuri:Contact address.xsd">
              <Street>Main Street</Street>
              <City>Main City</City>
            </cinfo:Address>
          </Contact>
        </pinfo:PersonalInfo>

        example2.xml

        <?xml version="1.0" encoding="UTF-8"?>
        <pinfo:PersonalInfo xmlns:pinfo="urn:tempuri:Personal"
                            xmlns:cinfo="urn:tempuri:Contact"
                            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                            xsi:schemaLocation="urn:tempuri:Personal personal.xsd">
          <FirstName>First Name</FirstName>
          <LastName>Last Name</LastName>
          <Contact>
            <cinfo:Email xsi:schemaLocation="urn:tempuri:Contact email.xsd">
              <EmailAddress>email1@test.org</EmailAddress>
            </cinfo:Email>
          </Contact>
        </pinfo:PersonalInfo>

        And finally the Perl code:

        testExample.pl

        #!/usr/bin/perl
        package example;
        use XML::LibXML;
        use strict;
        use warnings;

        testExample1();
        testExample2();

        sub testExample1 {
            my $schema   = XML::LibXML::Schema->new( location => "C:/temp1/personal.xsd" );
            my $document = XML::LibXML->load_xml( location => "C:/temp1/example1.xml" );
            $schema->validate( $document );
        }

        sub testExample2 {
            my $schema   = XML::LibXML::Schema->new( location => "C:/temp1/personal.xsd" );
            my $document = XML::LibXML->load_xml( location => "C:/temp1/example2.xml" );
            $schema->validate( $document );
        }

        Hopefully, you'll see an error similar to the following for the 1st example:

        C:/temp1/example1.xml:0: Schemas validity error :
            Element '{urn:tempuri:Contact}Address':
            No matching global element declaration available, but demanded
            by the strict wildcard.

        It's possible I'm missing an appropriate way to reference the Contact namespace for the address and email schemas within the XML. There are other ways to achieve successful validation, such as altering personal.xsd to statically import the other 2 schemas. Unfortunately, that won't be an option, unless I've mistyped or overlooked a schema definition nuance while creating the example.

        Running the test outside Perl works correctly with strict validation turned on. I did have to add those schemas programmatically, though that was easy. If there's a similar way to import the Contact namespace of either schema in Perl, right before the XML validation, it should solve the problem too.
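One thing that might be worth trying, as an untested sketch rather than a confirmed fix: leave personal.xsd alone and hand XML::LibXML::Schema a small generated "wrapper" schema that only contains xs:import elements, one per namespace. The helper names wrapper_xsd and validate_with are hypothetical (they are not part of XML::LibXML), and the sketch assumes libxml2 resolves the absolute paths given in schemaLocation and makes the imported global elements visible to the strict wildcard.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::LibXML): build a wrapper schema that
# does nothing except xs:import one schema file per namespace.
# Values are interpolated as-is, so they must not contain XML special characters.
sub wrapper_xsd {
    my (@imports) = @_;    # list of [ namespace, schemaLocation ] pairs
    my $xsd = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
            . qq{<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">\n};
    for my $import (@imports) {
        my ( $ns, $location ) = @$import;
        $xsd .= qq{  <xs:import namespace="$ns" schemaLocation="$location"/>\n};
    }
    return $xsd . qq{</xs:schema>\n};
}

# Hypothetical helper: validate one document against the combined schema set.
# Like XML::LibXML::Schema->validate, it dies if the document is invalid.
sub validate_with {
    my ( $xml_path, @imports ) = @_;
    require XML::LibXML;    # loaded lazily so wrapper_xsd() works on its own
    my $schema   = XML::LibXML::Schema->new( string => wrapper_xsd(@imports) );
    my $document = XML::LibXML->load_xml( location => $xml_path );
    return $schema->validate($document);
}

# Usage (paths are the ones from the example above):
#   perl wrapper.pl C:/temp1/example1.xml
if (@ARGV) {
    validate_with(
        $ARGV[0],
        [ 'urn:tempuri:Personal', 'C:/temp1/personal.xsd' ],
        [ 'urn:tempuri:Contact',  'C:/temp1/address.xsd' ],
    );
    print "$ARGV[0] validates\n";
}
```

For example2.xml the second pair would point at email.xsd instead. Because address.xsd and email.xsd share the same target namespace, importing both in one wrapper may not work: as far as I know libxml2 skips a second import for a namespace it has already loaded, so the two files might first need to be combined (for instance with xs:include from a single Contact schema). Please treat that as an assumption to verify.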

        I really appreciate you taking the effort to try things out and provide examples. It means a lot! I will attempt to provide a more concrete example.