
Thank you for your quick response! Please be aware that my Perl skills are in need of serious repair, and my responses will reflect that. From your example, I'm trying to figure out how to adapt the idea of caching a list of schema file contents, similar to caching the DTD (or DTDs) fetched from external resources referenced by URIs within an HTML, XHTML, XML, etc. document. After reading the Schema.pod packaged with XML-LibXML-2.0132, I got the impression that libxml2's support for W3C XML Schema may not be as mature as its DTD support. I hope I'm very much mistaken.

Here's a brief example of single-schema usage (minus the error handling and debugging code):

    package example;
    use strict;
    use warnings;
    use XML::LibXML;

    my $xmlFilePath = '<local file path>';    # path to the XML document
    my $xsdFilePath = '<local file path>';    # path to the XSD schema
    my $document = XML::LibXML->load_xml( location => $xmlFilePath );
    my $schema   = XML::LibXML::Schema->new( location => $xsdFilePath );
    $schema->validate( $document );           # dies if the document is invalid

Simple and short. To extend the above to multiple schemas while keeping the same feel, I'd like to build something that allows the following usage:

    :
    :
    # add() is the proposed method; XML::LibXML::Schema does not provide it
    my $schema = XML::LibXML::Schema->new( location => $xsdFile1Path );
    $schema->add( location => $xsdFile2Path );
    $schema->add( location => $xsdFile3Path );
    :
    :

Or something similar and syntactically correct. As long as the library internally has a mechanism to support the dependencies between the schemas themselves, it shouldn't be too complicated to extend XML::LibXML::Schema to take advantage of it. However, if it were that easy, I'd imagine the author would already have attempted it.

Do you still think I'll succeed in reusing your idea to achieve the above?

Re^3: Validating an XML file with multiple schemas
by haukex (Archbishop) on Jan 05, 2019 at 11:43 UTC

    It's unclear to me whether by "multiple schemas" you mean validating one XML file against multiple different schemas, or whether it's one Schema file that includes other Schema files. Could you show a short, complete example, with simple XSD files that represent what you're trying to do? Please see Short, Self-Contained, Correct Example.

    The following works for me.

      Here's an exaggerated example where a Personal Information schema (saved as personal.xsd) uses a flexible Contact schema. A contact can be an Address, an Email, a specific online ID, a phone number, etc.; I've provided sample address.xsd and email.xsd files. The Contact section of the Personal Information schema allows this kind of content extension through the broader any element, but keeps validation strict on purpose (processContents="strict").

      To test, create a temporary folder for the 5 files (2 .xml and 3 .xsd) provided below. I used C:\temp1 in my example; alter the attached Perl code to point to your path.

      personal.xsd

      <?xml version="1.0" encoding="UTF-8"?>
      <schema xmlns="http://www.w3.org/2001/XMLSchema"
              xmlns:per="urn:tempuri:Personal"
              targetNamespace="urn:tempuri:Personal"
              elementFormDefault="unqualified">
        <element name="PersonalInfo">
          <complexType>
            <sequence>
              <element name="FirstName" type="string"/>
              <element name="LastName" type="string"/>
              <element name="Contact" type="per:ContactType"/>
            </sequence>
          </complexType>
        </element>
        <complexType name="ContactType">
          <sequence>
            <any namespace="##other" processContents="strict" maxOccurs="unbounded"/>
          </sequence>
        </complexType>
      </schema>

      address.xsd

      <?xml version="1.0" encoding="utf-8"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                 xmlns:Contact="urn:tempuri:Contact"
                 targetNamespace="urn:tempuri:Contact"
                 elementFormDefault="unqualified">
        <xs:element name="Address">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Street" type="xs:string"/>
              <xs:element name="City" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:schema>

      email.xsd

      <?xml version="1.0" encoding="utf-8"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                 xmlns:Contact="urn:tempuri:Contact"
                 targetNamespace="urn:tempuri:Contact"
                 elementFormDefault="unqualified">
        <xs:element name="Email">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="EmailAddress" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:schema>

      example1.xml

      <?xml version="1.0" encoding="UTF-8"?>
      <pinfo:PersonalInfo xmlns:pinfo="urn:tempuri:Personal"
                          xmlns:cinfo="urn:tempuri:Contact"
                          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                          xsi:schemaLocation="urn:tempuri:Personal personal.xsd">
        <FirstName>First Name</FirstName>
        <LastName>Last Name</LastName>
        <Contact>
          <cinfo:Address xsi:schemaLocation="urn:tempuri:Contact address.xsd">
            <Street>Main Street</Street>
            <City>Main City</City>
          </cinfo:Address>
        </Contact>
      </pinfo:PersonalInfo>

      example2.xml

      <?xml version="1.0" encoding="UTF-8"?>
      <pinfo:PersonalInfo xmlns:pinfo="urn:tempuri:Personal"
                          xmlns:cinfo="urn:tempuri:Contact"
                          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                          xsi:schemaLocation="urn:tempuri:Personal personal.xsd">
        <FirstName>First Name</FirstName>
        <LastName>Last Name</LastName>
        <Contact>
          <cinfo:Email xsi:schemaLocation="urn:tempuri:Contact email.xsd">
            <EmailAddress>email1@test.org</EmailAddress>
          </cinfo:Email>
        </Contact>
      </pinfo:PersonalInfo>

      And finally the Perl code:

      testExample.pl

      #!/usr/bin/perl
      package example;
      use XML::LibXML;
      use strict;
      use warnings;

      testExample1();
      testExample2();

      sub testExample1 {
          my $schema   = XML::LibXML::Schema->new( location => "C:/temp1/personal.xsd" );
          my $document = XML::LibXML->load_xml( location => "C:/temp1/example1.xml" );
          $schema->validate( $document );
      }

      sub testExample2 {
          my $schema   = XML::LibXML::Schema->new( location => "C:/temp1/personal.xsd" );
          my $document = XML::LibXML->load_xml( location => "C:/temp1/example2.xml" );
          $schema->validate( $document );
      }

      Hopefully, you'll see an error similar to the following for the 1st example:

      C:/temp1/example1.xml:0: Schemas validity error :
          Element '{urn:tempuri:Contact}Address':
          No matching global element declaration available, but demanded
          by the strict wildcard.
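Because validate() dies on the first failure, testExample2() never runs once example1.xml fails. A variant that reports each file's result instead of stopping could look like the sketch below. The helper name validate_file and its return convention (undef when valid, the error text otherwise) are hypothetical; it only relies on XML::LibXML::Schema's validate() returning normally for a valid document and dying with libxml2's messages otherwise.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: validate one XML file against one XSD file and
# report the outcome instead of letting the error abort the script.
# Returns undef if the document is valid, otherwise the error text.
sub validate_file {
    my ( $xml_path, $xsd_path ) = @_;
    my $ok = eval {
        # Loading the schema or the document can die too (e.g. a missing
        # file or malformed XML); those errors are reported the same way.
        my $schema   = XML::LibXML::Schema->new( location => $xsd_path );
        my $document = XML::LibXML->load_xml( location => $xml_path );
        $schema->validate($document);    # dies on a validity error
        1;
    };
    return $ok ? undef : ( "$@" || 'unknown error' );
}

# Example usage with the files from this thread:
# for my $n ( 1, 2 ) {
#     my $error = validate_file( "C:/temp1/example$n.xml", 'C:/temp1/personal.xsd' );
#     print defined $error ? "example$n.xml is invalid:\n$error\n"
#                          : "example$n.xml is valid\n";
# }
```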

      It's possible I'm missing an appropriate way to reference the Contact namespace of the address and email schemas within the XML. There are other ways to achieve validation, such as altering personal.xsd to statically import the other 2 schemas. Unfortunately, that won't be an option, unless I've mistyped or overlooked a schema-definition nuance while creating the example.

      Running the test outside Perl works correctly with strict validation turned on, although there I did have to add those schemas programmatically (which was easy). If there's a similar way in Perl to import the Contact namespace of either schema right before the XML validation, that should solve the problem too.
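One way to get that effect in Perl, sketched under assumptions: generate an in-memory "driver" schema that pulls in personal.xsd, address.xsd and email.xsd, and validate against it via XML::LibXML::Schema->new( string => ... ), leaving personal.xsd untouched. The SchemaSet class is hypothetical (it is not part of XML::LibXML) and mirrors the new/add usage proposed earlier in the thread. It assumes every schema declares a targetNamespace and that at most one namespace is spread over several files. Because libxml2 appears to honour only the first xs:import of a given namespace, such a namespace (urn:tempuri:Contact here) becomes the driver's own targetNamespace and its files are pulled in with xs:include instead.

```perl
use strict;
use warnings;

# Hypothetical wrapper class, NOT part of XML::LibXML: it records schema
# files via new/add and, on validate(), generates an in-memory "driver"
# schema that pulls all of them in, then validates against that.
package SchemaSet;

use Carp qw(croak);
use File::Spec;
use XML::LibXML;

my $XSD_NS = 'http://www.w3.org/2001/XMLSchema';

sub new {
    my ( $class, %args ) = @_;
    my $self = bless { files => [] }, $class;
    $self->add(%args) if defined $args{location};
    return $self;
}

# Record one schema file together with its targetNamespace.
sub add {
    my ( $self, %args ) = @_;
    my $path = $args{location} // croak 'location is required';
    # The driver is parsed from a string, so make relative paths absolute.
    $path = File::Spec->rel2abs($path)
        unless File::Spec->file_name_is_absolute($path);
    my $tns = XML::LibXML->load_xml( location => $path )
        ->documentElement->getAttribute('targetNamespace');
    croak "$path: schemas without a targetNamespace are not handled here"
        unless defined $tns && length $tns;
    push @{ $self->{files} }, [ $tns, $path ];
    return $self;
}

# Escape a value for use inside a double-quoted XML attribute.
sub _attr {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;
    $value =~ s/</&lt;/g;
    $value =~ s/"/&quot;/g;
    return $value;
}

# Build the driver schema as a string.
sub driver_xsd {
    my ($self) = @_;
    my %files_for;    # namespace => [ paths ]
    push @{ $files_for{ $_->[0] } }, $_->[1] for @{ $self->{files} };

    # A namespace spread over several files becomes the driver's own
    # targetNamespace; only one such namespace can be handled this way.
    my @split = grep { @{ $files_for{$_} } > 1 } sort keys %files_for;
    croak "only one namespace may span several files, got: @split" if @split > 1;
    my $tns = $split[0];

    my $xsd = qq{<xs:schema xmlns:xs="$XSD_NS"}
        . ( defined $tns ? ' targetNamespace="' . _attr($tns) . '"' : '' )
        . ">\n";
    for my $ns ( sort keys %files_for ) {
        for my $path ( @{ $files_for{$ns} } ) {
            $xsd .= ( defined $tns && $ns eq $tns )
                ? '  <xs:include schemaLocation="' . _attr($path) . qq{"/>\n}
                : '  <xs:import namespace="' . _attr($ns)
                  . '" schemaLocation="' . _attr($path) . qq{"/>\n};
        }
    }
    return $xsd . "</xs:schema>\n";
}

# Like XML::LibXML::Schema::validate: dies if $document is invalid.
sub validate {
    my ( $self, $document ) = @_;
    my $schema = XML::LibXML::Schema->new( string => $self->driver_xsd );
    return $schema->validate($document);
}

package main;
```

With the files from this example, usage would follow the proposed interface: SchemaSet->new( location => 'C:/temp1/personal.xsd' ), then ->add( location => ... ) for address.xsd and email.xsd, then ->validate( $document ). The trade-off is that a schema is generated on every validate() call; caching the XML::LibXML::Schema object would be a natural next step.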

        element Address: Schemas validity error : Element '{urn:tempuri:Contact}Address': No matching global element declaration available, but demanded by the strict wildcard.

        I get the same error when I run xmllint on these files from the command line. It seems to me this is more of a libxml2/Schema question than a Perl question. Although I haven't yet found a good description of the issue, it may be a limitation of libxml2, and therefore of XML::LibXML, that it does not respect the xsi:schemaLocation attribute; see e.g. this bug report.

        As for the design of these schemas, I'm not sure that having both address.xsd and email.xsd provide potentially conflicting definitions for the namespace urn:tempuri:Contact is the best solution. You might want to consider one namespace per top-level element.

        Running the test outside Perl works correctly with strict validation turned on. I did have to add (with ease) those schemas programmatically though.

        What validator are you using here? Could you share more information on how you achieved this?

        There are other ways to successfully achieve validation by altering the personal.xsd file to statically import the other 2 schemas.

        Could you explain why that's not an option? E.g., which of the files in your example can't you modify, and why? On the one hand, I understand the need to just be able to plug various schemas in and have them imported automatically; on the other, being able to plug any other schema into the current one kind of defeats the purpose of validation ;-) If it were me, I might set up a workaround in which I write a script that modifies personal.xsd and adds the appropriate <import> statements to pull in the other schemas, giving me control over which schemas I want to allow. It's all XML after all, and programmatic modification isn't a problem.
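The workaround described above (programmatically adding <import> statements to personal.xsd) could be scripted roughly as follows. This is a sketch: add_imports is a hypothetical helper name, and it assumes one schema file per namespace, in line with the one-namespace-per-top-level-element suggestion. The namespaces passed in act as the list of schemas allowed to satisfy the strict wildcard.

```perl
use strict;
use warnings;
use XML::LibXML;

my $XSD_NS = 'http://www.w3.org/2001/XMLSchema';

# Hypothetical helper: load a schema and add one <import> per allowed
# namespace, returning the modified DOM (the file on disk is untouched).
# %location_for maps namespace => schemaLocation and acts as the whitelist
# of schemas that may satisfy the strict wildcard.
sub add_imports {
    my ( $xsd_path, %location_for ) = @_;
    my $doc  = XML::LibXML->load_xml( location => $xsd_path );
    my $root = $doc->documentElement;

    # Reuse the prefix the schema already uses for the XSD namespace
    # (personal.xsd uses the default namespace, i.e. no prefix).
    my $prefix = $root->prefix;
    my $qname  = ( defined $prefix && length $prefix ) ? "$prefix:import" : 'import';

    # <import> must precede the element and type declarations, so each new
    # element goes in front of the original first child (in sorted order).
    my $anchor = $root->firstChild;
    for my $ns ( sort keys %location_for ) {
        my $import = $doc->createElementNS( $XSD_NS, $qname );
        $import->setAttribute( namespace      => $ns );
        $import->setAttribute( schemaLocation => $location_for{$ns} );
        if ( defined $anchor ) { $root->insertBefore( $import, $anchor ) }
        else                   { $root->appendChild($import) }
    }
    return $doc;
}
```

For example, the string from add_imports( 'C:/temp1/personal.xsd', 'urn:tempuri:Contact' => 'C:/temp1/address.xsd' )->toString could be passed to XML::LibXML::Schema->new( string => ... ); absolute schemaLocation paths are probably the safer choice there, since a schema parsed from a string has no base URI to resolve relative ones against. Because the namespaces are hash keys, only one file fits per namespace, which is one more argument for giving address.xsd and email.xsd namespaces of their own.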

      I really appreciate you taking the time to try things out and provide examples. It means a lot! I'll try to provide a more concrete example.