Here's an exaggerated example in which a Personal Information schema (personal.xsd) uses a flexible Contact schema. A contact can be an address, an email, a specific online ID, a phone number, etc.; I've provided sample address.xsd and email.xsd schemas. The Contact section of the Personal Information schema allows such content extension through the broader "any" element, but deliberately keeps validation strict.

To test, create a temporary folder for the 5 files (2 .xml and 3 .xsd) provided below. I used C:\temp1 in my example; alter the Perl code below to point to your path.

personal.xsd

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:per="urn:tempuri:Personal"
        targetNamespace="urn:tempuri:Personal"
        elementFormDefault="unqualified">
  <element name="PersonalInfo">
    <complexType>
      <sequence>
        <element name="FirstName" type="string"/>
        <element name="LastName" type="string"/>
        <element name="Contact" type="per:ContactType"/>
      </sequence>
    </complexType>
  </element>
  <complexType name="ContactType">
    <sequence>
      <any namespace="##other" processContents="strict" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</schema>

address.xsd

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:Contact="urn:tempuri:Contact"
           targetNamespace="urn:tempuri:Contact"
           elementFormDefault="unqualified">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Street" type="xs:string"/>
        <xs:element name="City" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

email.xsd

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:Contact="urn:tempuri:Contact"
           targetNamespace="urn:tempuri:Contact"
           elementFormDefault="unqualified">
  <xs:element name="Email">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="EmailAddress" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

example1.xml

<?xml version="1.0" encoding="UTF-8"?>
<pinfo:PersonalInfo xmlns:pinfo="urn:tempuri:Personal"
                    xmlns:cinfo="urn:tempuri:Contact"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:schemaLocation="urn:tempuri:Personal personal.xsd">
  <FirstName>First Name</FirstName>
  <LastName>Last Name</LastName>
  <Contact>
    <cinfo:Address xsi:schemaLocation="urn:tempuri:Contact address.xsd">
      <Street>Main Street</Street>
      <City>Main City</City>
    </cinfo:Address>
  </Contact>
</pinfo:PersonalInfo>

example2.xml

<?xml version="1.0" encoding="UTF-8"?>
<pinfo:PersonalInfo xmlns:pinfo="urn:tempuri:Personal"
                    xmlns:cinfo="urn:tempuri:Contact"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:schemaLocation="urn:tempuri:Personal personal.xsd">
  <FirstName>First Name</FirstName>
  <LastName>Last Name</LastName>
  <Contact>
    <cinfo:Email xsi:schemaLocation="urn:tempuri:Contact email.xsd">
      <EmailAddress>email1@test.org</EmailAddress>
    </cinfo:Email>
  </Contact>
</pinfo:PersonalInfo>

And finally the Perl code:

testExample.pl

#!/usr/bin/perl
package example;

use XML::LibXML;
use strict;
use warnings;

testExample1();
testExample2();

sub testExample1 {
    my $schema   = XML::LibXML::Schema->new( location => "C:/temp1/personal.xsd" );
    my $document = XML::LibXML->load_xml( location => "C:/temp1/example1.xml" );
    $schema->validate( $document );
}

sub testExample2 {
    my $schema   = XML::LibXML::Schema->new( location => "C:/temp1/personal.xsd" );
    my $document = XML::LibXML->load_xml( location => "C:/temp1/example2.xml" );
    $schema->validate( $document );
}

Hopefully, you'll see an error similar to the following for the 1st example:

C:/temp1/example1.xml:0: Schemas validity error :
    Element '{urn:tempuri:Contact}Address':
    No matching global element declaration available, but demanded
    by the strict wildcard.

It's possible I'm missing an appropriate way to reference the Contact namespace for the address and email schemas within the XML. There are other ways to achieve successful validation, such as altering the personal.xsd file to statically import the other 2 schemas. Unfortunately, that won't be an option, unless I've mistyped or overlooked a schema definition nuance while creating the example.

Running the test outside Perl works correctly with strict validation turned on. I did have to add (with ease) those schemas programmatically though. If there's a similar way to import the Contact namespace of either schema in Perl, right before the XML validation, it should solve the problem too.

Re^5: Validating an XML file with multiple schemas
by haukex (Archbishop) on Jan 08, 2019 at 17:43 UTC
    element Address: Schemas validity error : Element '{urn:tempuri:Contact}Address': No matching global element declaration available, but demanded by the strict wildcard.

    I get the same error when I run xmllint on these files from the command line, so this seems to be more of a libxml2/Schema question than a Perl question. I haven't yet found a good description of the issue, but it may be a limitation of libxml2, and therefore of XML::LibXML, that it does not respect the xsi:schemaLocation attribute; see e.g. this bug report.

    As for the design of these Schemas, I'm not sure that having both address.xsd and email.xsd provide potentially conflicting definitions for the namespace urn:tempuri:Contact is the best solution. You might want to consider one namespace per toplevel element?

    Running the test outside Perl works correctly with strict validation turned on. I did have to add (with ease) those schemas programmatically though.

    What validator are you using here? Could you share more information on how you achieved this?

    There are other ways to successfully achieve validation by altering the personal.xsd file to statically import the other 2 schemas.

    Could you explain why that's not an option? E.g. which of the files in your example can't you modify and why? On the one hand, I understand the need to just be able to plug various schemas in and have them imported automatically, on the other, being able to plug any other schema into the current one kind of defeats the purpose of validation ;-) If it were me, I might set up a workaround in which I write a script that modifies personal.xsd and adds the appropriate <import> statements to pull in the other schemas, giving me control over which Schemas I want to allow. It's all XML after all, and programmatic modification isn't a problem.
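
    A minimal sketch of that workaround, assuming XML::LibXML is installed. The helper names add_imports and validate_with_imports are made up for illustration, and I have not run this against the files above. It changes only an in-memory copy of personal.xsd, so the file on disk stays untouched. Two assumptions to check: the schemaLocation values should probably be absolute paths, since a schema parsed from a string has no base directory of its own, and as far as I know libxml2 only honors the first import it sees for a given namespace, so the sketch takes one schema file per namespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $XSD_NS = 'http://www.w3.org/2001/XMLSchema';

# Hypothetical helper: return a copy of the schema text with one
# <import> element per namespace added as the first children of
# the schema root. %imports maps namespace => schemaLocation.
sub add_imports {
    my ($xsd_text, %imports) = @_;
    my $doc  = XML::LibXML->load_xml( string => $xsd_text );
    my $root = $doc->documentElement;
    for my $ns (sort keys %imports) {
        my $import = $doc->createElementNS( $XSD_NS, 'import' );
        $import->setAttribute( namespace      => $ns );
        $import->setAttribute( schemaLocation => $imports{$ns} );
        # Imports have to come before the element and type declarations.
        $root->insertBefore( $import, $root->firstChild );
    }
    return $doc->toString;
}

# Hypothetical helper: validate $xml_file against the patched copy of
# $xsd_file. Returns 1 if the document validates, 0 otherwise
# (the validator's message is left in $@).
sub validate_with_imports {
    my ($xsd_file, $xml_file, %imports) = @_;
    open my $fh, '<:raw', $xsd_file or die "Cannot read $xsd_file: $!";
    my $xsd_text = do { local $/; <$fh> };
    close $fh;
    my $schema = XML::LibXML::Schema->new(
        string => add_imports( $xsd_text, %imports ) );
    my $doc = XML::LibXML->load_xml( location => $xml_file );
    return eval { $schema->validate($doc); 1 } ? 1 : 0;
}
```

    For the first example this would be called as validate_with_imports( "C:/temp1/personal.xsd", "C:/temp1/example1.xml", "urn:tempuri:Contact" => "C:/temp1/address.xsd" ), and with email.xsd for the second. The hash of imports is where you keep control over which Schemas are allowed.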

      I think I suspected some limitation around libxml2 myself. It was hard to tell without enough experience with it. As for the schema examples, they were crafted to demonstrate the condition. So I could have defined the namespace either way - shared/unique, with consistent results. Having said that, when designing schemas with high reuse and extensibility, the shared namespace will start to make sense, given the right context and utilization. Very useful in larger, shared projects.

      The external validator was Java-based. There are a few other commercial tools out there that would have worked just as well. The original schema on which I modeled personal.xsd is part of a larger set managed by a vendor. The set has been in use for several years, by us and other clients, so alteration was never in scope. Besides, clients using C++ and Java processors have no trouble consuming (and generating) XML based on these schemas. I don't think I would have, either. It's just that my particular effort required the use of Perl.

      I believe I will take a different approach to validating the XML, at the expense of labor :-(. I will also attempt to get in touch with xmlsoft, when time permits, to see if they'll view this as something to be solved (or already solved) in a future release. You appear to be very knowledgeable on this subject as well! I thank you for your willingness and overall attitude.

        I've worked with XML a fair amount and written some Schemas myself, but it seems I hadn't come across something as (seemingly) complex as you're describing :-) Anyway, there are of course other modules you could try out and see if they work for you, I just don't have any experience with them. Other than that, if you do end up having to go the route of calling an external validator, then my module IPC::Run3::Shell might be of interest. Here's an example I adapted from one of my older projects where I call an external xmllint (this was before I had figured out the above solution using XML::LibXML):

        use IPC::Run3::Shell ':FATAL', [ xmllint => qw/ xmllint --noout
            --nonet --path /, $external_schemas_path, '--schema' ];
        my $pass = eval { xmllint($schema, $file); $?==0 };
        print $file, ": ", $pass ? "PASS" : "FAIL", "\n";

        The module also offers ways to capture STDOUT and STDERR if you want to suppress and/or inspect that.

        If the external validator is Java, then I've played around with Inline::Java, and it seems to work ok too.