in reply to Re^2: Validating an XML file with multiple schemas
in thread Validating an XML file with multiple schemas
It's unclear to me whether by "multiple schemas" you mean validating one XML file against multiple different schemas, or whether it's one Schema file that includes other Schema files. Could you show a short, complete example, with simple XSD files that represent what you're trying to do? Please see Short, Self-Contained, Correct Example.
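In case it's the first reading (one document checked against several independent schemas), a minimal sketch would be to loop over the schemas and collect the error from each one that fails. The inline schemas, the names "lenient" and "strict", and the helper validate_against_all are all made up for illustration; only XML::LibXML::Schema->new and its validate method come from the module.

```perl
use warnings;
use strict;
use XML::LibXML;

# Two illustrative schemas that differ only in whether the "id"
# attribute is optional or required (hypothetical, for demonstration).
my $xsd_template = <<'END_XSD';
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string" use="%s"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
END_XSD
my %schemas = (
    lenient => sprintf($xsd_template, 'optional'),
    strict  => sprintf($xsd_template, 'required'),
);

# Hypothetical helper: validate one document against every schema and
# return a hashref of schema name => error for the ones that failed.
sub validate_against_all {
    my ($doc, %xsd_for) = @_;
    my %errors;
    for my $name (sort keys %xsd_for) {
        my $schema = XML::LibXML::Schema->new( string => $xsd_for{$name} );
        # validate() dies on failure, so trap the exception per schema
        next if eval { $schema->validate($doc); 1 };
        $errors{$name} = $@;
    }
    return \%errors;
}

my $doc = XML::LibXML->load_xml( string => '<note><to>World</to></note>' );
my $errors = validate_against_all($doc, %schemas);
print "Failed '$_': $errors->{$_}" for sort keys %$errors;
```

Each schema is compiled and applied on its own, so a failure against one does not stop the checks against the others.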
The following works for me.
schema.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.com"
    xmlns:foo="http://www.example.com"
    elementFormDefault="qualified">
    <include schemaLocation="included.xsd" />
    <element name="hello">
        <complexType>
            <sequence>
                <element name="world" type="foo:worldType" />
            </sequence>
        </complexType>
    </element>
</schema>
included.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.com"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:foo="http://www.example.com"
    elementFormDefault="qualified">
    <import namespace="http://www.w3.org/1999/xhtml"
        schemaLocation="http://www.w3.org/2002/08/xhtml/xhtml1-transitional.xsd" />
    <complexType name="worldType">
        <complexContent>
            <extension base="xhtml:Flow">
                <attribute name="foo" type="string" use="required" />
            </extension>
        </complexContent>
    </complexType>
</schema>
Code - Note it was necessary to use XML::LibXML::externalEntityLoader() instead of $parser->input_callbacks(), because I didn't see another way for the callbacks to affect XML::LibXML::Schema.
use warnings;
use strict;
use utf8;
use XML::LibXML;
use URI;
use HTTP::Tiny;

my $http = HTTP::Tiny->new;
my %cache;
XML::LibXML::externalEntityLoader(sub {
    my ($url, $id) = @_;
    die "Can't handle ID '$id'" if length $id;
    my $uri = URI->new($url);
    my $file;
    if (!$uri->scheme) { $file = $url }
    elsif ($uri->scheme eq 'file') { $file = $uri->path }
    if (defined $file) {
        warn "'$uri' => Loading '$file' from disk\n"; #Debug
        open my $fh, '<', $file or die "$file: $!";
        my $data = do { local $/; <$fh> };
        close $fh;
        return $data;
    }
    # else
    die "Can't handle URL scheme: ".$uri->scheme
        unless $uri->scheme=~/\Ahttps?\z/i;
    if (!defined $cache{$uri}) {
        warn "'$uri' => Fetching...\n"; #Debug
        my $resp = $http->get($uri);
        die "$uri: $resp->{status} $resp->{reason}\n"
            unless $resp->{success};
        $cache{$uri} = $resp->{content};
    }
    else { warn "'$uri' => Cached\n"; } #Debug
    return $cache{$uri};
});

print "Loading schema...\n";
my $xsd = XML::LibXML::Schema->new( location => 'schema.xsd' );

my @xmls = (<<'END_XML_ONE',<<'END_XML_TWO',<<'END_XML_THREE');
<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="http://www.example.com">
    <world foo="bar">
        <p xmlns="http://www.w3.org/1999/xhtml">
            <i>x</i>
        </p>
    </world>
</hello>
END_XML_ONE
<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="http://www.example.com">
    <world>
        <p xmlns="http://www.w3.org/1999/xhtml">
            <i>x</i>
        </p>
    </world>
</hello>
END_XML_TWO
<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="http://www.example.com">
    <world foo="bar">
        <p xmlns="http://www.w3.org/1999/xhtml">
            <foo>x</foo>
        </p>
    </world>
</hello>
END_XML_THREE

my $i = 1;
for my $xml (@xmls) {
    print "Validating XML #$i...\n";
    my $doc = XML::LibXML->load_xml( string => $xml );
    if ( eval { $xsd->validate($doc); 1 } ) {
        print "=> Valid!\n"
    }
    else {
        print "=> Invalid! $@"
    }
}
continue { $i++ }
Output:
Loading schema...
'schema.xsd' => Loading 'schema.xsd' from disk
'included.xsd' => Loading 'included.xsd' from disk
'http://www.w3.org/2002/08/xhtml/xhtml1-transitional.xsd' => Fetching...
'http://www.w3.org/2001/xml.xsd' => Fetching...
Validating XML #1...
=> Valid!
Validating XML #2...
=> Invalid! unknown-137e570:0: Schemas validity error : Element '{http://www.example.com}world': The attribute 'foo' is required but missing.
Validating XML #3...
=> Invalid! unknown-137e570:0: Schemas validity error : Element '{http://www.w3.org/1999/xhtml}foo': This element is not expected. Expected is one of ( {http://www.w3.org/1999/xhtml}a, {http://www.w3.org/1999/xhtml}br, {http://www.w3.org/1999/xhtml}span, {http://www.w3.org/1999/xhtml}bdo, {http://www.w3.org/1999/xhtml}object, {http://www.w3.org/1999/xhtml}applet, {http://www.w3.org/1999/xhtml}img, {http://www.w3.org/1999/xhtml}map, {http://www.w3.org/1999/xhtml}iframe, {http://www.w3.org/1999/xhtml}tt ).
And just for the sake of completeness, here's the original code I posted on StackOverflow that uses an XML::LibXML::InputCallback:
use warnings;
use strict;
use XML::LibXML;
use HTTP::Tiny;
use URI;

my $parser = XML::LibXML->new;
my $cb = XML::LibXML::InputCallback->new;
my $http = HTTP::Tiny->new;
my %cache;
$cb->register_callbacks([
    sub { 1 }, # match (URI), returns Bool
    sub { # open (URI), returns Handle
        my $uri = URI->new($_[0]);
        my $file;
        #warn "Handling <<$uri>>\n"; #Debug
        if (!$uri->scheme) { $file = $_[0] }
        elsif ($uri->scheme eq 'file') { $file = $uri->path }
        elsif ($uri->scheme=~/\Ahttps?\z/i) {
            if (!defined $cache{$uri}) {
                my $resp = $http->get($uri);
                die "$uri: $resp->{status} $resp->{reason}\n"
                    unless $resp->{success};
                $cache{$uri} = $resp->{content};
            }
            $file = \$cache{$uri};
        }
        else { die "unsupported URL scheme: ".$uri->scheme }
        open my $fh, '<', $file or die "$file: $!";
        return $fh;
    },
    sub { # read (Handle,Length), returns Data
        my ($fh,$len) = @_;
        read($fh, my $buf, $len);
        return $buf;
    },
    sub { close shift } # close (Handle)
]);
$parser->input_callbacks($cb);

my $doc = $parser->load_xml( IO => \*DATA );
print "Is valid: ", $doc->is_valid ? "yes" : "no", "\n";

__DATA__
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN"
"https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" [
<!ENTITY base.url "https://some.domain.com">
<!ENTITY icon.url "https://some.domain.com/logo.png">
]>
<LinkSet>
    <Link>
        <LinkId>1</LinkId>
        <ProviderId>XXXX</ProviderId>
        <IconUrl>&icon.url;</IconUrl>
        <ObjectSelector>
            <Database>PubMed</Database>
            <ObjectList>
                <ObjId>1234567890</ObjId>
            </ObjectList>
        </ObjectSelector>
        <ObjectUrl>
            <Base>&base.url;</Base>
            <Rule>/1/</Rule>
        </ObjectUrl>
    </Link>
</LinkSet>
And finally, here's a variation of the caching code that uses an on-disk cache (Update: It's not perfect, because there's a tiny chance of filename collisions if clean_fragment happens to map two URLs to the same filename, but this is meant to be more of a proof of concept; there are plenty of other caching mechanisms available. As just one example, note how I used Memoize::Storable to cache the return values of the get_deps function here.):
my $CACHE_DIR = '/tmp/xmlcache';
use File::Path qw/make_path/;
make_path($CACHE_DIR, {verbose=>1});
use URI;
use HTTP::Tiny;
use Text::CleanFragment qw/clean_fragment/;
use File::Spec::Functions qw/catfile/;
my $http = HTTP::Tiny->new;
XML::LibXML::externalEntityLoader(sub {
    my ($url, $id) = @_;
    die "Can't handle ID '$id'" if length $id;
    my $uri = URI->new($url);
    my $file;
    if (!$uri->scheme) { $file = $url }
    elsif ($uri->scheme eq 'file') { $file = $uri->path }
    elsif ($uri->scheme=~/\Ahttps?\z/i) {
        # Note there is a (tiny) chance of filename collisions here!
        $file = catfile($CACHE_DIR, clean_fragment("$uri"));
        if (!-e $file) {
            warn "'$uri' => Mirroring to '$file'...\n"; #Debug
            my $resp = $http->mirror($uri, "$file");
            die "$uri: $resp->{status} $resp->{reason}\n"
                unless $resp->{success};
        }
    }
    else { die "Can't handle URL scheme: ".$uri->scheme }
    warn "'$uri' => Loading '$file' from disk\n"; #Debug
    open my $fh, '<', $file or die "$file: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    return $data;
});
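If the filename collisions mentioned above are a concern, one possible way around them is to name each cache file after a digest of the URL rather than a cleaned-up copy of its text. This is only a sketch: cache_file_for and the ".cache" suffix are hypothetical, not part of the code above, and it assumes the URL string is plain ASCII, which a stringified URI object normally is. Digest::SHA and File::Spec::Functions both ship with Perl.

```perl
use warnings;
use strict;
use Digest::SHA qw/sha256_hex/;
use File::Spec::Functions qw/catfile/;

# Hypothetical helper: map a URL to a cache filename built from the
# SHA-256 hex digest of the URL string. Distinct URLs then get
# distinct names, barring a hash collision.
sub cache_file_for {
    my ($cache_dir, $url) = @_;
    return catfile($cache_dir, sha256_hex("$url") . '.cache');
}

print cache_file_for('/tmp/xmlcache', 'http://www.w3.org/2001/xml.xsd'), "\n";
```

The trade-off is that the cache directory is no longer human-readable, so keeping the debug warn that prints the URL next to the filename becomes more useful.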