in reply to XML::LibXML question: How to list XInclude files, which are supposed to be included?

use strict;
use warnings;
use feature qw( say );

use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

# Path of the XML file to inspect, taken from the command line.
my $qfn = shift(@ARGV);

my $parser = XML::LibXML->new();
my $doc    = $parser->parse_file($qfn);

# The xi prefix must be registered before it can be used in XPath.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('xi', 'http://www.w3.org/2001/XInclude');

for ($xpc->findnodes('//xi:include/@href')) {
    say $_->getValue();
}

This also works:

for ($xpc->findnodes('//xi:include')) {
    say $_->getAttribute('href');
}

Update: Fixed constructor.


Re^2: XML::LibXML question: How to list XInclude files, which are supposed to be included?
by AlexFromNJ (Novice) on Jul 06, 2011 at 19:06 UTC
    It won't work if the main XML file includes other XML pieces that themselves include further pieces. If you parse without expanding XIncludes, you do get these "xi:include/@href" paths, but you won't see the second-level (and deeper) includes.

      What's the problem exactly? You don't know how to download the XML documents whose URLs you extract, or you don't know how to parse the downloaded documents with the code I posted?

      use strict;
      use warnings;
      use feature qw( say );

      use LWP::UserAgent            qw( );
      use URI                       qw( );
      use URI::file                 qw( );
      use XML::LibXML               qw( );
      use XML::LibXML::XPathContext qw( );

      my $parser = XML::LibXML->new();
      my $xpc    = XML::LibXML::XPathContext->new();
      $xpc->registerNs('xi', 'http://www.w3.org/2001/XInclude');

      my $ua = LWP::UserAgent->new();

      my $root_url = URI->new_abs($ARGV[0], URI::file->cwd());

      my @todo = $root_url;
      my %found;
      while (@todo) {
          my $url = pop(@todo);

          my $response = $ua->get($url);
          if (!$response->is_success()) {
              warn("Can't get $url: " . $response->status_line() . "\n");
              next;
          }

          my $xml = $response->decoded_content( charset => 'none' );
          my $doc = $parser->parse_string($xml);

          for ($xpc->findnodes('//xi:include/@href', $doc)) {
              my $child_url = URI->new_abs($_->getValue(), $url);
              push @todo, $child_url if !$found{$child_url}++;
          }
      }

      say for sort keys %found;

      Update: Fixed constructor. Made url absolute as required.

        Sorry, you misunderstood me. I'm not trying to download any files from remote sites; all my files are local.

        My point is that to make the parser expand XIncludes I have to call it with the option "xinclude => 1", but then the resulting tree no longer contains the original filenames from which the XML pieces were included.

        If I call the parser without this option (which is the default), it won't expand the XIncludes, so only the first-level filenames are there (because no expansion happens).

        So in both cases I can't find ALL the filenames I need.

        Alex
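
        For the all-local case described above, one possible approach is to leave XInclude expansion off and follow the href attributes by hand, parsing each included file in turn. What follows is only a minimal sketch under several assumptions: every href is a plain relative or absolute file path (not a URL), every include uses the default parse="xml", and no xml:base attributes change how relative paths resolve. list_xincludes is a hypothetical helper name, not something XML::LibXML provides.

```perl
use strict;
use warnings;
use feature qw( say );

use File::Basename            qw( dirname );
use File::Spec                qw( );
use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

# Hypothetical helper (not from the thread): returns the absolute paths of
# all files reachable through xi:include from $root_qfn, at any depth.
sub list_xincludes {
    my ($root_qfn) = @_;

    # Default parser options: XInclude elements are NOT expanded, so the
    # xi:include nodes and their href attributes stay in the tree.
    my $parser = XML::LibXML->new();
    my $xpc    = XML::LibXML::XPathContext->new();
    $xpc->registerNs('xi', 'http://www.w3.org/2001/XInclude');

    my @todo = File::Spec->rel2abs($root_qfn);
    my %found;
    while (@todo) {
        my $qfn = pop(@todo);

        # parse_file dies on failure, so trap the error and move on.
        my $doc = eval { $parser->parse_file($qfn) };
        if (!$doc) {
            warn("Can't parse $qfn: $@");
            next;
        }

        for ($xpc->findnodes('//xi:include/@href', $doc)) {
            # Assumption: relative hrefs resolve against the directory of
            # the file that contains them.
            my $child_qfn = File::Spec->rel2abs($_->getValue(), dirname($qfn));
            push @todo, $child_qfn if !$found{$child_qfn}++;
        }
    }

    return sort keys %found;
}

if (@ARGV) {
    say for list_xincludes($ARGV[0]);
}
```

        The %found hash serves two purposes: it collects the result and it stops the loop from revisiting a file, so circular includes do not cause an endless loop. Paths are made absolute but not canonicalized, so the same file reached through two different spellings (for example via "../") would be listed twice.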