in reply to Re: XML::LibXML question: How to list XInclude files, which are supposed to be included?
in thread XML::LibXML question: How to list XInclude files, which are supposed to be included?

That won't work if the main XML file includes other XML pieces that may themselves include further pieces. If you parse without expanding XIncludes, you get the "xi:include/@href" paths, but you won't see the second-level (and deeper) includes.

Re^3: XML::LibXML question: How to list XInclude files, which are supposed to be included?
by ikegami (Patriarch) on Jul 06, 2011 at 20:04 UTC

    What's the problem exactly? You don't know how to download the XML documents whose urls you extract, or you don't know how to parse the downloaded documents with the code I posted?

Re^3: XML::LibXML question: How to list XInclude files, which are supposed to be included?
by ikegami (Patriarch) on Jul 06, 2011 at 20:38 UTC
    use strict;
    use warnings;
    use feature qw( say );

    use LWP::UserAgent            qw( );
    use URI                       qw( );
    use URI::file                 qw( );
    use XML::LibXML               qw( );
    use XML::LibXML::XPathContext qw( );

    my $parser = XML::LibXML->new();
    my $xpc = XML::LibXML::XPathContext->new();
    $xpc->registerNs('xi', 'http://www.w3.org/2001/XInclude');

    my $ua = LWP::UserAgent->new();

    my $root_url = URI->new_abs($ARGV[0], URI::file->cwd());

    my @todo = $root_url;
    my %found;
    while (@todo) {
        my $url = pop(@todo);

        my $response = $ua->get($url);
        if (!$response->is_success()) {
            warn("Can't get $url: " . $response->status_line() . "\n");
            next;
        }

        my $xml = $response->decoded_content( charset => 'none' );
        my $doc = $parser->parse_string($xml);

        for ($xpc->findnodes('//xi:include/@href', $doc)) {
            my $child_url = URI->new_abs($_->getValue(), $url);
            push @todo, $child_url if !$found{$child_url}++;
        }
    }

    say for sort keys %found;

    Update: Fixed constructor. Made url absolute as required.

      Sorry, you misunderstood me - I'm not trying to download any files from remote sites, all my files are local

      My point is that in order to make the parser expand XIncludes I have to call it with the option "xinclude => 1", but in that case the resulting tree doesn't contain any of the original filenames from which the XML pieces were included

      If I call the parser without this option (which is the default), then it won't expand these XIncludes, so only the first-level filenames will be there (because no expansion happens)

      So, in both cases I can't find ALL filenames I need

      Alex

        "I'm not trying to download any files from remote sites, all my files are local"

        It makes no difference if the urls point to remote or local resources. LWP::UserAgent will properly handle file: urls.

        "So, in both cases I can't find ALL filenames I need"

        Wee! I've done the impossible!
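
For readers whose files really are all local, the same walk can be sketched without LWP or URI. This is only an illustration, not code from the thread: the helper name list_xincludes is made up, and it assumes every href is a plain file path (no file: scheme, no fragment) that is relative to the file containing the xi:include. Each file is parsed with XInclude expansion left off, so the href attributes are still there to read, and the walk then repeats for every file it has not seen yet.

```perl
use strict;
use warnings;

use Cwd                       qw( abs_path );
use File::Basename            qw( dirname );
use File::Spec                qw( );
use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

# Hypothetical helper (not from the thread): returns the sorted list of
# files reachable from $root through xi:include/@href, at any depth.
sub list_xincludes {
    my ($root) = @_;

    # XInclude expansion stays off (the default), so the hrefs survive parsing.
    my $parser = XML::LibXML->new();
    my $xpc = XML::LibXML::XPathContext->new();
    $xpc->registerNs('xi', 'http://www.w3.org/2001/XInclude');

    my $start = File::Spec->rel2abs($root);
    $start = eval { abs_path($start) } // $start;

    my @todo = $start;
    my %seen = ( $start => 1 );   # guards against include cycles
    my %found;
    while (@todo) {
        my $file = pop(@todo);

        my $doc = eval { $parser->parse_file($file) };
        if (!$doc) {
            warn("Can't parse $file: $@\n");
            next;
        }

        for my $attr ($xpc->findnodes('//xi:include/@href', $doc)) {
            # Assumption: href is a file path relative to the including file.
            my $child = File::Spec->rel2abs($attr->getValue(), dirname($file));

            # Collapse ".." (and symlinks) so the same file is not visited
            # under two spellings; fall back to the raw path if that fails.
            $child = eval { abs_path($child) } // $child;

            $found{$child} = 1;
            push @todo, $child if !$seen{$child}++;
        }
    }

    return sort keys %found;
}

if (@ARGV) {
    print "$_\n" for list_xincludes($ARGV[0]);
}
```

Like the LWP version, this lists only the included files, not the root document itself. A file that is missing or is not well-formed XML (for example the target of a parse="text" include) is still listed, but produces a warning and is not searched further.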