in reply to Re^2: XML::LibXML question: How to list XInclude files, which are supposed to be included?
in thread XML::LibXML question: How to list XInclude files, which are supposed to be included?

use strict;
use warnings;
use feature qw( say );

use LWP::UserAgent            qw( );
use URI                       qw( );
use URI::file                 qw( );
use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

my $parser = XML::LibXML->new();
my $xpc    = XML::LibXML::XPathContext->new();
$xpc->registerNs('xi', 'http://www.w3.org/2001/XInclude');

my $ua = LWP::UserAgent->new();

# Resolve the command-line argument against the current directory.
my $root_url = URI->new_abs($ARGV[0], URI::file->cwd());

my @todo = $root_url;
my %found;
while (@todo) {
   my $url = pop(@todo);

   my $response = $ua->get($url);
   if (!$response->is_success()) {
      warn("Can't get $url: " . $response->status_line() . "\n");
      next;
   }

   # Parse without XInclude expansion so the xi:include elements survive.
   my $xml = $response->decoded_content( charset => 'none' );
   my $doc = $parser->parse_string($xml);

   # Queue every include target that has not been seen yet.
   for ($xpc->findnodes('//xi:include/@href', $doc)) {
      my $child_url = URI->new_abs($_->getValue(), $url);
      push @todo, $child_url if !$found{$child_url}++;
   }
}

say for sort keys %found;

Update: Fixed constructor. Made url absolute as required.
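
To illustrate why each href has to be made absolute against the URL of the document that contains it, here is a minimal sketch of the resolution step on its own. The file names and the /home/alex/book directory are made up for the example; only URI->new and URI->new_abs are used.

```perl
use strict;
use warnings;
use URI qw( );

# Hypothetical location of the top-level document.
my $base = URI->new('file:///home/alex/book/main.xml');

# An href found in main.xml resolves against main.xml's own URL.
my $child = URI->new_abs('chapters/ch1.xml', $base);

# An href found in ch1.xml resolves against ch1.xml, not against main.xml.
my $grandchild = URI->new_abs('../common/legal.xml', $child);

# An href that is already absolute is left alone.
my $remote = URI->new_abs('http://example.com/shared.xml', $base);

print "$_\n" for $child, $grandchild, $remote;
```

Because the base changes at every level, the relative hrefs of nested includes still end up pointing at the right files.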

Re^4: XML::LibXML question: How to list XInclude files, which are supposed to be included?
by Anonymous Monk on Jul 06, 2011 at 22:20 UTC

    Sorry, you misunderstood me - I'm not trying to download any files from remote sites, all my files are local

    My point is that in order to make the parser expand XIncludes, I have to call it with the option "xinclude => 1", but in that case the resulting tree doesn't contain any of the original filenames from which the XML pieces were included

    If I call the parser without this option (which is the default), then it won't expand these XIncludes, so only the first-level filenames will be there (because no expansion happens)

    So, in both cases I can't find ALL filenames I need

    Alex

      I'm not trying to download any files from remote sites, all my files are local

      It makes no difference whether the URLs point to remote or local resources. LWP::UserAgent will properly handle file: URLs.

      So, in both cases I can't find ALL filenames I need

      Wee! I've done the impossible!
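
      If everything really is local, the same walk can be sketched without LWP. This is only a sketch under some assumptions: every href is a plain file path (not a URL, and with no fragment), every included file is itself XML, and list_xincludes is a name invented for this example. Each file is parsed without XInclude expansion, so the xi:include elements are still in the tree, and each href is resolved against the directory of the file that contains it.

```perl
use strict;
use warnings;
use File::Basename            qw( dirname );
use File::Spec                qw( );
use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs('xi', 'http://www.w3.org/2001/XInclude');

# Hypothetical helper: returns every file reachable through xi:include,
# at any depth, starting from $root. The root itself is not listed.
sub list_xincludes {
   my ($root) = @_;

   my @todo = File::Spec->rel2abs($root);
   my %found;
   while (@todo) {
      my $file = pop(@todo);

      # No xinclude option here, so nothing is expanded.
      my $doc = eval { XML::LibXML->load_xml( location => $file ) };
      if (!$doc) {
         warn("Can't parse $file: $@");
         next;
      }

      for my $attr ($xpc->findnodes('//xi:include/@href', $doc)) {
         # Assumption: the href is a file path relative to the including file.
         my $child = File::Spec->rel2abs($attr->getValue(), dirname($file));
         push @todo, $child if !$found{$child}++;
      }
   }

   return sort keys %found;
}

if (@ARGV) {
   print "$_\n" for list_xincludes($ARGV[0]);
}
```

      Note that File::Spec->rel2abs does not collapse ".." segments, so the same file reached through two differently spelled paths would be listed twice; the URI-based version above does not have that problem.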

        Oh yes, you did! You have even created a parser with the line:

        my $parser = LibXML::XML->new();

        which I couldn't reproduce for hours ;-)

        Thanks,

        Alex