PerlMonks  

XML::LibXML and Namespace… I don't get it

by Skeeve (Parson)
on Jun 06, 2014 at 15:21 UTC ( [id://1089041]=perlquestion )

Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

I found this interesting node: 1003089 and tried to adapt it to my problem.

This is the code I have:

#!/opt/local/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

my $parser = XML::LibXML->new;
my $doc = $parser->load_xml(
    location     => shift,
    validation   => 0,
    load_ext_dtd => 0,
);
my $xpc = XML::LibXML::XPathContext->new;
$xpc->registerNs( xml => "http://www.w3.org/1999/xhtml" );

print "result: ";
foreach my $node ($xpc->findnodes('/*', $doc)) {
    print $node->nodeName,"\n";
}
# result: html
print "\n";

print "result: ";
foreach my $node ($xpc->findnodes('/html', $doc)) {
    print $node->nodeName,"\n";
}
# result:
print "\n";

print "result: ";
foreach my $node ($xpc->findnodes('/xml:html', $doc)) {
    print $node->nodeName,"\n";
}
# result:
print "\n";

You see? When I use "/*" as my query, I get my html node. But when I use "/html" or "/xml:html", I get nothing.

What's wrong with my code - or my lack of understanding of XML::LibXML?

For completeness, here is the beginning of my XML file:

<?xml version="1.0" encoding ="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" xmlns:epub="epub" >
<head>

Update: When I remove the namespace from the file, it seems to work :(


s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

Replies are listed 'Best First'.
Re: XML::LibXML and Namespace… I don't get it
by tobyink (Canon) on Jun 06, 2014 at 16:08 UTC

    The namespace is this bit:

    xmlns="http://www.w3.org/1999/xhtml"

    Not this bit:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

    So you should be setting up your XPath context like this:

    $xpc->registerNs( xml => "http://www.w3.org/1999/xhtml" );

    That said, it's probably not a good idea to use the name "xml" for the XHTML namespace. Names beginning with those three letters are reserved for special purposes by the XML specification. So it's probably a better idea to use a different name:

    $xpc->registerNs( h => "http://www.w3.org/1999/xhtml" );
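
For illustration, here is a minimal sketch of that advice. It is not the original poster's script: the inline XML string is a hypothetical stand-in for the XHTML file, and the prefix "h" is an arbitrary choice. Any prefix that does not start with "xml" should work the same way, as long as every step of the XPath uses it.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical stand-in for the XHTML file from the question.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Example</title></head>
  <body/>
</html>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# Bind a prefix of our own choosing to the XHTML namespace URI.
# The prefix only has to match inside the XPath expressions; the
# document itself uses the namespace as its default (no prefix).
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( h => 'http://www.w3.org/1999/xhtml' );

# Every element step needs the prefix, not just the first one.
my @roots  = $xpc->findnodes('/h:html');
my @titles = $xpc->findnodes('/h:html/h:head/h:title');

print 'root: ',  $roots[0]->nodeName,     "\n";
print 'title: ', $titles[0]->textContent, "\n";
```

Note that nodeName still reports plain "html", because the prefix registered on the XPath context is only used for matching and is not written into the document.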
    use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name
      From Namespaces in XML,

      The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.

      The prefix xmlns is used only to declare namespace bindings and is by definition bound to the namespace name http://www.w3.org/2000/xmlns/. It MUST NOT be declared. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace. Element names MUST NOT have the prefix xmlns.

      All other prefixes beginning with the three-letter sequence x, m, l, in any case combination, are reserved. This means that:

      • users SHOULD NOT use them except as defined by later specifications

      • processors MUST NOT treat them as fatal errors.

        Thanks for confirming; in fact, this was one of my 2 mistakes.



      Thanks! I already noticed the stuff about the DTD in place of the namespace. Stupid mistake.

      That stupid mistake interfered with the second one you pointed out and ikegami confirmed.

      So using the correct namespace and a prefix other than "xml" solved the problem.

      Thanks a lot!
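
For anyone hitting the same wall, the following is a small diagnostic sketch and not code from this thread. It assumes a cut-down inline document and the arbitrary prefix "h". It prints the root element's local name and namespace URI, then counts the matches for an unprefixed, a prefixed, and a wildcard query. In XPath 1.0 an unprefixed name test only matches elements in no namespace, which would explain why "/html" finds nothing here while "/*" does.

```perl
use strict;
use warnings;
use XML::LibXML;

# Cut-down stand-in for the real file (assumption: only the root's
# default namespace declaration matters for this check).
my $doc = XML::LibXML->load_xml(
    string => '<html xmlns="http://www.w3.org/1999/xhtml"><head/></html>',
);
my $root = $doc->documentElement;

# The element is called "html", but it lives in the XHTML namespace.
print 'localname:    ', $root->localname,    "\n";
print 'namespaceURI: ', $root->namespaceURI, "\n";

my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( h => 'http://www.w3.org/1999/xhtml' );

# Count how many nodes each query selects.
my %count;
for my $query ( '/*', '/html', '/h:html' ) {
    my @nodes = $xpc->findnodes($query);
    $count{$query} = scalar @nodes;
    printf "%-8s matched %d node(s)\n", $query, $count{$query};
}
```

Printing namespaceURI for the node returned by "/*" is a quick way to find out which URI has to be passed to registerNs.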


Re: XML::LibXML and Namespace… I don't get it
by sundialsvc4 (Abbot) on Jun 07, 2014 at 00:17 UTC

    As an aside ... one of the vexing problems that I often have with XPath ... powerful though this facility is ... is getting the damned query exactly right.   Web sites such as http://www.xpathtester.com/xpath can be priceless, because they let you stuff some data in, (attempt to) run XPath queries against it, and see what happens (or what errors are returned).

    That way, you’re only banging your head against one brick wall at a time:   first, getting the XPath query correct and knowing what results it should be producing; then, getting it to run in your Perl and to produce those same results.   (And it’s a reasonable guess that libxml is being used in both cases.)

Re: XML::LibXML and Namespace… I don't get it
by taint (Chaplain) on Jun 06, 2014 at 15:57 UTC
    For clarity.

    What is the name of your "XML file"?
    Is /html a file or a directory?
    Could this ambiguity be the problem?

    Best wishes.

    --Chris

    ˇλɐp ʇɑəɹ⅁ ɐ əʌɐɥ puɐ ʻꜱdləɥ ꜱᴉɥʇ ədoH

      The filename is given on the command line.

      see:

      location =>shift

        Uh huh. Sorry.

        It all just smells of a File Handle issue. But apparently I'm as confused as XML::LibXML.

        --Chris

        ˇλɐp ʇɑəɹ⅁ ɐ əʌɐɥ puɐ ʻꜱdləɥ ꜱᴉɥʇ ədoH
