HeadScratcher has asked for the wisdom of the Perl Monks concerning the following question:

I'm a humble peasant seeking the wisdom of the holy ones. Please enlighten me.

I'm attempting to extract elements from an XML file

In the code below, how do I adapt it so I can tell the difference between the 2 'title' fields, and how do I get the value of point?

I'm hoping that if I can work this out, I can use the same principle to extract other items, such as author.

#!C:\strawberry\perl\bin\perl.exe
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;
use Data::Dump qw(dump);

my $filename = 'georss.xml';
my $dom = XML::LibXML->load_xml(location => $filename);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(dft => "http://www.w3.org/2005/Atom");
$xpc->registerNs(georss => "http://www.georss.org/georss");

my $title = $xpc->findnodes('//dft:title');
print "title $title\n";

my $point = $xpc->findnodes('//georss:where/point');
print "point $point\n";

OUTPUT from above code

title EarthquakesM 3.2, Mona Passage

point

georss.xml

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss"
      xmlns:gml="http://www.opengis.net/gml">
  <title>Earthquakes</title>
  <subtitle>International earthquake observation labs</subtitle>
  <link href="http://example.org/" />
  <updated>2005-12-13T18:30:02Z</updated>
  <author>
    <name>Dr. Thaddeus Remor</name>
    <email>tremor@quakelab.edu</email>
  </author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  <entry>
    <title>M 3.2, Mona Passage</title>
    <link href="http://example.org/2005/09/09/atom01" />
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2005-08-17T07:02:32Z</updated>
    <summary>We just had a big one.</summary>
    <georss:where>
      <point>45.256 -71.92</point>
    </georss:where>
  </entry>
</feed>

Re: perl XML::LibXML help reading an xml file that uses namespaces (updated)
by haukex (Archbishop) on Aug 07, 2016 at 14:16 UTC

    Hi HeadScratcher,

    tell the difference between the 2 'title' fields

    //dft:title means to find the corresponding <title> nodes anywhere in the tree. To select the <title> that is a child of <feed>, use for example /dft:feed/dft:title, or to select <title> nodes that are children of <entry> nodes anywhere in the tree, you could use //dft:entry/dft:title. There are other ways to tell nodes apart, such as using attributes, siblings, etc. - choosing the correct XPath conditions for selecting the nodes is up to you, depending on the structure of the document and your own criteria.
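
    As a rough sketch of the two paths above (not the only way to do it): the feed from the question is trimmed down and inlined as a string so the snippet runs on its own, and the variable names $feed_title and @entry_titles are just made up for illustration. Note that findvalue returns a plain string, and calling findnodes in list context gives one node per match instead of a single node list that stringifies to all the titles run together.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed-down copy of the feed from the question, inlined for the example.
my $xml = q{<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>Earthquakes</title>
  <entry>
    <title>M 3.2, Mona Passage</title>
  </entry>
</feed>};

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(dft => 'http://www.w3.org/2005/Atom');

# Absolute path: only the <title> that is a child of the root <feed>.
my $feed_title = $xpc->findvalue('/dft:feed/dft:title');

# List context: one string per <title> that is a child of an <entry>.
my @entry_titles = map { $_->textContent }
                   $xpc->findnodes('//dft:entry/dft:title');

print "feed title:  $feed_title\n";
print "entry title: $_\n" for @entry_titles;
```

    With the sample data this prints the feed title and the entry title on separate lines, rather than the concatenated "EarthquakesM 3.2, Mona Passage" from the original code.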

    how do I get the value of point?

    xmlns="http://www.w3.org/2005/Atom" declares the default namespace, which applies to every element that doesn't have a prefix. So <point> is in that namespace even though its parent is <georss:where>, and in your XPath you've given that namespace the alias dft. The proper XPath expression is then //georss:where/dft:point
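
    A minimal sketch of that expression in use, again with a cut-down inline copy of the feed so it is self-contained. Splitting the value into $lat and $lon is my own addition and assumes the usual GeoRSS latitude-then-longitude order; adjust if your data differs.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Cut-down copy of the feed from the question, inlined for the example.
my $xml = q{<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <entry>
    <georss:where>
      <point>45.256 -71.92</point>
    </georss:where>
  </entry>
</feed>};

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(dft    => 'http://www.w3.org/2005/Atom');
$xpc->registerNs(georss => 'http://www.georss.org/georss');

# <point> has no prefix, so it lives in the default (Atom) namespace: dft.
my $point = $xpc->findvalue('//georss:where/dft:point');

# Assumption: the value is "latitude longitude" separated by whitespace.
my ($lat, $lon) = split ' ', $point;

print "point $point\n";
print "lat $lat, lon $lon\n";
```

    The same pattern should carry over to the other items, for example /dft:feed/dft:author/dft:name for the author's name.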

    Note that these aren't really Perl questions so much as XPath questions, so I'd recommend you look into XPath a bit more.

    Hope this helps,
    -- Hauke D

    Update: s/direct descendant/child/g

      Thank you very much, not just for the answer but also for the explanation. Now that I know what to look for, I can do more research on XPath.