PerlMonks  

XML dont include parent node

by zak_s (Initiate)
on Dec 10, 2013 at 16:59 UTC ( [id://1066468]=perlquestion )

zak_s has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I'm not an expert in Perl. I have tried to find a way to get this done, but it is not working well for me. What is the best XPath expression to get the child elements of a node without including the parent tags?

For example:

    <Root>
      <Parent>
        <child1></child1>
        <child2></child2>
        <child3></child3>
      </Parent>
    </Root>

I'm only interested in the 3 children within the Parent tags, each of which has a different name.
What I'm currently using is '//Parent', with findnodes from XML::LibXML.

Replies are listed 'Best First'.
Re: XML dont include parent node
by tangent (Parson) on Dec 10, 2013 at 17:35 UTC
    This is not a modified XPath expression, but it shows how you can access the child nodes:
        use XML::LibXML;

        my $string = q|
        <Root>
          <Parent>
            <child1>Child 1</child1>
            <child2>Child 2</child2>
            <child3>Child 3</child3>
          </Parent>
        </Root>|;

        my $doc = XML::LibXML->load_xml(string => $string);
        my @nodes = $doc->findnodes('//Parent');
        for my $node (@nodes) {
            my @childnodes = $node->childNodes or next;
            for my $cnode (@childnodes) {
                if ($cnode->nodeName =~ m/^child/) {
                    print 'name: '. $cnode->nodeName . ', ';
                    print 'content: '. $cnode->textContent ."\n";
                }
            }
        }

        # Output:
        # name: child1, content: Child 1
        # name: child2, content: Child 2
        # name: child3, content: Child 3
    See XML::LibXML::Node

      Thanks, this helps.
      One more question: is there any way to ignore empty lines when copying nodes? Is there an expression I can use to delete all empty lines?

Re: XML dont include parent node
by smls (Friar) on Dec 11, 2013 at 01:55 UTC

    Based on your description I'm not 100% sure which of the following two things you want to achieve, so let me address both:

    A) Get a list of all child nodes of the Parent node

    One solution is to match the Parent node via an XPath expression, and then call the childNodes method on it (which is what tangent already suggested above).

    An alternative solution is to use an asterisk wildcard directly in the XPath expression, e.g. in your example the expression passed to findnodes would become '//Parent/*'.
    However if there are multiple Parent nodes in the document, this would return all their children as one flat list, whereas tangent's solution allows you to handle each set separately.
    Another difference is that the asterisk expression only matches element nodes, whereas the childNodes method also lists text or CDATA nodes (including whitespace strings in between the child elements, although there is an alternative method called nonBlankChildNodes which avoids that).
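
    To make that difference concrete, here is a minimal sketch. The sample document is an assumption modelled on the question's example, indented so that it contains whitespace-only text nodes; the variable names are illustrative only.

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumed sample input: the indentation creates whitespace-only
# text nodes between the child elements.
my $xml = <<'XML';
<Root>
  <Parent>
    <child1>Child 1</child1>
    <child2>Child 2</child2>
    <child3>Child 3</child3>
  </Parent>
</Root>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my ($parent) = $doc->findnodes('//Parent');

my @all       = $parent->childNodes;          # elements plus whitespace text nodes
my @non_blank = $parent->nonBlankChildNodes;  # whitespace-only text/CDATA nodes skipped
my @elements  = $parent->findnodes('*');      # element nodes only

printf "childNodes: %d, nonBlankChildNodes: %d, '*': %d\n",
    scalar @all, scalar @non_blank, scalar @elements;
```

    With this input, childNodes should report more nodes than the other two calls, because it also counts the whitespace between the elements. That is also one way to skip the "empty lines" when copying nodes.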

    If you are indeed only interested in the child elements, but want to process each set separately in case of multiple Parent nodes, you could either combine childNodes with a check for nodeName (like tangent's solution does), or use a stand-alone asterisk-query:

        my $doc = XML::LibXML->load_xml( ... );

        foreach my $parent ($doc->findnodes('//Parent')) {
            my @childElements = $parent->findnodes('*');
            # ...do stuff with @childElements...
        }

    B) Get a string serialization of the Parent node, but with the actual Parent start/end tags stripped...

    ...akin to the .innerHTML property available in JavaScript/DOM.

    XML::LibXML does not provide this feature, and the reason is probably that, unlike with HTML, an XML snippet requires a single root element in order to be valid XML.

    You could still achieve it by getting the list of child nodes (see section A above), calling the toString method on each, and concatenating the resulting strings:

        my $doc = XML::LibXML->load_xml( ... );

        foreach my $parent ($doc->findnodes('//Parent')) {
            print "Found Parent node with the following XML content:";
            print innerXML($parent);
        }

        sub innerXML {
            join '', map { $_->toString } shift->childNodes();
        }

    Or by calling toString directly on the Parent node, and using regexes to try and strip off the outer start/end tags (but this will be messy and error-prone).

Re: XML dont include parent node
by derby (Abbot) on Dec 10, 2013 at 17:39 UTC

    You could always use the starts-with function:

        my @nodes = $root->findnodes( '/Root/*/*[starts-with(name(), "child")]' );

    -derby
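
    For anyone who wants to try the starts-with expression on its own, here is a self-contained sketch. The inline document and the extra <other/> element are assumptions added for illustration; they are not from the original post.

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumed sample input; <other/> is there to show that it gets filtered out.
my $doc = XML::LibXML->load_xml(
    string => '<Root><Parent><child1/><child2/><other/><child3/></Parent></Root>',
);

# Select grandchildren of Root whose element name begins with "child".
my @nodes = $doc->findnodes('/Root/*/*[starts-with(name(), "child")]');
my @names = map { $_->nodeName } @nodes;

print join(', ', @names), "\n";
```

    Only the three child* elements should be selected; <other/> does not pass the predicate.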
      I think you're taking the OP's example XML snippet too literally... :)
Re: XML dont include parent node
by Discipulus (Canon) on Dec 11, 2013 at 09:43 UTC
    Hello there

    Remember, for future questions, to be as precise as you can so that others can help you effectively (consider reading Understanding-and-Using-PerlMonks).

    I humbly think that XPath is not intended for matching as in your case (child1, child2, child3..). I also think another design for your data would be better, if you have the choice: all tags are 'child' and each one has a numerical 'id'.

    In Perl there are many ways to get the work done (and when it comes to XML there are many * many.. see the poll), so I present an XML::Twig solution. Handlers are subs that are called during parsing; there you can use a normal Perl regex to filter out unwanted results (I put a 'ufo' in the XML data..).
        use warnings;
        use strict;
        use XML::Twig;

        my $xml = <<'XML';
        <Root>
          <Parent>
            <child1>Child 1</child1>
            <child2>Child 2</child2>
            <child3>Child 3</child3>
            <ufo> Ufo there!</ufo>
          </Parent>
        </Root>
        XML

        my $twig = XML::Twig->new(
            pretty_print  => 'indented',
            twig_handlers => { '/Root/Parent/*' => \&field },
        );
        $twig->parse($xml);

        # Called for every child element of /Root/Parent during parsing.
        sub field {
            my ($twig, $field) = @_;
            # Skip anything whose tag does not start with "child" (e.g. <ufo>).
            return unless $field->gi() =~ /^child/i;
            $field->print;
            # OR print $field->text();
        }

        # OUTPUT
        #
        # <child1>Child 1</child1>
        # <child2>Child 2</child2>
        # <child3>Child 3</child3>


    Hth
    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      I humbly think that XPath is not intended for matching as in your case (child1, child2, child3..)

      Hmm, but they seem to do exactly that, which kinda means they are intended to do it

          $ xmllint --xpath " //Parent " foot.xml
          <Parent> <child1/> <child2/> <child3/> </Parent>
          $ xmllint --xpath " //Parent/* " foot.xml
          <child1/><child2/><child3/>
          $ xmllint --xpath " /Root/*/*[starts-with(name(), 'child')] " foot.xml
          <child1/><child2/><child3/>

      ... design for your data ...

      I think too often the person asking how-something-xml doesn't have a choice in the design :)

        I suspected I was wrong there, thanks. Maybe that syntax (starts-with(name())..) is not available in XML::Twig or, more probably, I am not able to get it working:

            my @all = $twig->get_xpath ('/Root/Parent/child*');
            # gives: error in xpath expression...
            my @all = $twig->get_xpath ('/Root/*/*[starts-with(name(), "child")]');
            # also gives error in xpath expression..
            # same results with the findnodes method..

        The XML::Twig docs say these methods are similar to the XML::LibXML method, 'similar' probably being the key word.

        Still, if someone is able to make it work...welcome!
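
        One possible workaround, sketched under assumptions rather than offered as a fix for get_xpath itself: XML::Twig's navigation methods such as children accept a qr// regular expression as the condition, so the name filter can be applied in Perl instead of inside the XPath string. The sample data below mirrors the earlier example; the variable names are illustrative.

```perl
use strict;
use warnings;
use XML::Twig;

# Assumed sample input, same shape as the earlier XML::Twig example.
my $xml = <<'XML';
<Root>
  <Parent>
    <child1>Child 1</child1>
    <child2>Child 2</child2>
    <child3>Child 3</child3>
    <ufo>Ufo there!</ufo>
  </Parent>
</Root>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);

# For each Parent under the root, keep only the children whose tag
# matches the regex; the <ufo> element is skipped.
my @wanted;
for my $parent ($twig->root->children('Parent')) {
    push @wanted, $parent->children(qr/^child/);
}

my @tags = map { $_->tag } @wanted;
print $_->tag, ': ', $_->text, "\n" for @wanted;
```

        This loads the whole document first, unlike the handler-based version, so it is probably better suited to small inputs.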

