PerlMonks  

Replacing an XPath node with the value of its content

by stylechief (Sexton)
on Oct 21, 2013 at 19:09 UTC ( [id://1059156]=perlquestion )

stylechief has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Honorable Monks,

I need to replace certain <tm> elements in an XML document with the value of the <tm> element. That is, I need to eliminate the element but retain its content in-place.

For example, if I find: <tm tmclass="ibm" tmowner="IBM Corporation" tmtype="reg" trademark="AIX">AIX</tm>;

I need to replace the entire <tm>...</tm> element with the text AIX.

I am using XML::LibXML to read the @tmowner attribute and decide whether to perform the replacement described above. For example:

    use strict;
    use File::Find;
    use XML::LibXML;

    my ($file, $parser, $doc, $query, $node, $val, $tmtext);

    @ARGV = ('.');
    # search every file in this dir and all subdirs
    find(\&search, @ARGV);

    sub search {
        $file = $_;
        if (grep -f && /\.xml$/i, $file) {
            $parser = XML::LibXML->new();
            $doc    = $parser->parse_file($file);
            $query  = "//tm";
            foreach $node ($doc->findnodes($query)) {
                $val    = $node->findvalue('@tmowner');
                $tmtext = $node->textContent();
                # if we don't own the trademark, replace it with the text
                if ($val !~ /my_company/i) {
                    ### REPLACE THE TM ELEMENT WITH THE TEXT CONTENT ###
                    print " Replacing trademark element with $tmtext\n";
                }
            } # foreach node
        } # if grep
    } # sub search

I have not had any luck with the replaceNode or replaceChild methods for this purpose. I know I could do this with regexps, but that might get messy with all of the preformatting necessary to manage multiple trademarks on one line.

Tips and ideas are appreciated. Thanks for your time.

Replies are listed 'Best First'.
Re: Replacing an XPath node with the value of its content
by derby (Abbot) on Oct 21, 2013 at 21:28 UTC

    Is the tm element the only child of another node? If so, you can get the parent of the matching node, remove its children, then insert the text node:

        use strict;
        use warnings;
        use XML::LibXML;

        my $xml_str = <<_XML_STR;
        <foo>
          <bar>
            <tm tmclass="ibm" tmowner="IBM Corporation" tmtype="reg" trademark="AIX">AIX</tm>
          </bar>
        </foo>
        _XML_STR

        my $parser = XML::LibXML->new();
        my $doc    = $parser->parse_string( $xml_str );
        my $query  = "//tm";

        foreach my $node ( $doc->findnodes($query) ) {
            my $val = $node->findvalue( '@tmowner' );
            my $txt = $node->textContent();
            if( $val !~ /my_company/ ) {
                my $parent = $node->parentNode();
                $parent->removeChildNodes();
                $parent->appendTextNode( $txt );
            }
        }

        print $doc->toString;

    would produce:

        <foo>
          <bar>AIX</bar>
        </foo>
    -derby

      Thanks, Derby, this is a step in the right direction.

      Unfortunately, it is not the only child of another node. In fact, there may be multiple <tm> or other children within a given parent.

      This <tm> element is being used to trigger inline formatting during output processing. In this example, a superscript "TM" would be added automatically, like AIX™. So there may be more than one in the same sentence, paragraph, etc. within a single parent.

      Testing your suggestion, it certainly removes the <tm> child, but appends the "AIX" to the very end of the parent, as one would expect (and as you implied).

      I am currently looking into the XML::LibXML::Document class. It has some methods that look promising, perhaps I can use a fragment or text node somehow.

      Thanks for looking into this!

      SC

Re: Replacing an XPath node with the value of its content
by choroba (Cardinal) on Oct 21, 2013 at 23:11 UTC
    Using XML::XSH2, a wrapper around XML::LibXML:
        open file.xml ;
        for //tm mv text() replace . ;
        save :b ;

    Note: Does not work if <tm>'s can be nested.

    Update: If <tm> can contain something more than just text, you have to use the more general "unwrap" method:

    for //tm xmv node() replace . ;

    Tested with

    <tm>a<b>c</b>c<?run test?><!-- comment --></tm>
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Thanks for the reply. I will look into XML::XSH2, as I am unfamiliar with that module.
      SC
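
For readers who want to stay in plain XML::LibXML, the "unwrap" idea above can be approximated by moving every child of the element in front of it and then removing the emptied element. This is only a sketch and not how XML::XSH2 implements `xmv`; the helper name `unwrap_element` and the sample document are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: move every child of $node in front of it,
# then drop the now-empty element from its parent.
sub unwrap_element {
    my ($node) = @_;
    my $parent = $node->parentNode or return;
    # childNodes returns a list copy, so moving children while looping is safe
    $parent->insertBefore($_, $node) for $node->childNodes;
    $parent->removeChild($node);
    return;
}

# Sample input modelled on the test string from the reply above
my $doc = XML::LibXML->load_xml(
    string => '<p>x<tm>a<b>c</b>c<?run test?><!-- comment --></tm>y</p>'
);

unwrap_element($_) for $doc->findnodes('//tm');
print $doc->documentElement->toString, "\n";
```

Unlike taking textContent, this keeps child elements, comments and processing instructions in place rather than flattening them to text.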
Re: Replacing an XPath node with the value of its content
by Anonymous Monk on Oct 21, 2013 at 23:05 UTC
    Keep fiddling with replaceChild because that is the way to do it.
    1. Build a text-node containing the new content.
    2. Navigate to the parent node and instruct it to replaceChild the existing node with the new text-node.
    Remember that the replacement node must be coined from the same document.
      Your step #1 was the ticket! Thanks AM. I was not first constructing a new text node.

      The simple fix:
          $parentnode = $node->parentNode;
          $txtnode    = XML::LibXML::Text->new($tmtext);
          $node       = $parentnode->replaceChild($txtnode, $node);
      SC
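
Putting the steps from this sub-thread together, a self-contained sketch might look like the following. It assumes the text node is created through the document's createTextNode (XML::LibXML::Text->new, as in the fix above, works too); the function name `flatten_foreign_tm`, its `$owner_re` parameter and the sample XML are hypothetical and not part of the original script.

```perl
use strict;
use warnings;
use XML::LibXML;

# Replace every <tm> whose @tmowner does not match $owner_re with a
# text node holding its text content. Returns the number replaced.
sub flatten_foreign_tm {
    my ($doc, $owner_re) = @_;
    my $count = 0;
    # findnodes returns a plain list here, so replacing nodes inside the loop is safe
    foreach my $node ($doc->findnodes('//tm')) {
        next if $node->findvalue('@tmowner') =~ $owner_re;
        # step 1: build a text node that belongs to the same document
        my $text_node = $doc->createTextNode($node->textContent);
        # step 2: ask the parent to swap the element for the text node, in place
        $node->parentNode->replaceChild($text_node, $node);
        $count++;
    }
    return $count;
}

# Two <tm> elements inside one parent; only the foreign one should be flattened
my $doc = XML::LibXML->load_xml(
    string => '<p>Runs on <tm tmowner="IBM Corporation">AIX</tm> and '
            . '<tm tmowner="My_Company">Widget</tm>.</p>'
);

flatten_foreign_tm($doc, qr/my_company/i);
print $doc->documentElement->toString, "\n";
```

Because replaceChild works on the node's position within its parent, sibling text and other <tm> elements in the same paragraph stay where they were.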
Re: Replacing an XPath node with the value of its content TIMTOWTDI
by Discipulus (Canon) on Oct 22, 2013 at 08:06 UTC
    UPDATE: a warning to you all: do not use regexes against XML! They (and you too) lose.

    In the spirit of more than one way, I present an XML::Twig solution.
    I'm sure there is a more elegant way to do it, anyway.
        #!/usr/bin/perl
        use strict;
        use warnings;
        use XML::Twig;

        my $twig = XML::Twig->new(
            pretty_print             => 'indented',
            twig_roots               => { 'tm' => 1 },
            twig_print_outside_roots => 1,
            twig_handlers            => {
                tm => sub { my $text = $_->text(); $_->cut(); print $text; },
            },
        );
        $twig->parse('<?xml version="1.0"?><stats><tm tmclass="ibm" tmowner="IBM Corporation" tmtype="reg" trademark="AIX">AIX</tm></stats>');
    hth
    L*

    UPDATE2: I'm not sure I understand your case, but maybe this handler is what you need to print the text only once:

        # declare a global:
        my $only_one = 0;

        # same as before, then:
        twig_handlers => {
            tm => sub { my $text = $_->text(); $only_one ? $_->cut() : ( $_->cut() and print $text and $only_one++ ) },
        },

        __DATA__
        <?xml version="1.0"?>
        <stats>
        <tm tmclass="ibm" tmowner="IBM Corporation" tmtype="reg" trademark="AIX">AIX</tm>
        <tm tmclass="ibm" tmowner="IBM Corporation" tmtype="reg" trademark="AIX">AIX</tm>
        <tm tmclass="ibm" tmowner="IBM Corporation" tmtype="reg" trademark="AIX">AIX</tm>
        </stats>

        __OUTPUT__
        <?xml version="1.0"?>
        <stats>
        AIX
        </stats>
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
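
A possibly simpler XML::Twig variant, offered as a sketch rather than as the poster's method: the element's erase method deletes the element and pastes its children in its place, which is the in-place replacement the question asks for. The helper name `strip_foreign_tm`, the `$owner_re` parameter and the sample input are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: parse $xml and return it with every <tm> whose
# tmowner does not match $owner_re replaced by its own content.
sub strip_foreign_tm {
    my ($xml, $owner_re) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            tm => sub {
                my ($t, $elt) = @_;
                my $owner = $elt->att('tmowner') // '';
                # erase() removes the element but keeps its children in place
                $elt->erase unless $owner =~ $owner_re;
            },
        },
    );
    $twig->parse($xml);
    return $twig->sprint;
}

print strip_foreign_tm(
    '<stats>Runs on <tm tmowner="IBM Corporation">AIX</tm> today.</stats>',
    qr/my_company/i,
), "\n";
```

This version builds the whole tree in memory instead of streaming with twig_roots, which is simpler but may matter for very large files.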
Re: Replacing an XPath node with the value of its content
by Lennotoecom (Pilgrim) on Oct 21, 2013 at 23:14 UTC
    Why do you even use this module xmllib,
    can't you do it through some regexps?
      This can certainly be accomplished via regexps (usually my default mode for this sort of thing). In this case, however, I'm dealing with thousands of files authored by an assortment of tools and people, and would prefer to use a method that helps ensure that syntax and context remain intact 100% of the time when passed chaotic code. If I can't find an XPath-style solution, I will go the regexp route.
      SC

Node Type: perlquestion [id://1059156]
Approved by Corion