PerlMonks  

Manipulate xml with libxml

by Anonymous Monk
on Jun 07, 2005 at 17:06 UTC ( [id://464396]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I think my brain has reached its capacity, and I need help with an algorithm I am not able to implement. It is very important!

I have to parse an XML file (model.xml). This file contains a nested, repetitive motif. For instance:

<fixe>
  <aaa></aaa>
  <bbb></bbb>
  <bbb></bbb>
  <ccc></ccc>
</fixe>

I mean that inside each element (<aaa></aaa>, <bbb></bbb>) the same structure can appear again recursively, with no limit on the nesting depth. (The names of the <aaa></aaa> etc. elements are arbitrary; only <fixe> has a constant name.)

<fixe>
  <aaa>
    <fixe>
      <eee/>
      <bbb/>
      <bbb/>
      <ccc>
        <fixe>
          <aaa/>
          <eee/>
          <bbb/>
          <fff/>
          <ccc/>
        </fixe>
      </ccc>
    </fixe>
  </aaa>
  <bbb/>
  <bbb/>
  <ccc/>
</fixe>

I use DOM with the XML::LibXML API. The goal of my script is to remove the <fixe></fixe> elements and, according to two parameters (an element name and a number of duplications), to duplicate the matching nodes. For example, with name = eee and number = 4, the XML would become:

<aaa>
  <eee/>
  <eee/>
  <eee/>
  <eee/>
  <bbb/>
  <bbb/>
  <ccc>
    <aaa/>
    <eee/>
    <eee/>
    <eee/>
    <eee/>
    <bbb/>
    <fff/>
    <ccc/>
  </ccc>
</aaa>
<bbb/>
<bbb/>
<ccc/>

I have written this script, which removes the "fixe" elements:

#!/usr/bin/perl -w
use strict;
use XML::LibXML;

my $parser = XML::LibXML->new();
my $doc    = $parser->parse_file("model.xml")
    or die "cannot open xml file: $!";
my $node   = $doc->getDocumentElement();

# Collect every <fixe> wrapper element below the document element.
my @dc = $node->getElementsByTagName('fixe');
foreach my $descr (@dc) {
    foreach my $elt ($descr->childNodes()) {
        next unless $elt->nodeType() == 1;    # skip text nodes (1 = element node)
        $descr->addSibling($elt);             # move the child up, next to its wrapper
    }
    # The wrapper itself is no longer needed.
    my $parent = $descr->parentNode;
    $parent->removeChild($descr);
}
print $node->serialize;

For the duplication, I tried calling addSibling inside a foreach loop, but I got an error. I also tried the cloneNode and insertBefore functions, without success.
http://search.cpan.org/~phish/XML-LibXML-1.58/lib/XML/LibXML/Node.pod
Any help would be greatly appreciated.

Replies are listed 'Best First'.
Re: Manipulate xml with libxml
by BaldPenguin (Friar) on Jun 07, 2005 at 18:06 UTC
    I was a little confused so I must assume that the first instance of <fixe> is not truly your root element. Yet, even with my assumptions, this may help. I added the passing of $it (iteration) and $nn (node name).

    Try this:
    #!/usr/bin/perl -w -s
    use strict;
    use XML::LibXML;

    my $it = shift || 1;     # iteration: number of copies
    my $nn = shift || '';    # node name: element to duplicate
    print qq($it\n);

    my $parser = XML::LibXML->new();
    my $doc    = $parser->parse_file("model.xml")
        or die "cannot open xml file: $!";
    my $node   = $doc->getDocumentElement();

    my @dc = $node->getElementsByTagName('fixe');
    foreach my $descr (@dc) {
        foreach my $elt ($descr->childNodes()) {
            next unless $elt->nodeType() == 1;    # to avoid text nodes
            if ( $elt->nodeName eq $nn ) {
                # add $it copies of the matching element
                $descr->addSibling($elt->cloneNode) foreach ( 1 .. $it );
            }
            else {
                $descr->addSibling($elt);
            }
        }
        my $parent = $descr->parentNode;
        $parent->removeChild($descr);
    }
    print $doc->toString();

    XML is very particular in that an element is an element: the node already exists in one place in the tree, so adding the same node again does not copy it. You were on the right track with cloneNode; you just needed to combine it with addSibling.
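
The point about existing nodes can be checked with a minimal sketch. It assumes a made-up one-wrapper document and uses insertBefore (rather than the addSibling call from the reply) so that the position is explicit; inserting the original node relocates it, while each clone adds a new node.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative document, not the poster's model.xml.
my $doc  = XML::LibXML->new->parse_string('<root><fixe><eee/></fixe></root>');
my $root = $doc->documentElement;
my ($fixe) = $root->getElementsByTagName('fixe');
my ($eee)  = $fixe->getElementsByTagName('eee');

# Inserting the existing node moves it out of <fixe>; no copy is made.
$root->insertBefore($eee, $fixe);
my @left_in_fixe = $fixe->childNodes;

# Each deep clone is a separate node, so the count really grows.
$root->insertBefore($eee->cloneNode(1), $fixe) for 1 .. 2;

# Drop the now-empty wrapper and show the result.
$root->removeChild($fixe);
print $root->toString, "\n";
```

After the single move, <fixe> is empty; only the cloned nodes increase the number of <eee/> elements.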

    Don
    WHITEPAGES.COM | INC


      Thank you very much Don!! It works well. I am happy.

      I just passed the argument 1 to the cloneNode function so that the child nodes are copied as well:
      $descr->addSibling($elt->cloneNode(1))
      You are right, <fixe> is not the root element of the file. I should have mentioned that.
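
The difference the argument makes can be seen in a small sketch, assuming an illustrative <ccc> element with two children: cloneNode(0) copies only the element itself, while cloneNode(1) copies its whole subtree.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative element with two children.
my $doc = XML::LibXML->new->parse_string('<ccc><aaa/><bbb/></ccc>');
my $ccc = $doc->documentElement;

my $shallow = $ccc->cloneNode(0);    # the element only
my $deep    = $ccc->cloneNode(1);    # the element and all descendants

my @shallow_kids = $shallow->childNodes;
my @deep_kids    = $deep->childNodes;
printf "shallow: %d children, deep: %d children\n",
    scalar @shallow_kids, scalar @deep_kids;
```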

      By the way, when I remove some nodes and then use toString or serialize, the XML is not really pretty. Is there a function to remove the unneeded \n and to format the output with proper indentation? (Something like the "pretty print" in xmlspy, for those who know that software.)
        I confess LibXML doesn't create pretty XML; in fact, I normally include variables to strip out the unnecessary whitespace because I send it through XSLT. You can write some kind of recursive loop to fix it all or, if you are not too worried about speed, send your string to XML::Twig, which has some great pretty_print functionality. I only mention speed because in that case you will effectively be parsing your XML twice.

        Don
        WHITEPAGES.COM | INC
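
As a possible alternative to a second parse, XML::LibXML's own toString accepts a format flag. The sketch below is an assumption about what may be good enough here, not something confirmed in the thread: it turns off keep_blanks on the parser so that whitespace-only text nodes are dropped, then serializes with toString(1). The input string is made up for illustration, and documents with mixed content may not reindent cleanly.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;
$parser->keep_blanks(0);    # discard ignorable whitespace-only text nodes

# Illustrative, badly indented input.
my $doc = $parser->parse_string("<root>\n\n<aaa>\n<bbb/>   </aaa>\n</root>");

# A true format argument asks libxml2 to add indentation on output.
my $pretty = $doc->toString(1);
print $pretty;
```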


      Hmm... I suspect a recursion problem if I use cloneNode(1): when the children of the first motif are cloned and become siblings, the nested <fixe> elements inside them have not been processed yet, so the copies still contain them.
      It is more complicated than I thought, in fact.
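
One way around this, sketched under assumptions rather than taken from the thread, is to unwrap the innermost <fixe> elements first, so that any subtree is already clean by the time it is deep-cloned. The helper name unwrap_and_duplicate and the <model> root element are hypothetical; insertBefore is used instead of addSibling so that the children keep their original position.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: unwrap every <$wrapper> element below $root and
# emit $count copies of each wrapper child named $name.
sub unwrap_and_duplicate {
    my ($root, $wrapper, $name, $count) = @_;

    # Matches come back in document order (ancestors before descendants),
    # so reversing the list processes the innermost wrappers first.
    for my $wrap (reverse $root->getElementsByTagName($wrapper)) {
        my $parent = $wrap->parentNode;
        for my $child ($wrap->childNodes) {
            next unless $child->nodeType == XML_ELEMENT_NODE;
            if ($child->nodeName eq $name) {
                # Deep clones are safe: nested wrappers are already gone.
                $parent->insertBefore($child->cloneNode(1), $wrap) for 1 .. $count;
            }
            else {
                # Moving the node in front of the wrapper keeps its position.
                $parent->insertBefore($child, $wrap);
            }
        }
        $parent->removeChild($wrap);
    }
    return $root;
}

# The nested example from the question, wrapped in an assumed <model> root.
my $xml = '<model><fixe><aaa><fixe><eee/><bbb/><bbb/><ccc><fixe><aaa/><eee/>'
        . '<bbb/><fff/><ccc/></fixe></ccc></fixe></aaa><bbb/><bbb/><ccc/></fixe></model>';
my $doc    = XML::LibXML->new->parse_string($xml);
my $result = unwrap_and_duplicate($doc->documentElement, 'fixe', 'eee', 4)->toString;
print "$result\n";
```

Like the scripts above, this sketch skips text nodes inside the wrapper, and it assumes <fixe> is never the document element itself.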
Re: Manipulate xml with libxml
by goonfest (Sexton) on Jun 07, 2005 at 19:59 UTC
    If you could post a valid DTD or schema or such, that would help us out over here.

    "Be proud, be a Goon"
      Sorry, but there is no DTD or schema validating the model for the moment.

Node Type: perlquestion [id://464396]
Approved by gellyfish