sbafna has asked for the wisdom of the Perl Monks concerning the following question:

I have the below complex yet small XML snippet saved in a file, with multiple entries of ENVDETAILS.

<OUTMOST>
  <MYENVDETAILS>
    <ENVDETAILS id="abc" >
      <DMGRHOST>a.b.c.d</DMGRHOST>
      <CELL>XYZ</CELL>
      <NODE nodeid="1" >
        <NODEHOST>v.x.y.z</NODEHOST>
        <IHSNODE>v_ihsnode</IHSNODE>
        <IHSJVM ihsjvmtype="log_ihs" >
          <IHSJVMNAME>abc_logihs1</IHSJVMNAME>
        </IHSJVM>
        <IHSJVM ihsjvmtype="appihs" >
          <IHSJVMNAME>abc_appihs1</IHSJVMNAME>
        </IHSJVM>
        <JVM jvmtype="nodeagent" >
          <JVMNAME>nodeagent</JVMNAME>
        </JVM>
        <JVM jvmtype="api" >
          <JVMNAME>abc_api1.1</JVMNAME>
          <JVMNAME>abc_api1.2</JVMNAME>
        </JVM>
      </NODE>
    </ENVDETAILS>
  </MYENVDETAILS>
</OUTMOST>

A few things to note:

a). nodeid is not unique.

b). ENVDETAILS id is unique.

I wanted to achieve/perform the following things:

1). Display all the ENVDETAILS ids: [output should have all ids: abc]

2). Display all the JVMs for a given ENVDETAILS id, filtered by jvm type: [input: (id=abc)(jvmtype=api); output should have all jvms of type 'api' from ENV id 'abc', i.e.: abc_api1.1, abc_api1.2]

3). Display the NODEHOST when a JVMNAME is entered as input: [input: jvm=abc_logihs1; output should be the NODEHOST of jvm 'abc_logihs1', i.e.: v.x.y.z]

4). Display all the jvmtypes present on the NODEHOST entered as input: [input: NODEHOST=v.x.y.z; output should be: log_ihs, appihs, nodeagent, api]

5). Display the CELL when a NODEHOST is entered as input: [input: NODEHOST=v.x.y.z; output should be: XYZ]

6). Display the nodeid when a NODEHOST is entered as input: [input: NODEHOST=v.x.y.z; output should be: 1]

7). Display the DMGRHOST when a JVMNAME is entered as input: [input: jvm=abc_api1.1; output should be: a.b.c.d]

8). Display the ENVDETAILS id when a JVMNAME is entered as input: [input: jvm=abc_api1.1; output should be: abc]

Please help. I am using XML::Smart to achieve a few of these things; however, I am still learning Perl.

I have only written code for 1st point.

use XML::Smart;
use strict;
use warnings;

my $ENVFILE = "/appl/TOPOLOGY.xml";
my $ENVxml = XML::Smart->new($ENVFILE) || die ("Could not find details file.");
my $env = $ARGV[0];

# "fetch" lists the id attribute of every ENVDETAILS element
if ( $ARGV[0] eq "fetch" ) {
    my @envname = $ENVxml->{OUTMOST}->{MYENVDETAILS}->{ENVDETAILS}('[@]','id');
    foreach my $i (@envname) {
        print "$i\n";
    }
}

Replies are listed 'Best First'.
Re: parsing & retrieving from an XML file
by marto (Cardinal) on Nov 06, 2017 at 17:39 UTC

    I'd use Path::Tiny to read the file into a variable ($xml), then Mojo::DOM to process it, e.g.

    use Mojo::DOM;
    use feature 'say';

    # no XML declaration in the example provided,
    # so no auto-detect fanciness; force XML mode:
    my $dom = Mojo::DOM->new->xml(1)->parse( $xml );

    for my $e ( $dom->find('ENVDETAILS')->each ){
        say $e->{id};
    }

    Familiarise yourself with Mojo::DOM, use it to filter the data based upon your other requirements, post your attempts and how they failed. See also this example using selectors to get just the data required.
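
    As a rough illustration of filtering with selectors, here is a minimal sketch for point 2 of the question (all JVM names of a given type within one ENVDETAILS id). The helper name jvm_names is hypothetical, and a trimmed copy of the sample XML is inlined instead of being read with Path::Tiny so that the snippet stands on its own. It assumes the id and type values contain no double quotes.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Trimmed copy of the XML from the question, inlined for a self-contained sketch.
my $xml = <<'XML';
<OUTMOST><MYENVDETAILS><ENVDETAILS id="abc">
  <DMGRHOST>a.b.c.d</DMGRHOST><CELL>XYZ</CELL>
  <NODE nodeid="1">
    <NODEHOST>v.x.y.z</NODEHOST>
    <JVM jvmtype="nodeagent"><JVMNAME>nodeagent</JVMNAME></JVM>
    <JVM jvmtype="api"><JVMNAME>abc_api1.1</JVMNAME><JVMNAME>abc_api1.2</JVMNAME></JVM>
  </NODE>
</ENVDETAILS></MYENVDETAILS></OUTMOST>
XML

# Force XML mode so that tag and attribute names stay case-sensitive.
my $dom = Mojo::DOM->new->xml(1)->parse($xml);

# jvm_names is a hypothetical helper, not part of Mojo::DOM.
# It returns the text of every JVMNAME under a JVM of the requested type
# inside the ENVDETAILS element with the requested id.
sub jvm_names {
    my ($dom, $env_id, $jvm_type) = @_;
    my $selector = qq{ENVDETAILS[id="$env_id"] JVM[jvmtype="$jvm_type"] > JVMNAME};
    return $dom->find($selector)->map('text')->each;
}

say for jvm_names($dom, 'abc', 'api');
```

    With the inlined sample this should list abc_api1.1 and abc_api1.2; an unknown id or type simply yields an empty list.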

      Ty very much for the reply, I will try & post the results. Thanks :)
Re: parsing & retrieving from an XML file
by Discipulus (Canon) on Nov 06, 2017 at 19:50 UTC
    Hello sbafna and welcome to the monastery and to the wonderful world of Perl!

    I abandoned XML::Smart a long time ago. IIRC the smart part ends in its name..

    In a recent thread you can review all serious tools to parse XML using Perl. My homenode has a bunch of links about the matter.

    I normally use XML::Twig, which also has a website with tutorials and docs.

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Ty very much for the warm welcome!! :) Thanks for sharing, I will try them & post the results.
Re: parsing & retrieving from an XML file
by 1nickt (Canon) on Nov 06, 2017 at 17:22 UTC

    Hi, welcome, so does the code accomplish the 1st point successfully?

    Regarding the remaining points, what is the question?

    Regarding your choice of XML-handling module: I'm not familiar with XML::Smart but I see its last release was in 2013 and it has some unaddressed issues; this would lead me to be cautious and possibly consider a different module.


    The way forward always starts with a minimal test.
      Ty very much for the reply. Yes, first point does the needful. However, I know there is an easy way out there..just wanted to know. I have got few wonderful replies. I will try them & post the results.
Re: parsing & retrieving from an XML file
by tangent (Parson) on Nov 06, 2017 at 20:57 UTC
    Here is a way to extract all the necessary info from your XML using XML::LibXML and simple xpath queries. You will probably want to save the results to a data structure (an array of hashes should work) rather than printing them out like I do, but printing them out first allows you to see if it is working correctly.

    See XML::LibXML::Node and XML::LibXML::Element for more details.

    use XML::LibXML;

    my $doc = XML::LibXML->load_xml(location => '/path/to/file.xml');
    my @envdetails = $doc->findnodes('//ENVDETAILS');
    for my $envdetail (@envdetails) {
        my $id = $envdetail->getAttribute('id');
        print "id: $id\n";
        my $dmgrhost = $envdetail->findvalue('DMGRHOST');
        print "DMGRHOST: $dmgrhost\n";
        my @nodes = $envdetail->findnodes('NODE');
        for my $node (@nodes) {
            my $nodeid = $node->getAttribute('nodeid');
            print "nodeid: $nodeid\n";
            my $nodehost = $node->findvalue('NODEHOST');
            print "NODEHOST: $nodehost\n";
            my @jvms = $node->findnodes('JVM');
            for my $jvm (@jvms) {
                my $jvmtype = $jvm->getAttribute('jvmtype');
                if ($jvmtype eq 'api') {
                    my @jvmnames = $jvm->findnodes('JVMNAME');
                    for my $jvmname (@jvmnames) {
                        my $name = $jvmname->textContent;
                        print "JVMNAME: $name\n";
                    }
                }
            }
        }
    }

    # OUTPUT
    id: abc
    DMGRHOST: a.b.c.d
    nodeid: 1
    NODEHOST: v.x.y.z
    JVMNAME: abc_api1.1
    JVMNAME: abc_api1.2
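
    For the reverse lookups in the question (points 3 to 8), one possible approach is to find the matching name node first and then walk back up with the XPath ancestor axis, collecting the result in a hash. The following is only a sketch under assumptions: the helper names locate_jvm and jvm_types_on_host are hypothetical, the sample XML is inlined rather than loaded from a file, and the input values are assumed to contain no double quotes, since they are interpolated straight into the XPath expression.

```perl
use strict;
use warnings;
use XML::LibXML;

# Compact copy of the XML from the question, inlined for a self-contained sketch.
my $xml = <<'XML';
<OUTMOST><MYENVDETAILS><ENVDETAILS id="abc">
  <DMGRHOST>a.b.c.d</DMGRHOST><CELL>XYZ</CELL>
  <NODE nodeid="1">
    <NODEHOST>v.x.y.z</NODEHOST><IHSNODE>v_ihsnode</IHSNODE>
    <IHSJVM ihsjvmtype="log_ihs"><IHSJVMNAME>abc_logihs1</IHSJVMNAME></IHSJVM>
    <IHSJVM ihsjvmtype="appihs"><IHSJVMNAME>abc_appihs1</IHSJVMNAME></IHSJVM>
    <JVM jvmtype="nodeagent"><JVMNAME>nodeagent</JVMNAME></JVM>
    <JVM jvmtype="api"><JVMNAME>abc_api1.1</JVMNAME><JVMNAME>abc_api1.2</JVMNAME></JVM>
  </NODE>
</ENVDETAILS></MYENVDETAILS></OUTMOST>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Hypothetical helper: given a JVM or IHS JVM name, return a hash ref
# describing where it lives, or nothing if the name is not found.
sub locate_jvm {
    my ($doc, $jvm_name) = @_;
    my ($hit) = $doc->findnodes(
        qq{//JVMNAME[.="$jvm_name"] | //IHSJVMNAME[.="$jvm_name"]});
    return unless defined $hit;
    return {
        nodehost => $hit->findvalue('ancestor::NODE/NODEHOST'),
        nodeid   => $hit->findvalue('ancestor::NODE/@nodeid'),
        cell     => $hit->findvalue('ancestor::ENVDETAILS/CELL'),
        dmgrhost => $hit->findvalue('ancestor::ENVDETAILS/DMGRHOST'),
        env_id   => $hit->findvalue('ancestor::ENVDETAILS/@id'),
    };
}

# Hypothetical helper: all jvmtype and ihsjvmtype values found on one NODEHOST.
sub jvm_types_on_host {
    my ($doc, $host) = @_;
    my $node = qq{//NODE[NODEHOST="$host"]};
    return map { $_->value }
        $doc->findnodes(qq{$node/IHSJVM/\@ihsjvmtype | $node/JVM/\@jvmtype});
}

my $info = locate_jvm($doc, 'abc_api1.1');
print "$_: $info->{$_}\n" for sort keys %$info;
print join(', ', jvm_types_on_host($doc, 'v.x.y.z')), "\n";
```

    Collecting one such hash per JVM name would give the array of hashes mentioned above. If nodeid or NODEHOST values repeat across environments, the NODE lookup would need the ENVDETAILS id as an extra condition.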
      Thanks a lot for giving it a try!! Appreciate it. I will try & post the results. What I understood is that I need to create my own data structure to save & retrieve it.
Re: parsing & retrieving from an XML file
by sbafna (Novice) on Nov 08, 2017 at 09:03 UTC

    I have written the code using XML::Smart, as the Perl on my box is not up to date. I have also tweaked the XML a little.

    XML:

    <OUTMOST>
      <MYENVDETAILS>
        <ENVDETAILS id="abc" >
          <DMGRHOST>a.b.c.d</DMGRHOST>
          <CELL>XYZ</CELL>
          <NODEDETAIL nodeid="1" NODEHOST="v.x.y.z" >
            <IHSNODE>v_ihsnode</IHSNODE>
            <IHSJVM ihsjvmtype="log_ihs" >
              <IHSJVMNAME>abc_logihs1</IHSJVMNAME>
            </IHSJVM>
            <JVM jvmtype="api" >
              <JVMNAME>abc_api1.1</JVMNAME>
              <JVMNAME>abc_api1.2</JVMNAME>
            </JVM>
          </NODEDETAIL>
        </ENVDETAILS>
      </MYENVDETAILS>
    </OUTMOST>

    Execution

    ----- myaccount v.x.y.z /appl/ -----
    $ ./my_topologyParser.pl fetch
    abc
    ----- myaccount v.x.y.z /appl/ -----
    $ ./my_topologyParser.pl nodesall
    v.x.y.z
    ----- myaccount v.x.y.z /appl/ -----
    $ ./my_topologyParser.pl abc nodewrtseg
    v.x.y.z
    ----- myaccount v.x.y.z /appl/ -----
    $ ./my_topologyParser.pl abc jvmtype api
    abc_api1.1
    abc_api1.2
    ----- myaccount v.x.y.z /appl/ -----
    $ ./my_topologyParser.pl abc jvmwrtnode v.x.y.z api
    abc_api1.1
    abc_api1.2
    ----- myaccount v.x.y.z /appl/ -----
    $ ./my_topologyParser.pl abc nodewrtjvm abc_api1.2
    v.x.y.z
    ----- myaccount v.x.y.z /appl/ -----

    Actual Code

    #!/appl/perl/bin/perl
    use XML::Smart;
    use strict;
    use warnings;

    my $ENVFILE = "/appl/test_MYENV_Top.xml";
    my $ENVxml = XML::Smart->new($ENVFILE) || die ("Could not find file.");
    my $env = $ARGV[0];

    if ( $ARGV[0] eq "fetch" ) {
        my @envname=$ENVxml->{OUTMOST}->{MYENVDETAILS}->{ENVDETAILS}('[@]','id');
        foreach my $i (@envname) {
            print "$i\n";
        }
    }
    elsif ($ARGV[0] eq "nodesall"){
        my @envname=$ENVxml->{OUTMOST}->{MYENVDETAILS}->{ENVDETAILS}('[@]','id');
        foreach my $i (@envname) {
            my @nodename=$ENVxml->{OUTMOST}->{MYENVDETAILS}->{ENVDETAILS}->('id','eq',"$i")->{NODEDETAIL}('[@]','NODEHOST');
            foreach my $c (@nodename) {
                print "$c\n";
            }
        }
    }
    elsif ($ARGV[1] eq "nodewrtseg"){
        my @nodename=$ENVxml->{OUTMOST}->{MYENVDETAILS}->{ENVDETAILS}->('id','eq',"$env")->{NODEDETAIL}('[@]','NODEHOST');
        foreach my $i (@nodename) {
            print "$i\n";
        }
    }
    elsif ($ARGV[1] eq "jvmtype") {
        my $typeCheck=$ARGV[2];
        my @jvmlist;
        my @nodename=$ENVxml->{OUTMOST}->{MYENVDETAILS}->{ENVDETAILS}->('id','eq',"$env")->{NODEDETAIL}('[@]','NODEHOST');
        if ($#nodename < 0){
            exit 1;
        }
        foreach my $i (@nodename) {
            my @names=@{$ENVxml->{OUTMOST}->{MYENVDETAILS}->{ENVDETAILS}->('id','eq',"$env")->{NODEDETAIL}->('NODEHOST','eq',"$i")->{JVM}->('jvmtype','eq',"$typeCheck")->{JVMNAME}};
            if ($#names > 0){
                push(@jvmlist,@names);
            }
        }
        @jvmlist = grep /\S/, @jvmlist;
        if ($#jvmlist < 0){
            print " No $typeCheck type jvm found on $env . Incorrect jvm type passed.\n";
            exit 1;
        }
        foreach my $c (@jvmlist) {
            print "$c\n";
        }
    }
    elsif ($ARGV[1] eq "jvmwrtnode") {
        my $nodeCheck=$ARGV[2];
        my $typeCheck=$ARGV[3];
        if (defined $nodeCheck && defined $typeCheck){
            my @names=@{$ENVxml->{OUTMOST}->{MYENVDETAILS}->{ENVDETAILS}->('id','eq',"$env")->{NODEDETAIL}->('NODEHOST','eq',"$nodeCheck")->{JVM}->('jvmtype','eq',"$typeCheck")->{JVMNAME}};
            if ($#names < 0){
                print " No $typeCheck type jvm found on $nodeCheck node. Incorrect type/nodename passed.\n";
                exit 1;
            }
            @names = grep /\S/, @names;
            foreach my $c (@names) {
                print "$c\n";
            }
        }
        else {
            print " No Node type or nodename passed. Incorrect no. of arguments passed.\n";
            exit 1;
        }
    }
    elsif ($ARGV[1] eq "nodewrtjvm") {
        my $jvmCheck=$ARGV[2];
        if (defined $jvmCheck){
            my @nodename=$ENVxml->{OUTMOST}->{MYENVDETAILS}->{ENVDETAILS}->('id','eq',"$env")->{NODEDETAIL}('[@]','NODEHOST');
            if ($#nodename < 0){
                exit 1;
            }
            OUTER:
            foreach my $i (@nodename) {
                my @jvmtypes=$ENVxml->{OUTMOST}->{MYENVDETAILS}->{ENVDETAILS}->('id','eq',"$env")->{NODEDETAIL}('NODEHOST','eq',"$i")->{JVM}('[@]','jvmtype');
                foreach my $k (@jvmtypes) {
                    my @names = @{$ENVxml->{OUTMOST}->{MYENVDETAILS}->{ENVDETAILS}->('id','eq',"$env")->{NODEDETAIL}->('NODEHOST','eq',"$i")->{JVM}('jvmtype','eq',"$k")->{JVMNAME}};
                    if ($#names >= 0){
                        foreach (@names){
                            if ( $_ eq $jvmCheck ){
                                print "$i\n";
                                last OUTER;
                            }
                        }
                    }
                }
            }
        }
    }
    else{
        print "\nThis parser is argument specific.";
        print "\n1). $0 fetch : To list out environments";
        print "\n2). $0 nodesall : To list all nodes of all environments.";
        print "\n3). $0 <segID> nodewrtseg : to list all nodes of specific environment.";
        print "\n4). $0 <segID> jvmtype <typeofjvm> : to list all nodes having specified jvmtypes of the specific environment.";
        print "\n5). $0 <segID> jvmwrtnode <nodename> <typeofjvm> : to list all jvms present on specific node of specific type of the environment.";
        print "\n6). $0 <segID> nodewrtjvm <jvmname> : to list the node of specific jvm of specified environment.\n";
    }

    I know I could have used getopts, however, I was in a hurry :)
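
    For reference, a minimal sketch of what option parsing might look like with the core Getopt::Long module instead of positional arguments. The option names (--env, --jvmtype, --node, --jvm) and the helper parse_args are hypothetical and are not part of the script above; the sketch only shows the parsing step, not the XML lookups.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: split an argument list into a command word and
# a hash of named options. The option names are illustrative only.
sub parse_args {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(\@args, \%opt, 'env=s', 'jvmtype=s', 'node=s', 'jvm=s')
        or die "Usage: $0 <command> [--env ID] [--jvmtype TYPE] [--node HOST] [--jvm NAME]\n";
    # Whatever is left after option processing is treated as the command.
    my $command = @args ? shift @args : 'help';
    return ($command, \%opt);
}

my ($command, $opt) = parse_args(@ARGV);
print "command=$command\n";
print "$_=$opt->{$_}\n" for sort keys %$opt;
```

    A call such as ./my_topologyParser.pl jvmwrtnode --env abc --node v.x.y.z --jvmtype api would then yield the command jvmwrtnode plus a hash of the three named values, so the order of the arguments no longer matters.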