Macslayer has asked for the wisdom of the Perl Monks concerning the following question:

I've been having some difficulty parsing the XML output of the Sun Grid Engine's `qstat` command. I'm currently using XML::Smart. However, after creating the XML object and parsing the output (the command produces a LOT of XML), it doesn't give me all of the data I want. Here's an example. Running `qstat -u \* -ext -xml -s r -F` outputs (abridged):
">
```xml
<?xml version='1.0'?>
<job_info ... ?revision=1.11">
  <queue_info>
    <Queue-List>
      <name>long_zeta@zeta09.local</name>
      <qtype>BIP</qtype>
      <slots_used>1</slots_used>
      <slots_resv>0</slots_resv>
      <slots_total>8</slots_total>
      <arch>lx26-amd64</arch>
      <resource name="arch" type="hl">lx26-amd64</resource>
      <resource name="num_proc" type="hl">8</resource>
      ...
      <job_list state="running">
        <JB_job_number>642412</JB_job_number>
        <JAT_prio>5.00248</JAT_prio>
        <JAT_ntix>0.00002</JAT_ntix>
        <JB_name>U6s53097</JB_name>
        <JB_owner>npatel37</JB_owner>
        <JB_project>correlat</JB_project>
        <JB_department>defaultdepartment</JB_department>
        <state>r</state>
        <cpu_usage>224919.00000</cpu_usage>
        <mem_usage>15753.53518</mem_usage>
        <io_usage>0.32690</io_usage>
        <tickets>0</tickets>
        <JB_override_tickets>0</JB_override_tickets>
        <JB_jobshare>0</JB_jobshare>
        <otickets>0</otickets>
        <ftickets>0</ftickets>
        <stickets>0</stickets>
        <JAT_share>0.00002</JAT_share>
        <slots>1</slots>
      </job_list>
    </Queue-List>
  </queue_info>
</job_info>
```
The problem is that I can't get any of the data under the <job_list> tag, such as <slots>. No matter what I do, the Perl script returns nothing. Here's the code I'm using:
```perl
use XML::Smart;

my ($xml,$qstat);
$qstat=`qstat -u \* -ext -xml -s r -F`;
$xml = XML::Smart->new($qstat);
foreach ($xml->{job_info}->{queue_info}->{"Queue-List"}('@') ) {
    print $_->{job_list}->{slots};
}
```
That script doesn't print anything. I've used Data::Dumper and Data::Dump; both printed the tree as expected, except that the <job_list> tag is nowhere to be found.

I should also point out that, between the <queue_info> tags, there are many, many <Queue-List> tags, each of which may or may not contain a <job_list>: some have two or three, others have none. Even when I grep the Data::Dumper output, I can't find any of them.

Thanks,
Ben Olson

Re: Parsing XML output from `qstat`
by kcott (Archbishop) on May 05, 2014 at 08:12 UTC

    G'day Ben,

    Welcome to the monastery.

    Short answer: I'm unable to reproduce your problem.

    Longer answer:

    I set up $qstat as a here-doc. I changed '... ?revision=1.11"' to 'revision="1.11"' and removed the other elision. Check that your original data (prior to replacing parts with '...') was actually valid XML.

    Other than that, I used your code as written. The output was a lone "1" character, as expected from "<slots>1</slots>".

    You actually don't need all those '->'s. I tried with the following code and got the same result.

```perl
for ($xml->{job_info}{queue_info}{'Queue-List'}('@')) {
    print $_->{job_list}{slots};
}
```

    I'm not a user of XML::Smart. I had to install this module, so I got the latest version, 1.79. Check that you're also using this version, or upgrade. Also check for any problems with your OS or Perl version (I'm using 5.18.1 on darwin-thread-multi-2level).

    Here's the actual code and data I used for my tests (in the spoiler):

    -- Ken

      Thanks for the code! I've fixed the problem, as I describe in a later post, and I'm finally using XML::Smart correctly.
Re: Parsing XML output from `qstat`
by Anonymous Monk on May 05, 2014 at 08:46 UTC

    ... XML::Smart ...

    I wouldn't :)

```perl
use XML::LibXML 1.70; ## for load_html/load_xml/location
my $qstat = 'qstat.xml';
my $dom = XML::LibXML->new(qw/ recover 2 /)->load_xml( location => $qstat );
for my $job ( $dom->findnodes( q{ /job_info/queue_info/Queue-List } ) ){
    print $job->nodePath, "\n";
    for my $slot ( $job->findnodes( q{ ./job_list/slots } ) ){
        print $slot->nodePath, "\n";
        print $slot, "\n";
        print $slot->textContent, "\n";
    }
}
__END__
/job_info/queue_info/Queue-List
/job_info/queue_info/Queue-List/job_list/slots
<slots>1</slots>
1
```
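    The same nested findnodes approach also copes with the point from the question that each <Queue-List> may hold zero, one, or several <job_list> elements. Below is a minimal sketch, assuming XML::LibXML 1.70 or later and parsing from a string. The sample data is trimmed from the question's output; the second queue (short_zeta@zeta10.local) is hypothetical and only there to show the empty case.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed from the question's qstat output. The second, empty queue is
# hypothetical and only illustrates a <Queue-List> with no <job_list>.
my $qstat = q{<?xml version='1.0'?>
<job_info>
  <queue_info>
    <Queue-List>
      <name>long_zeta@zeta09.local</name>
      <slots_used>1</slots_used>
      <job_list state="running">
        <JB_job_number>642412</JB_job_number>
        <JB_owner>npatel37</JB_owner>
        <slots>1</slots>
      </job_list>
    </Queue-List>
    <Queue-List>
      <name>short_zeta@zeta10.local</name>
      <slots_used>0</slots_used>
    </Queue-List>
  </queue_info>
</job_info>};

my $dom = XML::LibXML->load_xml( string => $qstat );

my %jobs_per_queue;    # queue name => number of <job_list> children found
for my $queue ( $dom->findnodes('/job_info/queue_info/Queue-List') ) {
    my $name = $queue->findvalue('name');
    my @jobs = $queue->findnodes('job_list');    # zero, one or several
    $jobs_per_queue{$name} = scalar @jobs;
    print "$name: no running jobs\n" unless @jobs;
    for my $job (@jobs) {
        printf "%s: job %s (%s) uses %s slot(s)\n",
            $name,
            $job->findvalue('JB_job_number'),
            $job->findvalue('JB_owner'),
            $job->findvalue('slots');
    }
}
```

    With the sample above this prints:

    long_zeta@zeta09.local: job 642412 (npatel37) uses 1 slot(s)
    short_zeta@zeta10.local: no running jobs

    Because the XPath expressions are relative to each $queue node, a queue without jobs simply yields an empty @jobs list instead of an error.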

    Re: parsing xml, xpather.pl/htmltreexpather.pl

    Also, if you're going to post XML, make sure it's well-formed first.
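    One way to check well-formedness before posting, sketched here with XML::LibXML (any conforming parser would do): load_xml dies on malformed input, so wrapping it in eval turns the parser's complaint into a message. The helper name wellformed_error is made up for this example, and the two snippets are cut down to the root start tag as posted and as kcott repaired it.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns '' if $xml is well-formed,
# otherwise the error text reported by libxml2.
sub wellformed_error {
    my ($xml) = @_;
    my $ok = eval { XML::LibXML->load_xml( string => $xml ); 1 };
    return $ok ? '' : "$@";
}

# The start tag as posted (with the '...' elision) and kcott's repaired version.
my $as_posted = q{<job_info ... ?revision=1.11"><queue_info/></job_info>};
my $repaired  = q{<job_info revision="1.11"><queue_info/></job_info>};

for my $xml ( $as_posted, $repaired ) {
    my $err    = wellformed_error($xml);
    my $status = $err eq '' ? "well-formed\n" : "NOT well-formed:\n$err\n";
    print $status;
}
```

    From the shell, libxml2's `xmllint --noout qstat.xml` performs the same check and prints nothing when the file is well-formed.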

      I've fixed the problem, and it had nothing to do with XML::Smart. Your code worked on the entire output of qstat, so I investigated why. Your LibXML example reads the XML from a file, so I switched the $qstat variable in my original example to reading the output from a file. Once I did that, XML::Smart worked perfectly. Thanks for the code, though; I may have to switch to LibXML! It does get a bit messy compared to XML::Smart, though.
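      For what it's worth, XML::LibXML is not limited to files: load_xml accepts string => $scalar as well as location => $file. The minimal sketch below parses a trimmed copy of the question's output both ways and compares the <slots> values. The helper slots_of is hypothetical, and in real use $qstat would come from the backtick command in the question rather than a literal.

```perl
use strict;
use warnings;
use XML::LibXML;
use File::Temp qw(tempfile);

# In real use: my $qstat = `qstat -u \* -ext -xml -s r -F`;
# here a trimmed copy of the question's output stands in for it.
my $qstat = q{<?xml version='1.0'?>
<job_info><queue_info><Queue-List>
  <name>long_zeta@zeta09.local</name>
  <job_list state="running"><JB_job_number>642412</JB_job_number><slots>1</slots></job_list>
</Queue-List></queue_info></job_info>};

# Hypothetical helper: text of every <slots> element inside a <job_list>.
sub slots_of {
    my ($dom) = @_;
    return map { $_->textContent } $dom->findnodes('//Queue-List/job_list/slots');
}

# 1) Parse the captured output straight from the scalar; no file involved.
my @from_string = slots_of( XML::LibXML->load_xml( string => $qstat ) );

# 2) Parse the same data from a temporary file, as the earlier example does.
my ( $fh, $filename ) = tempfile( UNLINK => 1, SUFFIX => '.xml' );
print {$fh} $qstat;
close $fh or die "close: $!";
my @from_file = slots_of( XML::LibXML->load_xml( location => $filename ) );

print "string: @from_string\nfile:   @from_file\n";
```

      Both lines print 1, so writing qstat's output to disk first is optional when using XML::LibXML.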

        so I switched the $qstat variable in my original example to reading the output from a file. Once I did that, XML::Smart worked perfectly

        Now that is a sign of quality :) Like kcott, I cannot reproduce that: whether I give it a string or a filename, XML::Smart spits out 1.

        It does get a bit messy compared to XML::Smart, though.

        What does that mean?

        DOM is the standard, used across programming languages everywhere...

        If I'm choosing something non-standard, I'm going with XML::Twig or XML::Rules.
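        A rough sketch of the XML::Twig route, which may suit qstat's large output: the twig_handlers callback runs as soon as each <Queue-List> has been parsed, and purge then frees it, so the whole document never has to sit in memory at once. This assumes XML::Twig is installed; the sample data is trimmed from the question.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed from the question's qstat output.
my $qstat = q{<?xml version='1.0'?>
<job_info><queue_info>
<Queue-List><name>long_zeta@zeta09.local</name>
  <job_list state="running"><JB_job_number>642412</JB_job_number><slots>1</slots></job_list>
</Queue-List>
</queue_info></job_info>};

my @rows;    # [queue name, job number, slots] for every running job
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each <Queue-List> element has been completely parsed.
        'Queue-List' => sub {
            my ( $t, $queue ) = @_;
            my $name = $queue->first_child_text('name');
            for my $job ( $queue->children('job_list') ) {    # may be none
                push @rows, [ $name,
                              $job->first_child_text('JB_job_number'),
                              $job->first_child_text('slots') ];
            }
            $t->purge;    # discard everything parsed so far to keep memory flat
            return 1;
        },
    },
);
$twig->parse($qstat);

printf "%s job %s: %s slot(s)\n", @$_ for @rows;
```

        With the sample above this prints "long_zeta@zeta09.local job 642412: 1 slot(s)". The handler is attached to <Queue-List> rather than <job_list> so that the queue's <name> is still available when its jobs are read.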