I need a really simple way to parse a tiny XML document, but I'm having trouble finding the proper solution.

I'm using XML::LibXML, and I can easily pull out the doctype, charset and other data I need, but I can't figure out how to grab the list of errors inside the m:errorlist element.

Here's an example of the XML I need to parse; I can't change its format.
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Body>
    <m:markupvalidationresponse env:encodingStyle="http://www.w3.org/2003/05/soap-encoding" xmlns:m="http://www.w3.org/2005/10/markup-validator">
      <m:uri>http://www.perlmonks.org/</m:uri>
      <m:checkedby>http://localhost/w3c-markup-validator/</m:checkedby>
      <m:doctype>-//W3C//DTD HTML 4.0 Transitional//EN</m:doctype>
      <m:charset>utf-8</m:charset>
      <m:validity>false</m:validity>
      <m:errors>
        <m:errorcount>59</m:errorcount>
        <m:errorlist>
          <m:error>
            <m:line>11</m:line>
            <m:col>66</m:col>
            <m:message>document type does not allow element &quot;LINK&quot; here</m:message>
          </m:error>
          <m:error>
            <m:line>14</m:line>
            <m:col>41</m:col>
            <m:message>document type does not allow element &quot;LINK&quot; here</m:message>
          </m:error>
          <m:error>
            <m:line>21</m:line>
            <m:col>4</m:col>
            <m:message>document type does not allow element &quot;META&quot; here</m:message>
          </m:error>
        </m:errorlist>
      </m:errors>
      <m:warnings>
        <m:warningcount>0</m:warningcount>
        <m:warninglist>
          <m:warning><m:message>No Character Encoding Found! Falling back to UTF-8.</m:message></m:warning>
        </m:warninglist>
      </m:warnings>
    </m:markupvalidationresponse>
  </env:Body>
</env:Envelope>


Here's a snippet of the code I'm using to read the XML.
use LWP::UserAgent;
use XML::LibXML;

# $request (the HTTP request sent to the validator) is built elsewhere in the script.
my $ua       = LWP::UserAgent->new();
my $response = $ua->request($request);
my $parser   = XML::LibXML->new();
my $doc      = $parser->parse_string($response->content);
Kube::Demonize::logmsg($response->content);

#for (my $i = 0; $i < @errorlist; $i++) {
#    Kube::Demonize::logmsg(sprintf("%s\n", $errorlist[$i]->getElementsByTagName('m:line')->textContent));
#}

# Single-valued fields: match on the prefixed tag name and print the text.
foreach my $d ($doc->getElementsByTagName('m:doctype')) {
    print $d->textContent;
}
foreach my $d ($doc->getElementsByTagName('m:validity')) {
    print $d->textContent;
}
foreach my $d ($doc->getElementsByTagName('m:charset')) {
    print $d->textContent;
}


Don't get bogged down in the implementation; this is just a prototype to illustrate what I'm trying to do.
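For what it's worth, the commented-out loop in the snippet above probably fails for two reasons: @errorlist is never filled in, and getElementsByTagName, when used as the invocant of ->textContent, runs in scalar context and returns a node list object rather than a single element. Below is a minimal sketch of that same DOM-walking approach, assuming the "m:" prefix from the sample response never changes. The $xml heredoc is a trimmed copy of the sample, standing in for $response->content.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the sample response (stand-in for $response->content).
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Body>
    <m:markupvalidationresponse xmlns:m="http://www.w3.org/2005/10/markup-validator">
      <m:errors>
        <m:errorlist>
          <m:error><m:line>11</m:line><m:col>66</m:col><m:message>document type does not allow element &quot;LINK&quot; here</m:message></m:error>
          <m:error><m:line>21</m:line><m:col>4</m:col><m:message>document type does not allow element &quot;META&quot; here</m:message></m:error>
        </m:errorlist>
      </m:errors>
    </m:markupvalidationresponse>
  </env:Body>
</env:Envelope>
XML

my $doc = XML::LibXML->new->parse_string($xml);

my @lines;
# getElementsByTagName compares the qualified name literally, so 'm:error'
# only matches while the document keeps using the "m" prefix.
for my $error ($doc->getElementsByTagName('m:error')) {
    # In list context this returns the matching child elements;
    # take the first one instead of calling textContent on the whole list.
    my ($line) = $error->getElementsByTagName('m:line');
    push @lines, $line->textContent if defined $line;
}

print "$_\n" for @lines;
```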

I'm just looking for the easiest way to pull out the errors and stuff them into a Perl data structure, such as an array of hashes.

Something like this:
$errors[0] = { line => 120, col => 2, message => 'Tag not allowed' };
$errors[1] = { line => 220, col => 3, message => 'Another error?' };
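A minimal sketch of one way to build that array of hashes, using XML::LibXML::XPathContext with a registered namespace prefix instead of literal tag names. The prefix "m" passed to registerNs is a local choice; it only has to map to the namespace URI from the sample's xmlns:m declaration. The $xml heredoc is a trimmed copy of the sample, standing in for $response->content.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the sample response (stand-in for $response->content).
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Body>
    <m:markupvalidationresponse xmlns:m="http://www.w3.org/2005/10/markup-validator">
      <m:errors>
        <m:errorcount>59</m:errorcount>
        <m:errorlist>
          <m:error><m:line>11</m:line><m:col>66</m:col><m:message>document type does not allow element &quot;LINK&quot; here</m:message></m:error>
          <m:error><m:line>14</m:line><m:col>41</m:col><m:message>document type does not allow element &quot;LINK&quot; here</m:message></m:error>
          <m:error><m:line>21</m:line><m:col>4</m:col><m:message>document type does not allow element &quot;META&quot; here</m:message></m:error>
        </m:errorlist>
      </m:errors>
      <m:warnings>
        <m:warninglist>
          <m:warning><m:message>No Character Encoding Found! Falling back to UTF-8.</m:message></m:warning>
        </m:warninglist>
      </m:warnings>
    </m:markupvalidationresponse>
  </env:Body>
</env:Envelope>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# XPath needs the prefix bound explicitly; 'm' is our own choice here and
# is matched to the namespace URI, not to the prefix used in the document.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('m', 'http://www.w3.org/2005/10/markup-validator');

my @errors;
for my $node ($xpc->findnodes('//m:errorlist/m:error')) {
    # Relative paths are evaluated against the current <m:error> node.
    push @errors, {
        line    => $xpc->findvalue('m:line',    $node),
        col     => $xpc->findvalue('m:col',     $node),
        message => $xpc->findvalue('m:message', $node),
    };
}

printf "%d:%d %s\n", @{$_}{qw(line col message)} for @errors;
```

Binding the URI rather than relying on the document's own prefix means the lookups should keep working if the server ever emits a different prefix, and restricting the path to m:errorlist keeps the warning messages out of @errors.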


Thanks.

In reply to Easiest way to parse a simple XML file? by halfbaked
