Easiest way to parse a simple XML file?

halfbaked has asked for the wisdom of the Perl Monks concerning the following question:

I need a really simple way to parse a tiny XML document, but I'm having trouble finding the proper solution.

I'm using XML::LibXML, and I can easily pull out the doctype, charset and other fields I need, but I can't figure out how to get a list of the errors inside the m:errorlist element.

Here's an example of the XML I need to parse; I can't change its format.

<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse env:encodingStyle="http://www.w3.org/2003/05/soap-encoding" xmlns:m="http://www.w3.org/2005/10/markup-validator">
    <m:uri>http://www.perlmonks.org/</m:uri>
    <m:checkedby>http://localhost/w3c-markup-validator/</m:checkedby>
    <m:doctype>-//W3C//DTD HTML 4.0 Transitional//EN</m:doctype>
    <m:charset>utf-8</m:charset>
    <m:validity>false</m:validity>
    <m:errors>
        <m:errorcount>59</m:errorcount>
        <m:errorlist>
            <m:error>
                <m:line>11</m:line>
                <m:col>66</m:col>
                <m:message>document type does not allow element &quot;LINK&quot; here</m:message>
            </m:error>
           
            <m:error>
                <m:line>14</m:line>
                <m:col>41</m:col>
                <m:message>document type does not allow element &quot;LINK&quot; here</m:message>
            </m:error>
           
            <m:error>
                <m:line>21</m:line>
                <m:col>4</m:col>
                <m:message>document type does not allow element &quot;META&quot; here</m:message>
            </m:error>
        </m:errorlist>
    </m:errors>
    <m:warnings>
        <m:warningcount>0</m:warningcount>
        <m:warninglist>
     <m:warning><m:message>No Character Encoding Found!
    
      Falling back to 
    
    UTF-8.
  </m:message></m:warning>
        </m:warninglist>
    </m:warnings>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>
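Every element inside the response body lives in the namespace http://www.w3.org/2005/10/markup-validator; the m: prefix is only a label the validator happens to use. A minimal sketch, assuming the XPathContext class that ships with XML::LibXML, which binds a prefix of our own choosing to that URI and reads the single-valued fields. The trimmed sample document and the prefix v are illustrative choices, not part of the original.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the validator response shown above
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse xmlns:m="http://www.w3.org/2005/10/markup-validator">
    <m:doctype>-//W3C//DTD HTML 4.0 Transitional//EN</m:doctype>
    <m:charset>utf-8</m:charset>
    <m:validity>false</m:validity>
    <m:errors><m:errorcount>59</m:errorcount></m:errors>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Bind our own prefix "v" to the validator namespace URI; XPath matches on
# the URI, so the prefix used inside the document ("m") does not matter.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('v', 'http://www.w3.org/2005/10/markup-validator');

# One XPath lookup per single-valued field
my %summary = map { ($_ => $xpc->findvalue("//v:$_")) }
    qw(doctype charset validity errorcount);

print "$_: $summary{$_}\n" for sort keys %summary;
```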

Here's a snippet of the code I'm using to read the XML.

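    use strict;
    use warnings;
    use LWP::UserAgent;
    use XML::LibXML;

    # $request is an HTTP::Request built elsewhere (not shown)
    my $ua       = LWP::UserAgent->new();
    my $response = $ua->request($request);
    die $response->status_line unless $response->is_success;

    my $parser = XML::LibXML->new();
    my $doc    = $parser->parse_string($response->content);

    Kube::Demonize::logmsg($response->content);
    # Not working yet: @errorlist is never populated, and
    # getElementsByTagName() returns a NodeList here, not a single node
    #for (my $i = 0; $i < @errorlist; $i++) {
    #    Kube::Demonize::logmsg(sprintf("%s\n", $errorlist[$i]->getElementsByTagName('m:line')->textContent));
    #}

    # Each of these elements occurs once in the response
    foreach my $d ($doc->getElementsByTagName('m:doctype')) {
        print $d->textContent, "\n";
    }

    foreach my $d ($doc->getElementsByTagName('m:validity')) {
        print $d->textContent, "\n";
    }

    foreach my $d ($doc->getElementsByTagName('m:charset')) {
        print $d->textContent, "\n";
    }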
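The commented-out loop is close to working. The missing pieces are that @errorlist has to be filled with the m:error nodes first, and that each per-error lookup returns a list rather than one node. A minimal DOM-only sketch in the same getElementsByTagName() style, assuming the element names always carry the m: prefix exactly as in the sample; a plain printf stands in for Kube::Demonize::logmsg, and the two-error document is a trimmed excerpt.

```perl
use strict;
use warnings;
use XML::LibXML;

# Two <m:error> entries copied from the sample response
my $xml = <<'XML';
<m:errorlist xmlns:m="http://www.w3.org/2005/10/markup-validator">
    <m:error>
        <m:line>11</m:line>
        <m:col>66</m:col>
        <m:message>document type does not allow element &quot;LINK&quot; here</m:message>
    </m:error>
    <m:error>
        <m:line>14</m:line>
        <m:col>41</m:col>
        <m:message>document type does not allow element &quot;LINK&quot; here</m:message>
    </m:error>
</m:errorlist>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Fill @errorlist first: in list context this returns every <m:error> node
my @errorlist = $doc->getElementsByTagName('m:error');

my @lines;
for my $error (@errorlist) {
    # getChildrenByTagName() also returns a list; keep its first element
    my ($line) = $error->getChildrenByTagName('m:line');
    push @lines, $line->textContent;
    printf "%s\n", $line->textContent;
}
```

In the real program, $doc would be the document parsed from $response->content rather than the heredoc.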

Don't get bogged down in the implementation; this is just a prototype to illustrate what I'm trying to do.

I'm just looking for the easiest way to pull out the errors and store them in a Perl data structure, such as an array of hashes.

Something like this:

$errors[0] = {
       line    => 120,
       col     => 2,
       message => 'Tag not allowed',
};
$errors[1] = {
       line    => 220,
       col     => 3,
       message => 'Another error?',
};
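One way to build that array of hashes is to let XPath select each m:error element and then read its three children relative to that node. A minimal sketch, assuming XML::LibXML's XPathContext with the validator namespace bound to the prefix m; the trimmed sample document and the variable names are illustrative, and in the real program $doc would come from parse_string($response->content).

```perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

# Trimmed copy of the validator response, keeping two errors
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse xmlns:m="http://www.w3.org/2005/10/markup-validator">
  <m:errors>
    <m:errorlist>
      <m:error>
        <m:line>11</m:line>
        <m:col>66</m:col>
        <m:message>document type does not allow element &quot;LINK&quot; here</m:message>
      </m:error>
      <m:error>
        <m:line>21</m:line>
        <m:col>4</m:col>
        <m:message>document type does not allow element &quot;META&quot; here</m:message>
      </m:error>
    </m:errorlist>
  </m:errors>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>
XML

my $doc = XML::LibXML->new->parse_string($xml);
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('m', 'http://www.w3.org/2005/10/markup-validator');

# One hash per <m:error>; the child paths are evaluated relative to $err
my @errors;
for my $err ($xpc->findnodes('//m:errorlist/m:error')) {
    push @errors, {
        line    => $xpc->findvalue('m:line',    $err),
        col     => $xpc->findvalue('m:col',     $err),
        message => $xpc->findvalue('m:message', $err),
    };
}

$Data::Dumper::Sortkeys = 1;
print Dumper(\@errors);
```

The relative paths inside the loop matter: with an absolute path such as //m:line, findvalue would run the text of every m:line in the document together instead of returning just this error's line.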

Thanks.



Re: Easiest way to parse a simple XML file?
by Tanktalus (Canon) on Dec 11, 2008 at 00:31 UTC