
I need a really simple way to parse a tiny XML document, but I'm having trouble finding the proper solution.

I'm using XML::LibXML and I can easily pull out the doctype, charset and other data I need, but I can't figure out how to grab a list of the errors located in the errorlist tag.

Here's an example of the XML I need to parse. I can't change its format.

<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse env:encodingStyle="http://www.w3.org/2003/05/soap-encoding" xmlns:m="http://www.w3.org/2005/10/markup-validator">
    <m:uri>http://www.perlmonks.org/</m:uri>
    <m:checkedby>http://localhost/w3c-markup-validator/</m:checkedby>
    <m:doctype>-//W3C//DTD HTML 4.0 Transitional//EN</m:doctype>
    <m:charset>utf-8</m:charset>
    <m:validity>false</m:validity>
    <m:errors>
        <m:errorcount>59</m:errorcount>
        <m:errorlist>
            <m:error>
                <m:line>11</m:line>
                <m:col>66</m:col>
                <m:message>document type does not allow element &quot;LINK&quot; here</m:message>
            </m:error>
           
            <m:error>
                <m:line>14</m:line>
                <m:col>41</m:col>
                <m:message>document type does not allow element &quot;LINK&quot; here</m:message>
            </m:error>
           
            <m:error>
                <m:line>21</m:line>
                <m:col>4</m:col>
                <m:message>document type does not allow element &quot;META&quot; here</m:message>
            </m:error>
        </m:errorlist>
    </m:errors>
    <m:warnings>
        <m:warningcount>0</m:warningcount>
        <m:warninglist>
     <m:warning><m:message>No Character Encoding Found!
    
      Falling back to 
    
    UTF-8.
  </m:message></m:warning>
        </m:warninglist>
    </m:warnings>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>

Here's a snippet of the code I'm using to read the XML.

    my $ua = LWP::UserAgent->new();
    my $response = $ua->request($request);
    my $parser = XML::LibXML->new();
    my $doc = $parser->parse_string($response->content);

    Kube::Demonize::logmsg($response->content);
    #for (my $i = 0; $i < @errorlist; $i++) {
    #    Kube::Demonize::logmsg(sprintf("%s\n", $errorlist[$i]->getElementsByTagName('m:line')->textContent));
    #}
    
    foreach my $d ($doc->getElementsByTagName('m:doctype')) {
        print $d->textContent;
    }
    
    foreach my $d ($doc->getElementsByTagName('m:validity')) {
        print $d->textContent;
    }
    
    foreach my $d ($doc->getElementsByTagName('m:charset')) {
        print $d->textContent;
    }

Don't get bogged down in the implementation; this is just a prototype to illustrate what I'm trying to do.

I'm just looking for the easiest way to pull out the errors and stuff them into a Perl data structure, such as an array of hashes.

Something like this:

$errors[0] = {
       line    => 120,
       col     => 2,
       message => 'Tag not allowed',
};
$errors[1] = {
       line    => 220,
       col     => 3,
       message => 'Another error?',
};
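For illustration, here is a minimal sketch of one way that structure might be built with XML::LibXML, assuming an XML::LibXML::XPathContext with the validator namespace from the sample response registered under a prefix. The helper name `extract_errors` is hypothetical, not part of any library, and this is only one possible approach.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical helper (name made up for this sketch): takes the XML as a
# string and returns a list of hashrefs, one per <m:error>, with the
# keys line, col and message.
sub extract_errors {
    my ($xml) = @_;

    my $doc = XML::LibXML->new->parse_string($xml);

    # Bind a prefix to the namespace URI used in the sample response.
    # The prefix here only has to match the XPath expressions below,
    # not the prefix used inside the document.
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs('m', 'http://www.w3.org/2005/10/markup-validator');

    my @errors;
    for my $node ($xpc->findnodes('//m:errorlist/m:error')) {
        my %error;
        for my $field (qw(line col message)) {
            # Evaluated relative to the current <m:error> node.
            $error{$field} = $xpc->findvalue("m:$field", $node);
        }
        push @errors, \%error;
    }
    return @errors;
}
```

With the `$response` from the snippet above, `my @errors = extract_errors($response->content);` should leave `$errors[0]{line}`, `$errors[0]{col}` and `$errors[0]{message}` holding the values of the first `m:error` element.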

Thanks.

In reply to Easiest way to parse a simple XML file? by halfbaked
