I have a problem I could sure use some help with. First, be gentle: I am new to both Perl and LibXML. I have been parsing a document and placing elements into an array that is then written to a spreadsheet column. During testing it was discovered that some nodes have more than one child node of the same name. I need to combine the text from each of these child nodes into one element of the array. The format of the XML is:

<Group id="V-3021">
    <title>blah blah blah</title>
    <description>blah blah blah</description>
    <Rule id="SV-41507r1_rule" severity="medium" weight="10.0">
        <version>blah blah blah</version>
        <title>blah blah blah</title>
        <description>blah blah blah</description>
        <reference>
            <dc:title>blah blah blah</dc:title>
            <dc:publisher>blah blah blahO</dc:publisher>
            <dc:type>blah blah blah</dc:type>
            <dc:subject>blah blah blah</dc:subject>
            <dc:identifier>blah blah blah</dc:identifier>
        </reference>
        <fixtext fixref="F-3046r3_fix">blah blah blah</fixtext>
        <check system="C-39986r2_chk">
            <check-content-ref name="M" href="VMS_XCCDF_Benchmark_Network - Firewall - Cisco.xml"/>
            <check-content>This is the text I want</check-content>
        </check>
    </Rule>
</Group>

But occasionally it is like this:


<Group id="V-3021">
    <title>blah blah blah</title>
    <description>blah blah blah</description>
    <Rule id="SV-41507r1_rule" severity="medium" weight="10.0">
        <version>blah blah blah</version>
        <title>blah blah blah</title>
        <description>blah blah blah</description>
        <reference>
            <dc:title>blah blah blah</dc:title>
            <dc:publisher>blah blah blahO</dc:publisher>
            <dc:type>blah blah blah</dc:type>
            <dc:subject>blah blah blah</dc:subject>
            <dc:identifier>blah blah blah</dc:identifier>
        </reference>
        <fixtext fixref="F-3046r3_fix">blah blah blah</fixtext>
        <check system="C-39986r2_chk">
            <check-content-ref name="M" href="VMS_XCCDF_Benchmark_Network - Firewall - Cisco.xml"/>
            <check-content>This is the text I want</check-content>
            <check-content>This is more text that I want to grab and add to the end of the above text</check-content>
        </check>
    </Rule>
</Group>

I can pull all the text from "check-content", but if there is more than one it throws off the row of data in the spreadsheet. I need to be able to say something like: if there are two or more <check-content> elements, join the data and push the result into the array; if not, just push the data into the array.

Now here is where the rub comes in. I am trying to pull everything below "Rule" and then pull the "check-content" elements from each of those sections of XML. By doing this I should be able to join the two "check-content" sections together before pushing the data into an array. The problem is that the "reference" node uses a namespace prefix (dc:). I have tried registering this namespace with no luck. I actually don't care about that section of data at all, but when I try to pull this section I get an error message that states ":1: namespace error : Namespace prefix dc on title is not defined s>ECAT-1, ECAT-2, ECSC-1</IAControls></description><reference><dc:title"

If I could somehow instruct LibXML to pull everything below "Rule" regardless of what namespace is defined, that would be great. My latest attempt at this looks like this:

 my $parser = XML::LibXML->new() or die $!;
 my $doc1 = $parser->parse_file($filename1);
 my $xc1 = XML::LibXML::XPathContext->new($doc1->documentElement() );
 $xc1->registerNs(x => 'http://checklists.nist.gov/xccdf/1.1');
 $xc1->registerNs(dc => 'http://purl.org/dc/elements/1.1');


 for $Check ( $xc1->findnodes('//x:Rule') ) { 

     my $doc2 = $parser->parse_string($Check);
     my $xc2 = XML::LibXML::XPathContext->new($doc2->documentElement() );
     $xc2->registerNs(x => 'http://checklists.nist.gov/xccdf/1.1');


     foreach $Check_Content ( $xc2->findvalue('check-content') ) { 

          push (@Check_Content1, $Check_Content);

          }

     @Check_Content1 = ();           

     $result_string = $Check_Content1[0] . $Check_Content1[1];
     push (@Check_Content, $result_string);
     }
 }
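
For comparison, here is a minimal sketch of the "join the check-content text per Rule" idea that queries relative to each Rule node instead of serializing and re-parsing it. Re-parsing the serialized Rule is the likely source of the error above, since the fragment no longer carries the xmlns:dc declaration made on an ancestor element. The sample document, its root element, the second Rule and its ids, and the space used as a join separator are all assumptions made for illustration; only the x namespace URI and the element names come from the post.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input: the real file's root element and namespace
# declarations are assumed to look roughly like this.
my $xml = <<'XML';
<Benchmark xmlns="http://checklists.nist.gov/xccdf/1.1"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <Group id="V-3021">
    <Rule id="SV-41507r1_rule">
      <reference><dc:title>blah</dc:title></reference>
      <check system="C-39986r2_chk">
        <check-content>first part</check-content>
        <check-content>second part</check-content>
      </check>
    </Rule>
  </Group>
  <Group id="V-0000">
    <Rule id="SV-00000r1_rule">
      <check system="C-00000r1_chk">
        <check-content>only part</check-content>
      </check>
    </Rule>
  </Group>
</Benchmark>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# One XPath context for the whole document; only the namespace we
# actually query needs a prefix (dc is never referenced in the XPath).
my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs(x => 'http://checklists.nist.gov/xccdf/1.1');

my @check_content;
for my $rule ($xc->findnodes('//x:Rule')) {
    # Evaluate relative to this Rule node; no re-parsing involved.
    my @parts = map { $_->textContent }
                $xc->findnodes('x:check/x:check-content', $rule);

    # One array element per Rule, whether it has one child or several.
    push @check_content, join(' ', @parts);
}

print "$_\n" for @check_content;
```

Because every Rule contributes exactly one element, the array stays aligned with the spreadsheet rows even when a Rule has several check-content children.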

In reply to LibXML Namespace issue by ohm.kazhbu
