Hello! I'm still trying to parse the same XML as before, but now I'm having trouble with siblings. I have two versions of the XML file and need a generic script that extracts the values in either case. I'm pasting the XML first and then my attempt.

When I run my script, it doesn't return any siblings, just 1 node.

XML File: Version 1
<root>
        <part>
            <sect>
               <header>
                   1. Purpose and rationale
                   <p>purpose 1</p>
                   <p>Purpose 2</p>
                   <p>purpose 3</p>
                   <l>
                      <li>purpose list 1</li>
                      <li>list2</li>
                    </l>                      
                </header>
            </sect>
         </part>
     </root>

Objective: In this scenario, I need to search for "Purpose and rationale". If it is found, I need to extract all siblings between <header> and </header>.

The same section, "Purpose and rationale", can exist in another XML file but in a different format. The challenge is to have one generic script that handles both scenarios.

XML Scenario 2:

 
 <root>
        <part>
            <sect>
               <header>
                   1. Purpose and rationale
                   <p>purpose 1</p>
                   <p>Purpose 2</p>
                   <p>purpose 3</p>
                   <l>
                      <li>purpose list 1</li>
                      <li>list2</li>
                    </l>                      
                  <p>2. Some other heading</p>
                  <p>content 1</p>
                  <p>content 2</p>
                </header>
            </sect>
         </part>
     </root>

For scenario 2, I need to extract the siblings under "Purpose and rationale" only up to "Some other heading". That heading title can change; the only identifier is that the node's text begins with a number.

In this XML, the content of two headers is mixed together, so I only need the content from the siblings in the "Purpose and rationale" section.

My attempted code is below:

my $dom = XML::LibXML->new->parse_file($file);
my $study_str = 'Purpose and rationale|Study purpose|Study rationale';
for my $search ('/root/part/sect/header') {
    my $nodeset = $dom->find($search);
    foreach my $node ($nodeset->get_nodelist) {
        $node->string_value;

        if ($node =~ m/$study_str/i) {
            my $protocol = $node;
            print $protocol, "\n";
            # go to the next sibling
            while ($node->{Node}) {
                if ($node->{Node}->getNextSibling) {
                    $node->{Node} = $node->getNextSibling;
                    return $node->{Node};
                }
            }
        }
    }
}

This only returns the value within the header tags and none of the children. Obviously, I'm doing something wrong. I'm hoping for some help here to extract the content I need.
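
For reference, the extraction described above could be sketched roughly as follows. This is only a sketch under stated assumptions, not a tested solution for the real files: the helper name `extract_section` is hypothetical, the sample data uses lowercase `<p>` tags so that it parses as well-formed XML, and "begins with a number" is assumed to mean digits followed by a period and whitespace. It walks the child nodes of each `<header>`, starts collecting after the node whose text matches the heading, and stops at the next numbered heading (or at the end of `<header>`, which covers version 1).

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns the text of each leaf element that follows
# the matching heading inside <header>, stopping at the next numbered heading.
sub extract_section {
    my ($dom, $heading_re) = @_;
    my @found;
    for my $header ($dom->findnodes('/root/part/sect/header')) {
        my $in_section = 0;
        for my $child ($header->childNodes) {
            (my $text = $child->textContent) =~ s/^\s+|\s+$//g;
            next unless length $text;        # skip whitespace-only text nodes
            if (!$in_section) {
                # Start collecting after the node that carries the heading.
                $in_section = 1 if $text =~ $heading_re;
                next;
            }
            last if $text =~ /^\d+\.\s/;     # assumed marker of the next heading
            if ($child->isa('XML::LibXML::Element')) {
                # Take leaf elements, so <l> contributes one entry per <li>.
                for my $leaf ($child->findnodes('descendant-or-self::*[not(*)]')) {
                    (my $t = $leaf->textContent) =~ s/^\s+|\s+$//g;
                    push @found, $t if length $t;
                }
            }
            else {
                push @found, $text;          # stray text between elements
            }
        }
    }
    return @found;
}

# Sample data modelled on scenario 2 (tag case normalised).
my $xml = <<'XML';
<root><part><sect><header>
    1. Purpose and rationale
    <p>purpose 1</p>
    <p>Purpose 2</p>
    <p>purpose 3</p>
    <l><li>purpose list 1</li><li>list2</li></l>
    <p>2. Some other heading</p>
    <p>content 1</p>
</header></sect></part></root>
XML

my $dom        = XML::LibXML->load_xml(string => $xml);
my $heading_re = qr/Purpose and rationale|Study purpose|Study rationale/i;
print "$_\n" for extract_section($dom, $heading_re);
```

With the sample above, the loop stops at "2. Some other heading", so "content 1" is not collected. The same helper should also cover version 1, because collection simply runs to the end of `<header>` when no second numbered heading is present.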

Thanks again for your help and apologies if the question is not clear

Regards, Madbee

In reply to Having trouble with siblings by madbee