Extracting specific childnodes

madbee has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I'm trying to parse an XML file from which I have to find and extract only specific child nodes. The XML file is below:

<Article>
  <Main>
    <Sect>
      <H4>Include</H4>
      .....
      <P1> This is the criteria</P1>
      <L>
        <LI>
          <LI_Label>1.</LI_Label>
          <LI_Title>Criteria 1</LI_Title>
        </LI>
        <LI>
          <LI_Label>2.</LI_Label>
          <LI_Title>Criteria 2</LI_Title>
        </LI>
        <LI>
          <LI_Label>3.</LI_Label>
          <LI_Title>Criteria 3</LI_Title>
        </LI>
        <LI>
          <LI_Label>4.</LI_Label>
          <LI_Title>Criteria 3</LI_Title>
        </LI>
      </L>
    </Sect>
  </Main>
</Article>

From the above XML file, I need to extract only the LI_Title nodes in sections whose header (H3 or H4) contains "Include". There can be one or many LI_Title elements in an XML file, and many such sections.

I've come up with the following to identify the nodes, but I'm not sure how to pull out just the LI_Title nodes. So, hoping for some help here.

$dom = $parser->parse_file($file);
my $expr = '//Article//Main//Sect//H3[contains(.,"Include")]'
         . '|//Article//Main//Sect//H4[contains(.,"Include")]'
         . '|//Article//Main[contains(.,"Include")]//*[name()="LI"]';

my @nodes = $dom->findnodes($expr);

foreach my $nod (@nodes) {
    print "element: " . $nod->nextNonBlankSibling();
}

Thanks in advance. -madbee


Re: Extracting specific childnodes (xpath whitespace) by Anonymous Monk on Jul 06, 2013 at 03:19 UTC
Hi madbee :) I already gave this answer in Re: Counting number of child nodes based on element value (typos) / Re: Having trouble with siblings / Re: Get Node Value from irregular XML (xpather.pl), but here you go:

`my $xpath_Include_LI_Title = q{ //Article/Main/Sect/*[ ( name()="H3" or name()="H4" ) and contains(.,"Include" ) ]//LI_Title };`

If you only want the LI_Title instead of the LI, just copy/paste that tag name :) XPath is quite fond of whitespace, so the mere fact that cmd.exe doesn't seem to like it is no reason for you to avoid it.
Re^2: Extracting specific childnodes (xpath whitespace) by madbee (Acolyte) on Jul 06, 2013 at 04:13 UTC
Hello @Anonymous Monk! Thanks for responding. I tried your approach, but it didn't return anything.

`@nodes=$dom->findnodes($xpath_Include_LI_Title); print "{@nodes}\n";`

The nodes array should contain the values of the LI_Title elements, correct? Thanks again!
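On what the array holds: `findnodes` in list context returns node objects, not strings, so the text has to be read from each node. Here is a minimal sketch, assuming XML::LibXML and a hypothetical cut-down copy of the sample document. It deliberately uses a plain `//LI_Title` expression, so it shows only how to get the values, not the "Include" filter.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical miniature of the document in the question.
my $xml = <<'XML';
<Article><Main><Sect>
  <H4>Include</H4>
  <L>
    <LI><LI_Label>1.</LI_Label><LI_Title>Criteria 1</LI_Title></LI>
    <LI><LI_Label>2.</LI_Label><LI_Title>Criteria 2</LI_Title></LI>
  </L>
</Sect></Main></Article>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# findnodes returns one node object per matching element.
my @nodes = $dom->findnodes('//LI_Title');

# textContent returns the text inside each element.
my @titles = map { $_->textContent } @nodes;

print "$_\n" for @titles;
```

An empty `@nodes` therefore means the XPath matched nothing; it is not a sign that the values were lost somewhere.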
Re^3: Extracting specific childnodes (xpath whitespace) by Anonymous Monk on Jul 06, 2013 at 11:57 UTC
Well, I made a typo of omission. I didn't make the same typo in one of the linked nodes. Can you figure out what it is? What is the first step to figure it out? Can you explain in English the XPath I provided in the un-typo'd node? :)
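One possible first step is to evaluate the expression one piece at a time and count what each piece matches. The sketch below assumes XML::LibXML and a hypothetical cut-down copy of the sample document; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical miniature of the document in the question.
my $xml = <<'XML';
<Article><Main><Sect>
  <H4>Include</H4>
  <L>
    <LI><LI_Label>1.</LI_Label><LI_Title>Criteria 1</LI_Title></LI>
    <LI><LI_Label>2.</LI_Label><LI_Title>Criteria 2</LI_Title></LI>
  </L>
</Sect></Main></Article>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# The header step on its own: selects the H3/H4 element itself.
my $header_step = q{ //Article/Main/Sect/*[ ( name()="H3" or name()="H4" ) and contains(., "Include") ] };

# The expression as posted: looks for LI_Title *below* that header.
my $posted = $header_step . q{//LI_Title};

my @headers = $dom->findnodes($header_step);
my @titles  = $dom->findnodes($posted);

printf "header step matched %d node(s): %s\n",
    scalar @headers, join(', ', map { $_->nodeName } @headers);
printf "posted expression matched %d node(s)\n", scalar @titles;
```

In this sample the header step finds the H4, but the full expression finds nothing, because the LI_Title elements sit under the sibling L element rather than inside the H4.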
Re^3: Extracting specific childnodes (play xmllint --shell ) by Anonymous Monk on Jul 06, 2013 at 22:42 UTC
Why no response to Re^3: Extracting specific childnodes (xpath whitespace)? If you're game, I'm game; I'll play if you'll play.

Using the previous answer

Debugging/diagnosing and fixing this latest typo-ed answer

Using the power of parentheses in XPath

You can even give node sets to XPath string functions like contains(), so you don't have to go to the parent (..) and then find descendants (//).
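The last remark can be sketched as follows. This assumes XML::LibXML and a hypothetical two-section document. The first expression is a guess at the un-typo'd answer (a `/..` parent step added before the descendant search), inferred from the comment above rather than taken from the linked nodes. The second passes the H3 and H4 node sets straight to contains() in a predicate on Sect.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document: one "Include" section and one "Exclude" section.
my $xml = <<'XML';
<Article><Main>
  <Sect>
    <H4>Include</H4>
    <L>
      <LI><LI_Label>1.</LI_Label><LI_Title>Criteria 1</LI_Title></LI>
      <LI><LI_Label>2.</LI_Label><LI_Title>Criteria 2</LI_Title></LI>
    </L>
  </Sect>
  <Sect>
    <H3>Exclude</H3>
    <L>
      <LI><LI_Label>1.</LI_Label><LI_Title>Criteria 9</LI_Title></LI>
    </L>
  </Sect>
</Main></Article>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Assumed fix: match the header, step up to its parent Sect, then descend.
my $via_parent = q{ //Article/Main/Sect/*[ ( name()="H3" or name()="H4" ) and contains(., "Include") ]/..//LI_Title };

# Alternative: filter the Sect itself by handing node sets to contains().
my $via_predicate = q{ //Article/Main/Sect[ contains(H3, "Include") or contains(H4, "Include") ]//LI_Title };

my @from_parent    = map { $_->textContent } $dom->findnodes($via_parent);
my @from_predicate = map { $_->textContent } $dom->findnodes($via_predicate);

print "via parent:    @from_parent\n";
print "via predicate: @from_predicate\n";
```

One caveat with the second form: in XPath 1.0, contains() converts a node set to the string value of its first node in document order, so a Sect with several H4 children would be tested on the first one only.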
Re^4: Extracting specific childnodes (play xmllint --shell ) by madbee (Acolyte) on Jul 07, 2013 at 04:50 UTC