Hello! I have the XML document shown below. From it, I need to extract a node's value based on a keyword. For example, given the keyword "design", I need to extract the entire string inside the header node.

    <root>
        <part>
            <sect>
                <header>
                    This is a design XZY document for Project
                </header>
            </sect>
        </part>
    </root>

For this, I have the following Perl script:

    use XML::LibXML;

    # $file holds the path to the XML document
    my $dom = XML::LibXML->new->parse_file($file);

    my $nodeset = $dom->find('/root/part/sect/header');

    foreach my $node ($nodeset->get_nodelist) {
        # string_value returns the node's text; capture it so it can be tested
        my $design = $node->string_value;

        if ($design =~ m/design/i) {
            print $design;
        }
    }

The problem is that I need to do this across multiple XML files, and I noticed that in some of them the string I am looking for is in another part of the document. For example, here it is under root/para:

 
    <root>
        <para>This is a design XZY document for Project</para>
        <part>
            <sect>
                <header>
                    This is some header
                </header>
            </sect>
        </part>
    </root>

The value occurring under the root/para tags is an anomaly, but it is valid and I have to accommodate it. Given such irregular XML files, is there a way to handle these two scenarios with one generic piece of code? Of course, a more roundabout way would be to check the expected node first and, if the keyword is not found there, fall back to looking under root. But I was wondering if there is a simpler way to do this and was hoping for some help here.
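
One possible direction, offered only as a rough sketch: an XPath union (`|`) can select both `header` and `para` elements in a single query, so the keyword test runs once over whichever elements are present. The helper name `find_by_keyword` and the inline sample strings are made up for illustration, and the sketch assumes the keyword only ever appears in `header` or `para` elements.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the trimmed text of every <header> or <para>
# element, anywhere in the document, whose text contains $keyword
# (case-insensitive).
sub find_by_keyword {
    my ($xml, $keyword) = @_;

    my $dom = XML::LibXML->load_xml(string => $xml);

    my @hits;
    # The XPath union (|) selects both element names in a single query.
    foreach my $node ($dom->findnodes('//header | //para')) {
        my $text = $node->textContent;
        $text =~ s/^\s+|\s+$//g;    # strip surrounding whitespace
        push @hits, $text if $text =~ /\Q$keyword\E/i;
    }
    return @hits;
}

# Sample documents mirroring the two layouts in the question.
my $regular = <<'XML';
<root><part><sect><header>
    This is a design XZY document for Project
</header></sect></part></root>
XML

my $irregular = <<'XML';
<root>
  <para>This is a design XZY document for Project</para>
  <part><sect><header>This is some header</header></sect></part>
</root>
XML

print "$_\n" for find_by_keyword($regular,   'design');
print "$_\n" for find_by_keyword($irregular, 'design');
```

With these two samples, each call should print the single line "This is a design XZY document for Project". To read from files instead of strings, `load_xml(location => $file)` could replace the `string =>` form. If the keyword might turn up under other element names as well, the union would need extending, or a broader expression such as `//*` could be tried at the cost of also matching ancestor elements.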

Thanks in advance for your time and apologies if the question is not clear enough.

Regards, Madbee

In reply to Get Node Value from irregular XML by madbee
