in reply to Get Node Value from irregular XML
Sure thing. You just need to know which paths to check, and then stop checking when you find the desired string. Since you're looking for a string with the word "design", we can do it like this:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use autodie;
    use XML::LibXML;
    use Data::Dumper;

    my @Docs = (
    <<EOXML,
    <root>
      <part>
        <sect>
          <header> This is a design XZY document for Project </header>
        </sect>
      </part>
    </root>
    EOXML
    <<EOXML
    <root>
      <para>This is a design XZY document for Project</para>
      <part>
        <sect>
          <header> This is some header </header>
        </sect>
      </part>
    </root>
    EOXML
    );

    for (my $idx=0; $idx<@Docs; ++$idx) {
        my $XML = $Docs[$idx];
        print "----------- SEARCHING DOCUMENT $idx ---------\n";
        my $dom = XML::LibXML->load_xml( string => $XML );

        # Try each candidate path in order; stop at the first node mentioning "design"
        DOCSEARCH:
        for my $search ('/root/part/sect/header', '/root/para') {
            print "----- searching: $search\n";
            my $nodeset = $dom->find($search);
            foreach my $node ($nodeset->get_nodelist) {
                # Match against the node's text only, not its serialized markup
                my $text = $node->string_value;
                if ($text =~ m/design/i) {
                    my $design = $node;
                    print $design, "\n";    # a node stringifies to its XML
                    last DOCSEARCH;
                }
            }
        }
    }
Here, the outer loop is for each XML document, the middle loop iterates over the different possible search paths, and the inner loop digs out the particular chunk in question. We labelled the middle loop DOCSEARCH, so when we finally find the item, we can use last DOCSEARCH; to jump to the end of the middle loop and advance to the next document.
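If the labelled last is new to you, here is a stripped-down sketch of the same control flow with no XML involved. The helper name find_first, the label SEARCH and the sample strings are made up for illustration; only the loop structure mirrors the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first string matching $re, trying each
# group of candidates in priority order (undef if nothing matches).
sub find_first {
    my ($re, @groups) = @_;
    my $found;
    SEARCH:
    for my $group (@groups) {              # plays the role of the search-path loop
        for my $candidate (@$group) {      # plays the role of the node loop
            if ($candidate =~ $re) {
                $found = $candidate;
                last SEARCH;               # leaves both loops, not just the inner one
            }
        }
    }
    return $found;
}

my $hit = find_first(
    qr/design/i,
    [ 'This is some header' ],
    [ 'This is a design XZY document', 'Another design note' ],
);
print defined $hit ? "$hit\n" : "no match\n";
```

A bare last (no label) inside the inner loop would only end that inner loop, so the remaining groups would still be searched. The label is what lets you bail out of the whole search in one step.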
When I run it, I get:
    $ perl 1041480.pl
    ----------- SEARCHING DOCUMENT 0 ---------
    ----- searching: /root/part/sect/header
    <header> This is a design XZY document for Project </header>
    ----------- SEARCHING DOCUMENT 1 ---------
    ----- searching: /root/part/sect/header
    ----- searching: /root/para
    <para>This is a design XZY document for Project</para>
Update: Added a "\n" to the print line to clean up the output a little.
When your only tool is a hammer, all problems look like your thumb.