in reply to XML data extraction
Whenever I hear "big XML file" I think XML::Twig, since it can efficiently process the XML file record by record without loading the whole thing into memory. The following gives you the desired output. As for your example code, I don't think you can mix XML::XPath with XML::LibXML; I think it'd be best to use only the operations provided by XML::XPath.
    use warnings;
    use strict;
    use XML::Twig;
    use Data::Dumper;

    my $file = 'OpenApi.xml';
    my @records;

    XML::Twig->new(
        twig_roots => {
            # Called once per matching record element
            '/nodes/node/children/node/children/node' => sub {
                my ($t, $elt) = @_;
                my $dim = $elt->first_child('dimension');
                push @records, {
                    name   => $elt->att('name'),
                    citype => $elt->att('ciType'),
                    status => $dim->att('status'),
                    Time   => $dim->first_child('body')
                                  ->first_child('entry[@key="Last Status Change"]')
                                  ->text,
                };
                # Free the memory used by what has been parsed so far
                $t->purge;
            },
        },
    )->parsefile($file);

    print Dumper(\@records);
Update: As for your code, it's just a matter of getting the XPath expression right. This also gives the desired output:
    use strict;
    use warnings;
    use XML::XPath;
    use Data::Dumper;

    my $bamxml   = 'OpenApi.xml';
    my $bamxp    = XML::XPath->new(filename => $bamxml);
    my $bamxpath = $bamxp->findnodes('//nodes/node/children/node/children/node');

    my @records;
    foreach my $bamnode ($bamxpath->get_nodelist) {
        my $name   = $bamxp->find('./@name', $bamnode)->string_value;
        my $citype = $bamxp->find('./@ciType', $bamnode)->string_value;
        my $status = $bamxp->find('./dimension/@status', $bamnode)->string_value;
        my $time   = $bamxp->find('./dimension/body/entry[@key="Last Status Change"]', $bamnode)->string_value;
        # Trim leading and trailing whitespace from each value
        s/^\s+|\s+$//g for $name, $citype, $status, $time;
        push @records, {
            name   => $name,
            citype => $citype,
            status => $status,
            Time   => $time,
        };
    }

    print Dumper(\@records);
Update 2: Oops, missed your requirement "want to read node only where ciType='application'". The same XPath that choroba showed works in my code samples: '/nodes/node/children/node/children/node[@ciType="application"]'
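For illustration, here's a minimal self-contained sketch of the XML::Twig version with that filtered path. The inline XML is made-up sample data (the names `app1` and `host1`, the statuses, and the timestamps are all hypothetical); I'm only assuming it mirrors the nesting and attributes that the code above relies on, so adjust it to your real file.

```perl
use warnings;
use strict;
use XML::Twig;
use Data::Dumper;

# Hypothetical sample data, shaped like the structure assumed above
my $xml = <<'XML';
<nodes>
  <node name="root">
    <children>
      <node name="group">
        <children>
          <node name="app1" ciType="application">
            <dimension status="OK">
              <body><entry key="Last Status Change">2017-10-11 09:00</entry></body>
            </dimension>
          </node>
          <node name="host1" ciType="host">
            <dimension status="Critical">
              <body><entry key="Last Status Change">2017-10-12 06:30</entry></body>
            </dimension>
          </node>
        </children>
      </node>
    </children>
  </node>
</nodes>
XML

my @records;
XML::Twig->new(
    twig_roots => {
        # Only nodes whose ciType attribute is "application" trigger the handler
        '/nodes/node/children/node/children/node[@ciType="application"]' => sub {
            my ($t, $elt) = @_;
            my $dim = $elt->first_child('dimension');
            push @records, {
                name   => $elt->att('name'),
                citype => $elt->att('ciType'),
                status => $dim->att('status'),
                Time   => $dim->first_child('body')
                              ->first_child('entry[@key="Last Status Change"]')
                              ->text,
            };
            $t->purge;
        },
    },
)->parse($xml);    # parse() takes a string; use parsefile() for a file on disk

print Dumper(\@records);
```

The `host` node never reaches the handler, so there's no need for an extra `if` on `ciType` inside the callback.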
Re^2: XML data extraction (updated x2)
by snehit.ar (Beadle) on Oct 12, 2017 at 06:47 UTC
by haukex (Archbishop) on Oct 12, 2017 at 07:47 UTC
by snehit.ar (Beadle) on Oct 12, 2017 at 09:11 UTC
by hippo (Archbishop) on Oct 12, 2017 at 10:34 UTC
by haukex (Archbishop) on Oct 12, 2017 at 10:35 UTC