in reply to Re^2: XML data extraction
in thread XML data extraction

> you have to load the whole document into memory

That's not true. You can use XML::LibXML::Reader, which is a pull parser: kind of like a SAX parser, but with the whole power of XML::LibXML available on request.

> huge

The OP mentions "1000 of records". That doesn't sound really huge by today's standards.

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

Re^4: XML data extraction
by Jenda (Abbot) on Oct 17, 2017 at 12:49 UTC

    Very few things lead to messier code than pull parsers, but feel free to go down that route.

    Even with just 1000s of records it's still an insane waste of memory and CPU. Now, if it's something that runs on your PC, fine. If you waste the resources of servers this way, good luck to you.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      I don't see how this is much messier than an XML::Twig or XML::LibXML solution.
      #!/usr/bin/perl
      use warnings;
      use strict;

      use XML::LibXML::Reader;
      use Data::Dumper;

      sub process_record {
          my ($node, $records) = @_;
          my $name   = $node->findvalue('@name');
          my $citype = $node->findvalue('@ciType');
          my $status = $node->findvalue('dimension/@status');
          my $time   = $node->findvalue(
              'normalize-space(dimension/body/entry[@key="Last Status Change"])');
          push @$records, { name   => $name,
                            citype => $citype,
                            status => $status,
                            Time   => $time };
      }

      my $bamxml = 'file.xml';
      my @records;

      my $pattern = 'XML::LibXML::Pattern'
          ->new('/nodes/node/children/node/children/node');
      my $reader = 'XML::LibXML::Reader'->new(location => $bamxml);
      while ($reader->nextPatternMatch($pattern)
             and $reader->nodeType == XML_READER_TYPE_ELEMENT
      ) {
          next unless $reader->getAttribute('ciType') eq 'application';

          my $node = $reader->copyCurrentNode(1);
          process_record($node, \@records);
      }

      print Dumper \@records;