in reply to Re: XML data extraction
in thread XML data extraction

  1. Using XML::LibXML and XPath means you have to load the whole document into memory as a huge maze of interconnected objects. Good luck doing that with a file that is actually big. And even if it does fit in memory, it is still a huge waste of resources.
  2. XPath is just another language to write (part of) a program in. As soon as you are writing XPath, you are programming, so the blurb about not having to write a program is nonsense. Yes, instead of Perl you write XPath combined with whatever expression and scripting language your spreadsheet provides. Big difference.

Jenda
Enoch was right!
Enjoy the last years of Rome.

Re^3: XML data extraction
by choroba (Cardinal) on Oct 12, 2017 at 15:40 UTC
    > you have to load the whole document into memory

    That's not true. You can use XML::LibXML::Reader, which is a pull parser, kind of like a SAX parser with the whole power of XML::LibXML available on request.

    > huge

    The OP mentions "1000 of records". That doesn't sound really huge by today's standards.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      Very few things lead to messier code than pull parsers, but feel free to go down that route.

      Even with just 1000s of records it's still an insane waste of memory and CPU. Now if it's something that runs on your PC, fine. If you waste the resources of servers this way, good luck to you.

      Jenda

        I don't see how this is much messier than an XML::Twig or XML::LibXML solution.
        #!/usr/bin/perl
        use warnings;
        use strict;

        use XML::LibXML::Reader;
        use Data::Dumper;

        # Extract the fields of interest from one copied <node> subtree.
        sub process_record {
            my ($node, $records) = @_;
            my $name   = $node->findvalue('@name');
            my $citype = $node->findvalue('@ciType');
            my $status = $node->findvalue('dimension/@status');
            my $time   = $node->findvalue(
                'normalize-space(dimension/body/entry[@key="Last Status Change"])');
            push @$records, {
                name   => $name,
                citype => $citype,
                status => $status,
                Time   => $time,
            };
        }

        my $bamxml = 'file.xml';
        my @records;

        my $pattern = 'XML::LibXML::Pattern'
                      ->new('/nodes/node/children/node/children/node');

        # The reader streams the file; only the matched subtree is built as a DOM.
        my $reader = 'XML::LibXML::Reader'->new(location => $bamxml);
        while ($reader->nextPatternMatch($pattern)
               and $reader->nodeType == XML_READER_TYPE_ELEMENT
        ) {
            # A missing ciType attribute is treated as an empty string to avoid a warning.
            next unless ($reader->getAttribute('ciType') // '') eq 'application';

            my $node = $reader->copyCurrentNode(1);
            process_record($node, \@records);
        }
        print Dumper \@records;