in reply to XML data extraction

As far as possible, do not write Perl code that has to match the structure of an XML document by hand: that is what XPath is for. XML::LibXML includes complete XPath 1.0 support through libxml2, a widely used C library that many toolsets build on. Even if you cannot construct an XPath expression that matches exactly what you are looking for (or you simply do not want to take the time to try), XPath can still hand you a simple list of nodes for your Perl code to iterate over.
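
A minimal sketch of that idea, assuming a made-up document layout: the `records`, `record`, `name` and `status` element names are hypothetical and only stand in for whatever the real data looks like. The XPath expression does the structural matching, and Perl only walks the resulting node list.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data; element and attribute names are invented.
my $xml = q{<records>
  <record id="1"><name>alpha</name><status>active</status></record>
  <record id="2"><name>beta</name><status>retired</status></record>
  <record id="3"><name>gamma</name><status>active</status></record>
</records>};

my $doc = XML::LibXML->load_xml(string => $xml);

# XPath selects only the records whose <status> is "active".
my @names;
for my $record ($doc->findnodes('/records/record[status="active"]')) {
    # findvalue() evaluates an XPath expression relative to this node.
    push @names, $record->findvalue('name');
}

print join(', ', @names), "\n";   # prints "alpha, gamma"
```

If the filter in the predicate is too awkward to express in XPath, a plainer expression such as `/records/record` still returns the list, and the test can move into the Perl loop.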

(Also bear in mind that most spreadsheet(!) tools know about XML and XPath, so sometimes you can avoid the business need for a custom program, Perl or otherwise, altogether. The very best program is the one that you did not have to write, and this is often the case with XML.)

Re^2: XML data extraction
by Jenda (Abbot) on Oct 12, 2017 at 13:26 UTC
    1. Using XML::LibXML and XPath means you have to load the whole document into memory as a huge maze of interconnected objects. Good luck doing that with a file that really is big. And even if it does fit in memory, it is still a huge waste of resources.
    2. XPath is just another language to write (part of) a program in. As soon as you are writing XPath, you are programming, so the claim about not having to write a program is nonsense. Yes, you do not write it in Perl; instead you use XPath combined with whatever expression and scripting language your spreadsheet provides. Big difference.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      > you have to load the whole document into memory

      That's not true. You can use XML::LibXML::Reader, which is a pull parser: kind of like a SAX parser, but with the whole power of XML::LibXML available on request.
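
      A rough sketch of that approach, assuming a hypothetical layout with `record` elements that each contain `name` and `status` children (all of these names are invented for illustration). The reader moves through the input one node at a time, and only the current record is expanded into a small DOM subtree that XPath can query.

      ```perl
      use strict;
      use warnings;
      use XML::LibXML;
      use XML::LibXML::Reader;

      # Hypothetical sample data; a real run would more likely use
      # XML::LibXML::Reader->new(location => $file) on a large file.
      my $xml = q{<records>
        <record id="1"><name>alpha</name><status>active</status></record>
        <record id="2"><name>beta</name><status>retired</status></record>
        <record id="3"><name>gamma</name><status>active</status></record>
      </records>};

      my $reader = XML::LibXML::Reader->new(string => $xml)
          or die "cannot create reader\n";

      my @names;
      # nextElement('record') skips forward to the next <record> start tag.
      while ($reader->nextElement('record') == 1) {
          # copyCurrentNode(1) builds a DOM copy of this one record only,
          # so the usual XPath methods work on it.
          my $record = $reader->copyCurrentNode(1);
          push @names, $record->findvalue('name')
              if $record->findvalue('status') eq 'active';
      }

      print join(', ', @names), "\n";
      ```

      Each copied subtree goes out of scope at the end of its loop iteration, so the script only ever holds one record's worth of DOM at a time.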

      > huge

      The OP mentions "1000 of records". That doesn't sound really huge by today's standards.

      ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

        Very few things lead to messier code than pull parsers, but feel free to go down that route.

        Even with just 1000s of records it's still an insane waste of memory and CPU. Now if it's something that runs on your PC, fine. If you waste the resources of servers this way, good luck to you.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.