PerlMonks  

Re^2: parse XML huge file using cpan modules

by Jenda (Abbot)
on Jul 31, 2019 at 09:31 UTC ( [id://11103643]=note )


in reply to Re: parse XML huge file using cpan modules
in thread parse XML huge file using cpan modules

Well, yes but no.

There are more ways than that, and your hailed "industry-standard binary libraries" often support several of them.

You can use one of several libraries to load the whole file into memory as a huge maze of objects and then search and navigate the maze using methods and sublanguages like XPath.
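A minimal sketch of that first style, assuming XML::LibXML as the library; the `<library>`/`<book>` document and its element names are made up for illustration. For a really huge file the whole tree has to fit in memory, which is the catch with this approach.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document, small enough to inline.
my $xml = <<'XML';
<library>
  <book year="1999"><title>Old Book</title></book>
  <book year="2005"><title>New Book</title></book>
</library>
XML

# Load the whole document into memory as a tree of node objects.
my $doc = XML::LibXML->load_xml(string => $xml);

# Navigate the tree with XPath: titles of books newer than 2000.
my @recent;
for my $book ($doc->findnodes('/library/book[@year > 2000]')) {
    push @recent, $book->findvalue('title');
}

print "$_\n" for @recent;
```

To read from disk instead, `load_xml(location => $filename)` takes a file name in place of the string.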

You can use one of several libraries to load the whole file into memory as a huge data structure (possibly with a bit of tie() magic) and navigate it using normal Perl tools. You should NOT use XML::Simple for that, 'cause it produces inconsistent data structures (by default the shape changes depending on whether a tag happens to occur once or several times)! If the data structure is your goal, have a look at XML::Rules; it lets you produce a consistent structure and trim it along the way.

You can use one of several libraries to have them call your handlers whenever they find another bit of whatever in the XML (the event-driven, SAX-like style), while you take care of knowing where the heck you are in the structure yourself. Good luck with that! Industry standard or no industry standard. It's a mess.
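To show what "knowing where you are yourself" means in practice, here is a small sketch assuming XML::Parser with its Start/Char/End handlers. The sample document and the `@path`, `$buffer` and `@titles` variables are illustrative only; the point is how much bookkeeping even a trivial extraction needs.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample document.
my $xml = <<'XML';
<library>
  <book year="1999"><title>Old Book</title></book>
  <book year="2005"><title>New Book</title></book>
</library>
XML

my @path;          # stack of currently open tags, maintained by hand
my $buffer = '';   # text collected for the current <title>
my @titles;

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $tag, %attrs) = @_;
            push @path, $tag;
            $buffer = '' if $tag eq 'title';
        },
        Char => sub {
            my ($expat, $text) = @_;
            # Text may arrive in several pieces, so append rather than assign.
            $buffer .= $text if @path && $path[-1] eq 'title';
        },
        End => sub {
            my ($expat, $tag) = @_;
            push @titles, $buffer if $tag eq 'title';
            pop @path;
        },
    },
);

$parser->parse($xml);
print "$_\n" for @titles;
```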

You can use one of several libraries to give you the next bit whenever you ask for it (the pull style), while you take care of knowing where the heck you are in the structure yourself. Good luck with that! Industry standard or no industry standard. It's a mess.
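The pull style can be sketched with XML::LibXML::Reader, assuming that module is the one you pick. Again the sample document is made up, and the `$in_title` flag is the hand-rolled state tracking the paragraph above complains about.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;   # exports the XML_READER_TYPE_* constants

# Hypothetical sample document.
my $xml = <<'XML';
<library>
  <book year="1999"><title>Old Book</title></book>
  <book year="2005"><title>New Book</title></book>
</library>
XML

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

my $in_title = 0;   # our own record of where we are
my @titles;

# Ask for the next node, one at a time, until the input runs out.
while ($reader->read) {
    my $type = $reader->nodeType;
    if ($type == XML_READER_TYPE_ELEMENT) {
        $in_title = 1
            if $reader->name eq 'title' && !$reader->isEmptyElement;
    }
    elsif ($type == XML_READER_TYPE_END_ELEMENT) {
        $in_title = 0 if $reader->name eq 'title';
    }
    elsif ($type == XML_READER_TYPE_TEXT && $in_title) {
        push @titles, $reader->value;
    }
}

print "$_\n" for @titles;
```

Memory use stays low because only the current node is held, but every extra level of structure you care about means another flag or stack like `$in_title`.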

You can use XML::Twig to call your handler whenever it finishes parsing a reasonably large, easy to digest chunk of the XML (a twig) and have it provide you with the data from the twig either as a maze of objects or a data structure.
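A short sketch of the XML::Twig approach, using a made-up document where each `<book>` is the twig. The handler receives the finished element as an object, and `purge` releases what has been parsed so far, which is what keeps memory flat on a huge file.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document.
my $xml = <<'XML';
<library>
  <book year="1999"><title>Old Book</title></book>
  <book year="2005"><title>New Book</title></book>
</library>
XML

my @seen;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each completely parsed <book> element.
        book => sub {
            my ($t, $book) = @_;
            push @seen,
                $book->att('year') . ': ' . $book->first_child_text('title');
            $t->purge;   # free the part of the tree already handled
        },
    },
);

$twig->parse($xml);
print "$_\n" for @seen;
```

For a file on disk, `$twig->parsefile($filename)` replaces the `parse` call.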

You can use XML::Rules to call your handler whenever it finishes parsing a reasonably large, easy to digest chunk of the XML and have it provide you with the data from the chunk as a data structure built according to the rules you provided. You can handle or massage the data in any way you need and have the result made available to the handler of an enclosing chunk, and thus either process the file as you go or build a modified, trimmed-down data structure.
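A sketch of the XML::Rules style under some assumptions: the `<orders>` document, its tag names and the `%total_for` hash are invented for the example. The `<item>` rule trims each item to a small hash and hands it to the enclosing `<order>`, whose rule processes the collected items and then forgets them.

```perl
use strict;
use warnings;
use XML::Rules;

# Hypothetical sample document.
my $xml = <<'XML';
<orders>
  <order id="A1">
    <item><name>Tea</name><price>3</price></item>
    <item><name>Cup</name><price>7</price></item>
  </order>
  <order id="B2">
    <item><name>Pot</name><price>20</price></item>
  </order>
</orders>
XML

my %total_for;

my $parser = XML::Rules->new(
    rules => {
        # Leaf tags: keep only the text content, keyed by tag name.
        _default => 'content',

        # Push a trimmed hash onto the parent's {items} array.
        item => sub {
            my ($tag, $attrs) = @_;
            return '@items' => {
                name  => $attrs->{name},
                price => $attrs->{price},
            };
        },

        # The enclosing chunk sees its own attributes plus the items.
        order => sub {
            my ($tag, $attrs) = @_;
            my $total = 0;
            $total += $_->{price} for @{ $attrs->{items} || [] };
            $total_for{ $attrs->{id} } = $total;
            return;   # processed, nothing to pass further up
        },
    },
);

$parser->parse($xml);
print "$_: $total_for{$_}\n" for sort keys %total_for;
```

Because the `order` rule returns nothing, no data from a finished order is kept around, so memory use does not grow with the number of orders.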

Jenda
Enoch was right!
Enjoy the last years of Rome.
