in reply to Filtering large XML files

Looking over XML::LibXML::Reader, and having worked with XML::LibXML recently, I see a mismatch between what you want and its design. Reader is great for reading XML nodes incrementally, but LibXML likes dealing with complete nodes and documents; I can't think of an easy, foolproof way to make it write out chunks of a partial document, which is what you'll be doing. (Another monk might yet figure it out!)
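
To make the mismatch concrete, here's a minimal sketch of the half Reader is good at, the incremental read loop; the 'record' element name is a placeholder for whatever you're matching, and note that nothing in it helps you serialize the filtered document back out:

    use strict;
    use warnings;
    use XML::LibXML::Reader;

    # Pull-parse the file one node at a time; only a small window
    # of the document is ever held in memory.
    my $reader = XML::LibXML::Reader->new( location => 'big.xml' );

    while ( $reader->read ) {
        next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
        next unless $reader->name eq 'record';   # placeholder name
        # ... decide what to do with this element; writing a trimmed
        # copy of the document is the part Reader won't do for you ...
    }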

As an alternative, SAX is good for stream-processing XML. It doesn't load the entire document into memory, so it's good for incremental work (and not so good for document processing where you want random access), and its filter-centered design is meant for precisely what you're doing.
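
To show the filter idea, here's a rough, untested sketch of a SAX filter that swallows one kind of element and forwards everything else to a writer. The 'junk' element name is made up, and it assumes you also have XML::SAX::Writer installed (a separate distribution):

    package SkipElementFilter;
    use strict;
    use warnings;
    use base 'XML::SAX::Base';

    # Suppress <junk> elements (made-up name) and everything inside
    # them; pass all other events down the chain untouched.
    sub start_element {
        my ( $self, $el ) = @_;
        return $self->{_skip}++ if $el->{Name} eq 'junk';
        $self->SUPER::start_element($el) unless $self->{_skip};
    }

    sub end_element {
        my ( $self, $el ) = @_;
        return $self->{_skip}-- if $el->{Name} eq 'junk';
        $self->SUPER::end_element($el) unless $self->{_skip};
    }

    sub characters {
        my ( $self, $chars ) = @_;
        $self->SUPER::characters($chars) unless $self->{_skip};
    }

    package main;
    use XML::SAX::ParserFactory;
    use XML::SAX::Writer;

    my $writer = XML::SAX::Writer->new( Output => \*STDOUT );
    my $filter = SkipElementFilter->new( Handler => $writer );
    XML::SAX::ParserFactory->parser( Handler => $filter )
                           ->parse_uri('big.xml');

The nice part of the chain design is that the filter doesn't care whether its events come from a parser or from another filter, so stages compose.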

Look at XML::SAX. I only briefly looked at it for a project recently, but it might be helpful for you. It's a different way of thinking from LibXML, for sure!

EDIT: I hadn't seen XML::Twig before: the perl-ish way of handling XML, and in theory "as good as SAX". Good to learn something new! It seems best to use that if possible, and if it still segfaults, try contacting the author, who is always looking for more test cases; see "Test Coverage" on its page.
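
For completeness, the usual XML::Twig recipe for big files is a handler per repeated element plus a flush, so memory stays bounded. This is only a sketch; 'record' and the 'status' attribute are placeholders for your real filter condition:

    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # 'record' stands in for the repeated element being filtered
            record => sub {
                my ( $t, $record ) = @_;
                # made-up criterion: drop records marked obsolete
                $record->delete
                    if ( $record->att('status') // '' ) eq 'obsolete';
                $t->flush;   # print what's finished and free its memory
            },
        },
    );
    $twig->parsefile('big.xml');   # flushed output goes to STDOUT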

Re^2: Filtering large XML files
by choroba (Cardinal) on Feb 24, 2015 at 13:40 UTC
    XML::LibXML::Reader is kind of a SAX for XML::LibXML. It allows you to process the XML stream, but you can ask it at any time to parse the current node and return its corresponding XML::LibXML object.
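    A hedged sketch of that mixed style, with a made-up element name and test; copyCurrentNode(1) returns a deep copy as a full XML::LibXML node you can query with the usual DOM methods:

        use XML::LibXML::Reader;

        my $reader = XML::LibXML::Reader->new( location => 'big.xml' );
        while ( $reader->read ) {
            next unless $reader->nodeType == XML_READER_TYPE_ELEMENT
                    and $reader->name eq 'record';     # made-up name
            my $elt = $reader->copyCurrentNode(1);     # deep copy, DOM node
            print $elt->toString, "\n"
                if $elt->findvalue('@keep');           # made-up test
            $reader->next;                             # skip the copied subtree
        }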
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Filtering large XML files
by PT (Novice) on Feb 24, 2015 at 09:26 UTC
    Thanks, Yary, for suggesting XML::SAX! I'll look into it. Currently I'm testing XML::Twig; it seems to be working OK so far. Good luck with your projects!