Hello Perl Monks.
I need to filter a potentially huge XML file. The simplified structure looks like this:
<doc>
  <text>
    <p> text1 p1 </p>
    ...
  </text>
  <text>
    <p> text2 p1 </p>
  </text>
  ...
</doc>
I need to get the text content of each <text> node, pass it to a binary that processes the text, and have that binary decide whether the current <text> node should be removed.
My idea is to hold a single <text> node in memory, evaluate its text content, and either print the node immediately or forget it and move on to the next <text> node.
I would like to use XML::LibXML::Reader, but I found no way to incrementally output the XML as it goes through the nodes. I know XML::Twig can flush, but I have hit segfaults in the past while pushing ~GB XML documents through it, which I was never able to debug, so I'd rather stay on the safe side with LibXML. Any ideas on how to tackle this problem?
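To make the idea concrete, here is a rough, untested sketch of what I'm picturing with XML::LibXML::Reader. keep_text() and /path/to/filter-binary are just placeholders for the call to our binary, and printing the <doc> wrapper by hand is exactly the part I'm not sure is robust:

use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new( location => 'input.xml' )
    or die "cannot open input.xml\n";

print qq{<?xml version="1.0"?>\n<doc>\n};

# Visit each <text> element in document order.
while ( $reader->nextElement('text') ) {

    # Expand only the current <text> node into a DOM fragment,
    # so at most one <text> subtree is held in memory at a time.
    my $node = $reader->copyCurrentNode(1);    # 1 = copy children too

    # Print the node verbatim unless the binary says to drop it.
    print $node->toString, "\n" if keep_text( $node->textContent );
}

print "</doc>\n";

# Placeholder: pipe the text to the external binary and use its
# exit status as the keep/drop decision.
sub keep_text {
    my ($text) = @_;
    open my $fh, '|-', '/path/to/filter-binary'
        or die "cannot run filter: $!";
    print {$fh} $text;
    close $fh;
    return $? == 0;
}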
Thanks a bunch!
In reply to Filtering large XML files by PT