This looks like a good job for line-based parsing. From the example fragment you posted, the XML file appears to be very regular in structure. If that is the case, I would stream through the file, reading one <PC-Compound> element at a time, like this:
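    my @compound;    # lines of the current <PC-Compound> record
    while (<IN>) {
        if (m/^\s*<PC-Compound>/) {
            @compound = ($_);            # start a new record
        }
        elsif (m/^\s*<\/PC-Compound>/) {
            push(@compound, $_);         # close the record...
            process_compound();          # ...handle it...
            @compound = ();              # ...and reset the buffer
        }
        else {
            push(@compound, $_) if (@compound);    # only collect lines inside a record
        }
    }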
When process_compound() is called, the array @compound holds the lines of exactly one <PC-Compound> record, which you can join into a string and parse with XML::Twig or another XML module. (Instead of pushing lines onto an array, you could append them to a string buffer if that is more convenient.)
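To flesh out that last step, here is a minimal sketch that uses the string-buffer variant and hands each complete record to XML::Twig. It assumes XML::Twig is installed. The helper names `stream_compounds` and `process_compound` (here taking the buffer as an argument), the element name `PC-CompoundType_id_cid`, and the toy input are all illustrative assumptions, not taken from your actual file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: parse one buffered <PC-Compound> record with XML::Twig
# and return the text of the first <PC-CompoundType_id_cid> element found.
# (That element name is an assumption about the file layout.)
sub process_compound {
    my ($xml) = @_;
    my $twig = XML::Twig->new();
    $twig->parse($xml);    # dies if the record is not well-formed XML
    my ($cid) = $twig->root->descendants('PC-CompoundType_id_cid');
    return defined $cid ? $cid->text : undef;
}

# Stream through a filehandle, buffering one <PC-Compound> record at a time
# in a string (instead of an array) and passing each record to $handler.
sub stream_compounds {
    my ($fh, $handler) = @_;
    my $buf;    # undef means "not currently inside a record"
    while (my $line = <$fh>) {
        if ($line =~ m/^\s*<PC-Compound>/) {
            $buf = $line;                 # start a new record
        }
        elsif ($line =~ m/^\s*<\/PC-Compound>/) {
            next unless defined $buf;     # ignore a stray closing tag
            $buf .= $line;
            $handler->($buf);             # one complete record
            undef $buf;
        }
        elsif (defined $buf) {
            $buf .= $line;                # inside a record: keep accumulating
        }
    }
}

# Toy input standing in for the real 1GB file (hypothetical data).
my $sample = <<'XML';
<PC-Compounds>
  <PC-Compound>
    <PC-CompoundType_id_cid>1</PC-CompoundType_id_cid>
  </PC-Compound>
  <PC-Compound>
    <PC-CompoundType_id_cid>2</PC-CompoundType_id_cid>
  </PC-Compound>
</PC-Compounds>
XML

open my $in, '<', \$sample or die "cannot open in-memory file: $!";
my @cids;
stream_compounds($in, sub { push @cids, process_compound($_[0]) });
close $in;
print join(',', @cids), "\n";    # prints "1,2"
```

Because each record is parsed on its own, memory use stays roughly proportional to the size of one <PC-Compound> element rather than the whole file. For the real data you would open the file instead of the in-memory string and do your actual extraction inside process_compound.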
Another option is to use something like XSLT to extract the records of interest, but that's a whole other technology.
In reply to Re: 1GB XML mining with XML:twig (newbies question) by pc88mxer, in thread 1GB XML mining with XML:twig (newbies question) by karpatov.