in reply to 1GB XML mining with XML:twig (newbies question)
This seems like a good job for line parsing. From the example fragment you have posted it seems like the XML file is very regular in its structure. If that is the case, I would stream in the file reading one <PC-Compound> element at a time like this:
    my @compound;
    while (<IN>) {
        if (m/^\s*<PC-Compound>/) {
            @compound = ($_);
        }
        elsif (m/^\s*<\/PC-Compound>/) {
            push(@compound, $_);
            process_compound();
            @compound = ();
        }
        else {
            push(@compound, $_) if (@compound);
        }
    }
When process_compound() is called, the array @compound will hold the lines of one complete <PC-Compound> record, which you can then process with XML::Twig or some other XML module. Since only one record is in memory at a time, the size of the file should not matter much. (Instead of pushing lines onto an array, you could also append to a string buffer if that's more convenient.)
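For what it's worth, here is a rough sketch of the string-buffer variant, using only core Perl. The sample data, the <id> element, the @records array and the body of process_compound() are all made up for illustration; in real use you would open the actual file and hand each record to your XML module of choice inside process_compound().

```perl
use strict;
use warnings;

# Hypothetical sample input. For the real file, replace the in-memory
# handle below with: open(my $in, '<', $path) or die ...
my $sample = <<'XML';
<PC-Compounds>
  <PC-Compound>
    <id>1</id>
  </PC-Compound>
  <PC-Compound>
    <id>2</id>
  </PC-Compound>
</PC-Compounds>
XML

my @records;    # illustration only: collects records so they can be inspected

# Placeholder handler: receives one complete <PC-Compound> record as a string.
# This is the place to pass the record to an XML parser.
sub process_compound {
    my ($xml) = @_;
    push @records, $xml;
}

open(my $in, '<', \$sample) or die "Cannot open in-memory handle: $!";

my $buffer    = '';
my $in_record = 0;
while (my $line = <$in>) {
    if ($line =~ m/^\s*<PC-Compound>/) {
        $buffer    = $line;    # opening tag: start a fresh record
        $in_record = 1;
    }
    elsif ($line =~ m/^\s*<\/PC-Compound>/) {
        # closing tag: hand over the finished record, then reset
        process_compound($buffer . $line) if $in_record;
        $buffer    = '';
        $in_record = 0;
    }
    elsif ($in_record) {
        $buffer .= $line;      # line inside a record
    }
}
close($in);

printf "%d records read\n", scalar @records;
```

Passing the record as an argument rather than reading a global keeps process_compound() easy to test on its own. Like the loop above, this assumes each opening and closing <PC-Compound> tag sits at the start of its own line; if the file is not laid out that way, the line matching will miss records.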
Another option is to use something like XSLT to extract the records of interest, but that's a whole other technology.