in reply to 1GB XML mining with XML:twig (newbies question)

Please elaborate on what kind of troubles you run into. Running out of memory comes to mind -- are there any other problems?

This seems like a good job for line parsing. From the example fragment you have posted it seems like the XML file is very regular in its structure. If that is the case, I would stream in the file reading one <PC-Compound> element at a time like this:

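    my @compound;
    while (<IN>) {
        if (m/^\s*<PC-Compound>/) {
            # opening tag: start a new record
            @compound = ($_);
        }
        elsif (m/^\s*<\/PC-Compound>/) {
            # closing tag: the record is complete
            push(@compound, $_);
            process_compound();
            @compound = ();
        }
        else {
            # keep the line only while inside a record
            push(@compound, $_) if (@compound);
        }
    }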

When process_compound() is called, the array @compound will hold the lines of one <PC-Compound> record, which you can then process with XML::Twig or some other XML module. Only one record is in memory at a time, so the size of the whole file no longer matters. (Instead of pushing lines onto an array, you could also append to a string buffer if that's more convenient.)
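For what it's worth, here is a rough sketch of the string-buffer variant. It is only an illustration under a few assumptions: the sample data and the <id> child element are made up, the in-memory filehandle stands in for the real 1GB file, and process_compound() is just a stub that collects the records where the real parsing (with XML::Twig or whatever you prefer) would go. Unlike the loop above, it passes the record as an argument rather than through a shared variable.

```perl
use strict;
use warnings;

# Hypothetical sample input. For the real file, replace the in-memory
# handle below with something like open(my $in, '<', $path).
my $sample = <<'XML';
<PC-Compounds>
  <PC-Compound>
    <id>1</id>
  </PC-Compound>
  <PC-Compound>
    <id>2</id>
  </PC-Compound>
</PC-Compounds>
XML

my @records;   # collected only so the sketch has something to show

# Stand-in for the real work: this is where one record would be handed
# to an XML module. Here it just stores the string.
sub process_compound {
    my ($record) = @_;
    push @records, $record;
}

open(my $in, '<', \$sample) or die "Cannot open in-memory handle: $!";

my $buffer;    # undef while we are outside a <PC-Compound> element
while (my $line = <$in>) {
    if ($line =~ m/^\s*<PC-Compound>/) {
        $buffer = $line;                 # opening tag: start a new record
    }
    elsif ($line =~ m/^\s*<\/PC-Compound>/) {
        next unless defined $buffer;     # stray closing tag: ignore it
        $buffer .= $line;
        process_compound($buffer);       # hand over one complete record
        undef $buffer;                   # release it before the next one
    }
    elsif (defined $buffer) {
        $buffer .= $line;                # inside a record: keep the line
    }
}
close($in);

print scalar(@records), " records\n";
```

Like the array version, this relies on the opening and closing <PC-Compound> tags each sitting at the start of their own line. If the file is not that regular, a real streaming parser is the safer route.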

Another option is to use something like XSLT to extract the records of interest, but that's a whole other technology.