dHarry has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
A tool used to validate scientific data sets (many GBs) spits out an XML file full of stuff. The XML file can get rather big (hundreds of MBs). Typically several sessions are needed to validate a data set, and everything is recorded in the XML file. When, for example, errors are fixed in the data set and the tool is rerun, the XML file gets updated; at least, that was the idea.
Most (if not all?) solutions use the DOM approach: slurp everything into memory into some data structure, manipulate the data structure and write it back to disk. But with big files this is not workable.
Some of the options mentioned/thought-up:
Long ago, in the distant past, I created a Java-based solution that parsed large files with SAX and generated DOM trees on the fly, which were then manipulated. I must be getting senile, because it seems to have vanished from my memory.
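In Perl terms I imagine that old approach would look something like the untested sketch below, where the parser builds a small tree for each element of interest and throws it away again once it has been handled (XML::Twig here, and the 'session' and 'status' tag names and the file name are just placeholders for whatever the tool really writes):

    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # placeholder tag name; one handler call per <session> element
            session => sub {
                my ($t, $session) = @_;
                # only this subtree exists as a tree in memory right now
                my $status = $session->first_child_text('status');
                warn "session with status '$status'\n";
                $t->purge;    # free everything parsed so far
            },
        },
    );
    $twig->parsefile('report.xml');    # placeholder file name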
Does anybody know of a more memory-friendly (read: non-DOM), preferably XML-aware, solution? I would like to use an event-based parser and update the XML file when needed. Maybe I am asking for too much?
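To make the question a bit more concrete: what I am after is something along the lines of this (again untested) sketch, which copies the document to a new file in one streaming pass, touching only the elements that need updating and flushing everything else through unchanged. The 'error' element, the 'status' attribute and the is_fixed() check are all made up:

    use strict;
    use warnings;
    use XML::Twig;
    use File::Copy qw(move);

    my $in  = 'report.xml';        # placeholder file names
    my $out = 'report.xml.new';
    open my $fh, '>', $out or die "Cannot open $out: $!";

    my $twig = XML::Twig->new(
        twig_handlers => {
            # placeholder tag name; rewrite each <error> as it is parsed
            error => sub {
                my ($t, $error) = @_;
                $error->set_att(status => 'fixed') if is_fixed($error);
                $t->flush($fh);    # print and free everything up to here
            },
        },
    );
    $twig->parsefile($in);
    $twig->flush($fh);             # print whatever comes after the last <error>
    close $fh or die "Cannot close $out: $!";
    move($out, $in) or die "Cannot replace $in: $!";

    sub is_fixed { return 0 }      # stand-in for the real check

If I remember correctly, recent versions of XML::Twig even have a parsefile_inplace method that wraps this write-to-a-temp-file-and-rename dance, but I have not tried it on files of this size.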
Saludos,
dHarry
Replies are listed 'Best First'.

Re: updating big XML files
by pc88mxer (Vicar) on Jul 18, 2008 at 15:21 UTC

Re: updating big XML files
by pjotrik (Friar) on Jul 18, 2008 at 15:30 UTC
  by Anonymous Monk on Jul 18, 2008 at 21:09 UTC
  by dHarry (Abbot) on Jul 21, 2008 at 13:31 UTC
  by dHarry (Abbot) on Jul 21, 2008 at 13:51 UTC

Re: updating big XML files
by pajout (Curate) on Jul 19, 2008 at 11:32 UTC
  by dHarry (Abbot) on Jul 21, 2008 at 13:43 UTC
  by pajout (Curate) on Jul 21, 2008 at 22:47 UTC