
Dear Monks,

A tool used to validate scientific data sets (many GBs) writes its results to an XML file. That file can get rather big (hundreds of MBs). Validating a data set typically takes several sessions, and all of them are recorded in the same XML file. When, for example, errors in the data set are fixed and the tool is rerun, the XML file should be updated; at least that was the idea.

Most (if not all?) solutions use the DOM approach: slurp everything into memory as some data structure, manipulate that structure, and write it back to disk. With files this big, that is not workable.

Some of the options mentioned or thought up so far:

put it in a relational database and let the DBMS do the work
put it in a native XML database
write custom software
use an XML Query implementation

Long ago, in the distant past, I created a Java-based solution that parsed large files with SAX and generated DOM trees on the fly, which were then manipulated. I must be getting senile, because it seems to have vanished from my memory.

Does anybody know of a more memory-friendly (read: non-DOM), preferably XML-like, solution? I would like to use an event-based parser and update the XML file only where needed. Maybe I am asking for too much?
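
To make the question concrete, here is a minimal sketch of the kind of event-based update I have in mind. It is written in Python only because its standard library ships a SAX filter and an XML writer; it is not a finished solution. The element and attribute names (error, id, status) and the names StatusUpdater and update_report are hypothetical, chosen purely for illustration.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class StatusUpdater(XMLFilterBase):
    """Pass every SAX event through, rewriting one attribute on the way."""

    def __init__(self, parent, fixed_ids):
        super().__init__(parent)
        self.fixed_ids = fixed_ids

    def startElement(self, name, attrs):
        # "error", "id" and "status" are hypothetical names for illustration.
        if name == "error" and attrs.get("id") in self.fixed_ids:
            updated = dict(attrs.items())
            updated["status"] = "fixed"
            attrs = AttributesImpl(updated)
        # Everything else is forwarded unchanged to the writer.
        super().startElement(name, attrs)


def update_report(source, sink, fixed_ids):
    """Stream XML from source to sink one event at a time (no DOM is built)."""
    reader = StatusUpdater(xml.sax.make_parser(), fixed_ids)
    reader.setContentHandler(XMLGenerator(sink, encoding="utf-8"))
    reader.parse(source)


if __name__ == "__main__":
    # Tiny in-memory demo; real input would be a file opened in binary mode.
    src = io.BytesIO(
        b'<report><error id="e1" status="open">bad unit</error>'
        b'<error id="e2" status="open">missing value</error></report>'
    )
    out = io.StringIO()
    update_report(src, out, {"e1"})
    print(out.getvalue())
```

Memory use stays roughly constant because only the current event is held at any time. The catch is that the file cannot be changed in place: the updated document has to be written to a new file, which then replaces the old one.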

Saludos,
dHarry

In reply to updating big XML files by dHarry
