in reply to Re: XML and file size
in thread XML and file size

Rather than slurping the whole XML into a DOM for appending information, a SAX approach can be used. Simply pass through everything but the closing root tag. On encountering it, emit the new node and then the closing root tag.
This is much faster and much more memory friendly.
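A minimal sketch of that pass-through idea in plain Perl (a hand-rolled line filter rather than a real SAX handler; the root element name "journal" and the assumption that the closing root tag starts its own line are mine, for illustration):

```perl
use strict;
use warnings;

# Copy input to output, emitting $new_node just before the closing
# root tag. The root element name "journal" is a made-up example,
# and the closing tag is assumed to begin a line of its own.
sub append_before_close {
    my ($in, $out, $new_node) = @_;
    while (my $line = <$in>) {
        print {$out} $new_node if $line =~ m{^\s*</journal>};
        print {$out} $line;
    }
}
```

Called as append_before_close(\*STDIN, \*STDOUT, qq{  <entry>new</entry>\n}), it filters stdin to stdout in a single pass, never holding more than one line in memory.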

But yes, appending to XML is expensive.

Just my 2 cents, -gjb-

Re: Re: Re: XML and file size
by roundboy (Sexton) on Jan 07, 2003 at 18:38 UTC

    Thanks, a very good point. A SAX parser is the robust way to implement "creative file scanning", and I just didn't think of it. But the broader point about this alternative still stands: as you take on tasks beyond simple reading and appending, it gets progressively harder to make it work.

    Regardless, since the goal of the project is to learn new technologies, maybe the best approach would be this: do a little reading, and a lot of thinking, about how XML document types can represent various structures, and then consider what kinds of structural relationships will exist within the journal data. Then choose a data representation and write a schema or DTD (even if no validation is needed, it's good practice). Finally, play with the various tools, including both kinds of parsers. I'd even suggest poking around with a q&d "parser" that builds on something like
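    As a concrete (entirely made-up) example of that exercise, a first-cut DTD for a journal might look like:

```xml
<!-- Hypothetical structure: a journal is a flat list of dated entries.
     All names here are invented for illustration. -->
<!ELEMENT journal (entry*)>
<!ELEMENT entry   (title?, body)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT body    (#PCDATA)>
<!ATTLIST entry   date CDATA #REQUIRED>
```

    Even if nothing ever validates against it, writing it down forces the structural decisions (can entries nest? is the date an attribute or a child element?) out into the open.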

    my ($tag, $attrs, $body) = m{<(\w+)\s+(.*?)>(.*?)</\1>}s;
    to see why it is discouraged by so many people.
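    For instance, even a version of that pattern that compiles cleanly goes wrong as soon as elements of the same name nest, because the non-greedy body match stops at the first closing tag it sees (sample data invented):

```perl
use strict;
use warnings;

# Nested <note> elements: a regex cannot pair up open and close tags.
my $xml = '<note a="b">outer <note a="c">inner</note> tail</note>';

my ($tag, $attrs, $body) = $xml =~ m{<(\w+)\s+(.*?)>(.*?)</\1>}s;

# $body is 'outer <note a="c">inner' -- the match stopped at the
# inner close tag, silently dropping ' tail'.
print "$tag: $body\n";
```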

    --roundboy