
I *do* think that something is wrong with XML in terms of resource usage. Consider the XML and Storable files generated by this script (lower the number of records if you are short on RAM, as the following tests will need close to 1GB of memory):

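use strict;
use warnings;
use constant RECS => 1000000;

# Write RECS identical <address> records as XML.
{
    open my $fh, ">", "/tmp/bla.xml" or die "Cannot write /tmp/bla.xml: $!";
    print $fh "<addresses>\n";
    for (1..RECS) {
        print $fh <<EOF;
  <address>
     <name>John Smith</name>
     <city>London</city>
  </address>
EOF
    }
    print $fh "</addresses>\n";
    close $fh or die "Cannot close /tmp/bla.xml: $!";
}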

# Write the same records as a Storable file (network byte order).
{
    require Storable;
    my @addresses;
    for (1..RECS) {
        push @addresses, { name => "John Smith", city => "London" };
    }
    Storable::nstore(\@addresses, "/tmp/bla.st");
}

Two mostly equivalent data sources. Now the two benchmarks (I am using tcsh's time command here, showing user+system CPU time, elapsed time and maximum memory):

$ ( set time = ( 0 "%U+%S %E %MK" ) ; time perl -MStorable -e 'retrieve "/tmp/bla.st"' )
1.980+0.384 0:02.41 193974K
$ ( set time = ( 0 "%U+%S %E %MK" ) ; time perl -MXML::LibXML -e 'XML::LibXML->new->parse_file("/tmp/bla.xml")->documentElement' )
6.037+1.876 0:08.15 643952K

So naive parsing of XML costs roughly 3.3 times as much as loading the equivalent Storable file, in both CPU time (about 7.9s vs. 2.4s user+system) and peak memory (643952K vs. 193974K). I guess that most other fast serializers like YAML::Syck or JSON::XS will give results similar to Storable's.
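
For anyone who wants to repeat the comparison without tcsh, here is a minimal sketch of an in-process harness that uses only core modules. The measure helper, the temporary file and the small record count are my own assumptions for illustration, not part of the benchmark above, and the sketch reports only CPU and elapsed time, not peak memory.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use Time::HiRes qw(time);
use File::Temp qw(tempdir);

# Hypothetical helper (not from the original post): run a loader once and
# return its user+system CPU seconds, elapsed seconds and the loaded data.
sub measure {
    my ($loader) = @_;
    my ($u0, $s0) = times;      # CPU used by this process so far
    my $t0   = time();
    my $data = $loader->();
    my ($u1, $s1) = times;
    return {
        cpu     => ($u1 - $u0) + ($s1 - $s0),
        elapsed => time() - $t0,
        data    => $data,
    };
}

# Small record count so the sketch runs quickly; raise it for real numbers.
my $recs = 10_000;
my $file = tempdir(CLEANUP => 1) . "/bla.st";
nstore([ map { +{ name => "John Smith", city => "London" } } 1 .. $recs ], $file);

my $r = measure(sub { retrieve($file) });
printf "Storable: %.3f CPU s, %.3f s elapsed, %d records\n",
    $r->{cpu}, $r->{elapsed}, scalar @{ $r->{data} };
```

To compare other loaders, pass a different sub to measure, for example one that calls XML::LibXML->new->parse_file on the XML file, assuming that module is installed. Because everything runs in one process, the one-liners above remain the better way to compare peak memory.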

In reply to Re^3: Memory Efficient XML Parser by eserte
in thread Memory Efficient XML Parser by perlgoon
