Nice benchmark. I wouldn't use XML::Parser's Stream style, but it's probably because I am not very familiar with it.

I expanded this benchmark slightly, creating a somewhat more complicated document (still around 3M and 10K elements), and ran a bunch of modules on it. The results are actually quite surprising:

10160 elements generated - (63 top level - 1721 to extract)
bench_regexp             : 0:00.16 real 0.14  0.03 s
bench_libxml             : 0:00.44 real 0.39  0.05 s
bench_parser             : 0:00.88 real 0.83  0.01 s
bench_parser_stream      : 0:01.15 real 1.10  0.06 s
bench_twig               : 0:01.84 real 1.81  0.03 s
bench_sax_base_libxml    : 0:03.29 real 3.25  0.05 s
bench_sax_libxml         : 0:03.32 real 3.31  0.03 s
bench_sax_expat          : 0:03.21 real 3.11  0.03 s
bench_dom                : 0:04.51 real 4.41  0.03 s
libxslt                  : 0:01.48 real 1.46  0.02 s
xml_grep                 : 0:02.07 real 2.02  0.03 s

I am very surprised by how slow the XML::SAX examples are (which is why I wrote two of them: one using XML::SAX::Base and one not using it). I did not expect this, and I will try to figure out what the problem is. If you look at the code, I really don't think I am using the PurePerl parser: I took great care to create the parser myself. That's odd.
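One way to double-check which SAX backend actually gets instantiated is to ask the factory for a parser and look at its class. This is only a minimal sketch, not the code from the benchmark: the helper name sax_parser_class is made up for illustration, and it assumes XML::SAX is installed (plus XML::LibXML for the LibXML backend). It relies on the documented $XML::SAX::ParserPackage variable to force a backend.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the benchmark): ask XML::SAX's factory
# for a parser of the requested backend and report the class it really built.
sub sax_parser_class {
    my ($wanted) = @_;

    # Loaded at run time so a missing module dies inside the caller's eval.
    require XML::SAX;
    require XML::SAX::ParserFactory;

    no warnings 'once';
    # Documented override: the factory uses this package instead of
    # whatever ParserDetails.ini would select.
    local $XML::SAX::ParserPackage = $wanted;

    my $parser = XML::SAX::ParserFactory->parser();
    return ref $parser;
}

# Assumption: XML::LibXML::SAX is the backend we hope to be using.
my $class = eval { sax_parser_class('XML::LibXML::SAX') };
print defined $class
    ? "SAX backend in use: $class\n"
    : "could not load backend: $@\n";
```

If this reports XML::SAX::PurePerl (or fails to load the LibXML backend), the slow numbers would have an obvious explanation; if it reports the expected class, the overhead is somewhere else.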

Code and everything to run it is at http://xmltwig.com/article/simple_benchmark/.

Get simple_benchmark.tar.gz

tar zxvf simple_benchmark.tar.gz
cd simple_benchmark
perl run_all

Note that the xml_grep version only works with the latest, greatest release of the tool, available somewhere else on the same site (with the development version of XML::Twig).


In reply to Re: Re: xml parsers: do I need one? by mirod
in thread xml parsers: do I need one? by regan
