That's actually fairly simple and idiomatic (I recognized the "match and copy everything" bit right away, and the first part is verbiage that goes on every stylesheet in some form). It's at least no worse than when people who don't know Perl complain about Perl.
It's powerful, very easy to customize, extremely fast, and handles huge XML documents quite well :) And besides, it isn't simply outputting "Hello World." It's recursing through various structures/nodes to duplicate an entire document, allowing fine-tuned controls with the simple touch of XPath. Hurrah!
I'd be curious to see how it benchmarks against some of Perl's XML modules. Perhaps it's overkill for something as simple as comment removal, perhaps not.
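For readers who don't speak XSLT, here is a rough sketch of what that "copy everything except comments" recursion amounts to. It is written in Python with the standard-library `xml.dom.minidom` rather than XSLT or Perl, and the function names are made up for illustration; it is not the stylesheet from this thread, and it builds the whole tree in memory.

```python
from xml.dom import minidom, Node


def copy_without_comments(node):
    """Return a copy of `node` and its subtree with comment nodes left out.

    Illustrative helper (hypothetical name): mirrors the XSLT pattern of an
    identity template plus an empty template for comments.
    """
    if node.nodeType == Node.COMMENT_NODE:
        return None  # the "empty template" case: emit nothing for comments
    clone = node.cloneNode(False)  # shallow copy; element attributes are kept
    for child in node.childNodes:
        copied = copy_without_comments(child)
        if copied is not None:
            clone.appendChild(copied)
    return clone


def strip_comments(xml_text):
    """Parse `xml_text` and serialize its root element without comments.

    Only the root element is copied, so anything outside it (XML declaration,
    comments or processing instructions in the prolog) is not reproduced.
    """
    doc = minidom.parseString(xml_text)
    return copy_without_comments(doc.documentElement).toxml()


if __name__ == "__main__":
    print(strip_comments('<a x="1"><!-- c --><b>hi<!--d--></b></a>'))
```

The recursion visits every node once, which is the same shape as the stylesheet: one rule that copies a node and descends, and one rule that matches comments and does nothing.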
If you use a Perl module, I'd recommend XML::LibXSLT, which uses the libxslt library under the hood, so Perl's "speed" or lack thereof should not be an issue. I wouldn't recommend XML::XSLT for any serious XSLT processing, though.
Extremely fast? It doesn't really look like it, at least in some cases, especially as the processed XML grows. Have a look, e.g., at the benchmark section in this document about XStream. It doesn't seem to handle huge documents so well either.
And I would definitely not call that "simple touch of XPath".
I'll see if I can find time during the weekend to implement the comment stripping using a few modules and benchmark it against XSLT. I do bet it's overkill.
As a very rough benchmark, I created a ~20MB XML file (I took the doc in the OP and copied the middle part over and over). I filtered the doc using XSLT (via XML::LibXSLT, with the XSLT in the node above) and using XML::Twig (the solution elsewhere in this thread). XSLT took about 3 seconds; XML::Twig took about 20. Both used vast quantities of memory.
And though I'm not familiar with XML::Parser::Expat, I hacked together something which seems to work (though I am likely missing something for some types of XML content), and ran in about 3 seconds without using much memory at all.
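To illustrate why an event-driven expat filter can get by with so little memory, here is a minimal sketch of the idea. It is not the code from this post: it uses Python's standard-library `xml.parsers.expat` binding instead of Perl's XML::Parser::Expat, and the function name is invented for the example. Like the hack described above, it probably mishandles some kinds of XML content.

```python
import io
from xml.parsers import expat
from xml.sax.saxutils import escape, quoteattr


def strip_comments_stream(source, sink):
    """Copy XML from `source` (binary file object) to `sink` (text file
    object), dropping comments. Hypothetical helper for illustration.

    Known gaps in this sketch: the XML declaration, DOCTYPE and processing
    instructions are not reproduced, CDATA sections come out as escaped
    text, and empty elements are written as <x></x>.
    """
    parser = expat.ParserCreate()

    def start(name, attrs):
        # Re-emit the start tag with its attributes properly quoted.
        pairs = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        sink.write(f"<{name}{pairs}>")

    def end(name):
        sink.write(f"</{name}>")

    def chars(data):
        # expat hands over decoded text, so re-escape &, < and >.
        sink.write(escape(data))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    # No CommentHandler is registered, so comments simply produce no output.
    parser.ParseFile(source)


if __name__ == "__main__":
    src = io.BytesIO(b'<a x="1"><!-- c --><b>hi &amp; bye</b><!--d--></a>')
    out = io.StringIO()
    strip_comments_stream(src, out)
    print(out.getvalue())
```

Each event is written out as soon as it arrives and nothing is kept, so memory use stays flat regardless of file size, unlike the tree-building approaches.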
Update: I repeated the XSLT run with a 40MB file, and it took just a few more seconds. I wonder if Jenda is using XML::XSLT or XML::LibXSLT below. (An 80MB file took ~25 secs with XSLT and ~20 with expat.) (XML::LibXSLT 1.62, XML::LibXML 1.63.)
Here is what I used: