Re^2: Removing XML comments with regex

Re^3: Removing XML comments with regex
by runrig (Abbot) on Oct 25, 2007 at 04:42 UTC

That's actually fairly simple and idiomatic (I recognized the match-and-copy-everything bit right away, and the first part is verbiage that goes on every stylesheet in some form). It's at least no worse than when people who don't know Perl complain about Perl.

Re^3: Removing XML comments with regex
by eff_i_g (Curate) on Oct 25, 2007 at 04:08 UTC

Re^4: Removing XML comments with regex

by runrig (Abbot) on Oct 25, 2007 at 04:52 UTC

XML::LibXSLT

XML::XSLT

Re^4: Removing XML comments with regex

by Jenda (Abbot) on Oct 25, 2007 at 11:15 UTC

Extremely fast? It doesn't really look like it, at least in some cases, especially as the processed XML grows. Have a look, e.g., at the benchmark section in this document about XStream. It doesn't seem to handle huge documents so well either.

And I would definitely not call that a "simple touch of XPath".

I'll see if I can find time during the weekend to implement the comment stripping using a few modules and benchmark it against XSLT. I'd bet XSLT is overkill.

Jenda
Support Denmark!
Defend the free world!

Re^5: Removing XML comments with regex

by runrig (Abbot) on Dec 28, 2007 at 20:59 UTC

As a very rough benchmark, I created a ~20MB XML file (I took the doc in the OP and copied the middle part over and over). I filtered the doc two ways: with XSLT (using XML::LibXSLT and the XSLT in the node above) and with XML::Twig (the solution elsewhere in this thread). The XSLT run took about 3 seconds; the XML::Twig run took about 20. Both used vast quantities of memory.
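For anyone who wants to repeat the XSLT side, here is a minimal sketch. The stylesheet from the node above isn't reproduced here, so this assumes the usual identity transform plus an empty template for comment(). The sub name strip_comments_xslt and the sample document are made up for illustration. Note that it builds the whole tree in memory, which fits the memory use mentioned above.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Assumed stylesheet: copy every node and attribute, but emit nothing
# for comments. The explicit priority avoids relying on template order.
my $xsl = <<'XSL';
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="comment()" priority="1"/>
</xsl:stylesheet>
XSL

# Hypothetical helper: takes an XML string, returns the serialized result.
sub strip_comments_xslt {
    my ($xml) = @_;
    my $stylesheet = XML::LibXSLT->new->parse_stylesheet(
        XML::LibXML->load_xml(string => $xsl)
    );
    my $result = $stylesheet->transform(
        XML::LibXML->load_xml(string => $xml)
    );
    return $stylesheet->output_as_bytes($result);
}

# Made-up sample input, just to show the call.
print strip_comments_xslt(
    '<doc><!-- drop me --><item id="1">kept</item></doc>'
);
```

For a real file, XML::LibXML->load_xml(location => $path) can replace the string form.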

And though I'm not familiar with XML::Parser::Expat, I hacked together something that seems to work (though I am likely missing something for some types of XML content). It ran in about 3 seconds without using much memory at all.
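The hack itself isn't shown, so the following is only a guess at the general shape, not the code that was benchmarked. It goes through XML::Parser (the usual front end to XML::Parser::Expat), registers a do-nothing Comment handler, and lets the Default handler echo everything else as it streams past. The sub name strip_comments_stream and the sample input are hypothetical.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: $source may be an XML string or an open filehandle;
# output is written to $out_fh as it is parsed, so no tree is built.
sub strip_comments_stream {
    my ($source, $out_fh) = @_;
    my $parser = XML::Parser->new(
        Handlers => {
            # Comments get their own handler, which discards them.
            Comment => sub { },
            # Anything without a registered handler lands here as a string.
            Default => sub { print {$out_fh} $_[1] },
        },
    );
    $parser->parse($source);
    return;
}

# Made-up sample input, just to show the call.
strip_comments_stream(
    '<doc><!-- drop me --><item id="1">kept</item></doc>',
    \*STDOUT,
);
print "\n";
```

The Default handler receives its strings in UTF-8 whatever the input encoding was, so for non-ASCII documents the output handle probably wants binmode($out_fh, ':encoding(UTF-8)'). Entity and DOCTYPE handling are the likely places where this falls short on real-world input.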

Update:

~~I wonder if Jenda is using XML::XSLT or XML::LibXSLT below.~~