
Well, if what you want is the collective blessing of the Monastery inhabitants on your devious practices, then I am afraid you can't have that without a proper offering. ;--)

Seriously, first I am a bit surprised by the time difference. The only benchmark I have seen shows XML::Parser being only 4 times slower than regexps. Maybe XML::LibXML would be faster.

Then of course, you can always use regexps to parse data. Just do not call it XML. It might indeed be well-formed XML (although as long as you haven't parsed it there is really no telling; the encoding might be all wrong, for example), but the problem is that your system does not process XML. It processes a limited subset of it, which follows a format that should be described formally somewhere (even if it is just a list of the XML features that are not used). It might actually be a good idea to call that format something like R-XML (Regan's XML) and to write everywhere that this is what your code processes. This way you, or whoever needs to maintain the system later, won't forget its limitations. You can have a look at On XML parsing BTW to see examples of XML features that you probably don't support.
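To make the "limited subset" point concrete, here is a minimal sketch, written in Python rather than Perl just to keep it short; the same thing happens with a Perl regexp. The element name `price`, the sample documents and the helper names are all made up for the illustration. It runs a typical hand-rolled pattern and a real parser over four well-formed spellings of the same data:

```python
import re
import xml.etree.ElementTree as ET

# Four well-formed spellings of the same record. The element name
# "price" and the values are invented for this illustration.
DOCS = [
    "<item><price>10</price></item>",
    "<item><price >10</price ></item>",            # whitespace inside tags
    "<item><price><![CDATA[10]]></price></item>",  # CDATA section
    "<item><price>&#49;0</price></item>",          # character reference for "1"
]

# A typical hand-rolled pattern: it only knows the first spelling.
NAIVE = re.compile(r"<price>(\d+)</price>")

def regexp_price(doc):
    """Return the price found by the naive regexp, or None."""
    m = NAIVE.search(doc)
    return m.group(1) if m else None

def parser_price(doc):
    """Return the price as seen by a real XML parser."""
    return ET.fromstring(doc).findtext("price")

for doc in DOCS:
    print(regexp_price(doc), parser_price(doc))
```

The regexp only finds the price in the first document, while the parser returns the same value for all four. That is fine as long as the format description says that only the first spelling is allowed, which is exactly why the restrictions should be written down.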

That said, to finish on a note that will make you feel good, here is what Tim Bray, one of the creators of XML, has to say:

That leaves input data munging, which I do a lot of, and a lot of input data these days is XML. Now here's the dirty secret; most of it is machine-generated XML, and in most cases, I use the perl regexp engine to read and process it. I've even gone to the length of writing a prefilter to glue together tags that got split across multiple lines, just so I could do the regexp trick.

The rest of the rant gives a little context and interesting comments.
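For the curious, the kind of prefilter he mentions might look something like the sketch below. This is my guess at the idea, not his code, and it is in Python only for brevity (a Perl substitution would do the same job). The function name and the sample are hypothetical, and it assumes there is no literal `<` or `>` inside attribute values, comments or CDATA sections, and that whitespace inside attribute values does not matter:

```python
import re

# Anything that looks like a tag: "<", then no angle brackets, then ">".
# Assumption: attribute values never contain a literal "<" or ">".
TAG = re.compile(r"<[^<>]*>")

def glue_split_tags(text):
    """Collapse line breaks that fall inside a tag into a single space."""
    return TAG.sub(lambda m: re.sub(r"\s*\n\s*", " ", m.group(0)), text)

sample = '<item id="1"\n      name="foo">\n  text\n</item>'
print(glue_split_tags(sample))
```

After this pass each tag sits on one line, so a line-oriented regexp can match it, while line breaks in the text between tags are left alone.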

Oh yeah, and I admit to having used regexps too sometimes, oddly enough not for speed purposes, but to use the power of the Perl regexp engine to wrap elements (now you can do this properly in XML::Twig of course ;--).

In reply to Re: xml parsers: do I need one? by mirod
in thread xml parsers: do I need one? by regan
