comment on

Well, if what you want is the collective blessing of the Monastery inhabitants on your devious practices, then I am afraid you can't have that without a proper offering. ;--)

Seriously, first I am a bit surprised by the time difference. The only benchmark I have seen shows XML::Parser being only 4 times slower than regexps. Maybe XML::LibXML would be faster.

Then of course, you can always use regexp to parse data. Just do not call it XML. It might indeed be well-formed XML (although as long as you haven't parsed it there is really no telling, the encoding might be all wrong for example), but the problem is that your system does not process XML. It processes a limited subset of it, which follows a format that should be described formally somewhere (even if it is just a list of XML features that are not used). It might actually be a good idea to call that format something like R-XML (Regan's XML) and to write everywhere that that's what your code processes. This way you or someone else who will need to maintain the system won't forget the limitations of the system. You can have a look at On XML parsing BTW to see examples of XML features that you probably don't support.

That said, to finish on a note that will make you feel good, here is what Tim Bray, one of the creator of XML, has to say:

That leaves input data munging, which I do a lot of, and a lot of input data these days is XML. Now here's the dirty secret; most of it is machine-generated XML, and in most cases, I use the perl regexp engine to read and process it. I've even gone to the length of writing a prefilter to glue together tags that got split across multiple lines, just so I could do the regexp trick.

The rest of the rant gives a little context and interesting comments.

Oh yeah, and I admit to having used regexps too sometimes, oddly enough not for speed purposes, but to use the power of the Perl regexp engine to wrap elements (now you can do this properly in XML::Twig of course ;--).

In reply to Re: xml parsers: do I need one? by mirod
in thread xml parsers: do I need one? by regan

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Welcome to the Monastery
	PerlMonks