
Re: xml parsers: do I need one?

by mirod (Canon)
on Aug 28, 2003 at 14:36 UTC

in reply to xml parsers: do I need one?

Well, if what you want is the collective blessing of the Monastery inhabitants on your devious practices, then I am afraid you can't have that without a proper offering. ;--)

Seriously, first, I am a bit surprised by the time difference you mention. The only benchmark I have seen shows XML::Parser being only 4 times slower than regexps. Maybe XML::LibXML would be faster.

Then of course, you can always use regexps to parse data. Just do not call it XML. It might indeed be well-formed XML (although until you have parsed it there is really no telling; the encoding might be all wrong, for example), but the problem is that your system does not process XML. It processes a limited subset of it, which follows a format that should be described formally somewhere (even if only as a list of the XML features that are not used). It might actually be a good idea to call that format something like R-XML (Regan's XML) and to state everywhere that this is what your code processes. That way neither you nor anyone else who has to maintain the system will forget its limitations. By the way, have a look at On XML parsing for examples of XML features that you probably don't support.
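To make the "limited subset" point concrete, here is a small sketch in Python, using only the standard library as a stand-in for the Perl modules discussed here. The sample document and the `item`/`id` names are made up for illustration. The document is well-formed XML, yet a regexp written for double-quoted attributes and plain text content picks up a commented-out item and misses both real ones, while a parser reads it correctly:

```python
import re
import xml.etree.ElementTree as ET

# Well-formed XML that uses features a "restricted" reader may not expect:
# a comment containing a tag, single-quoted attributes, an entity reference
# and a CDATA section. (Illustrative data only.)
doc = """<items>
  <!-- <item id="0">commented out</item> -->
  <item id='1'>Fish &amp; chips</item>
  <item id="2"><![CDATA[a < b]]></item>
</items>"""

# Regexp approach: assumes double-quoted attributes and plain text content.
naive = re.findall(r'<item id="(\d+)">([^<]*)</item>', doc)

# Parser approach: quoting, entities, CDATA and comments are handled
# according to the XML spec (comments are simply dropped by ElementTree).
root = ET.fromstring(doc)
parsed = [(item.get("id"), item.text) for item in root.iter("item")]

print(naive)   # [('0', 'commented out')]
print(parsed)  # [('1', 'Fish & chips'), ('2', 'a < b')]
```

None of these inputs is exotic; each is a feature a documented R-XML format would have to explicitly rule out.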

That said, to finish on a note that will make you feel good, here is what Tim Bray, one of the creators of XML, has to say:

That leaves input data munging, which I do a lot of, and a lot of input data these days is XML. Now here's the dirty secret; most of it is machine-generated XML, and in most cases, I use the perl regexp engine to read and process it. I've even gone to the length of writing a prefilter to glue together tags that got split across multiple lines, just so I could do the regexp trick.
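A rough sketch of that kind of prefilter, again in Python. Bray does not show his code, so `glue_split_tags`, `extract_titles` and the `title` element are hypothetical. It assumes machine-generated input where `<` and `>` never appear inside attribute values, comments or CDATA sections, and where element text never spans lines:

```python
import re
from typing import List

def glue_split_tags(text: str) -> str:
    """Put every tag back on one line, e.g. a start tag whose
    attributes continue on the next line.

    Assumes '<' and '>' never occur inside attribute values, comments
    or CDATA sections (true only for restricted, machine-generated input).
    """
    # For each <...> span, replace whitespace runs containing a newline
    # with a single space.
    return re.sub(r"<[^>]*>",
                  lambda m: re.sub(r"\s*\n\s*", " ", m.group(0)),
                  text)

def extract_titles(text: str) -> List[str]:
    """Return the text of every <title> element (hypothetical element name)."""
    titles = []
    for line in glue_split_tags(text).splitlines():
        # Line-oriented regexp: only works because tags are no longer split.
        titles.extend(re.findall(r"<title(?:\s[^>]*)?>([^<]*)</title>", line))
    return titles

feed = """<entry>
<title
   lang="en">First post</title>
<title lang="fr">Second post</title>
</entry>"""

titles = extract_titles(feed)
print(titles)  # ['First post', 'Second post']
```

The prefilter is what makes the line-by-line regexp possible at all, and it quietly adds more assumptions to the list of XML features the input must not use.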

The rest of the rant gives a little context and makes some interesting comments.

Oh yeah, and I admit to having used regexps myself sometimes, oddly enough not for speed, but to use the power of the Perl regexp engine to wrap elements (now you can do this properly in XML::Twig, of course ;--).
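For comparison, a minimal sketch of wrapping elements through a parser instead of regexps, using Python's ElementTree rather than XML::Twig. `wrap_in` here is a hypothetical helper that only wraps the direct children of one parent:

```python
import xml.etree.ElementTree as ET

def wrap_in(parent: ET.Element, tag: str, wrapper_tag: str) -> None:
    """Wrap each direct child of `parent` named `tag` in a new `wrapper_tag`."""
    for index, child in enumerate(list(parent)):
        if child.tag == tag:
            wrapper = ET.Element(wrapper_tag)
            # Text following the child (its tail) now follows the wrapper.
            wrapper.tail, child.tail = child.tail, None
            parent[index] = wrapper   # put the wrapper in the child's slot
            wrapper.append(child)     # and move the child inside it

root = ET.fromstring("<doc><b>one</b> and <b>two</b></doc>")
wrap_in(root, "b", "em")
result = ET.tostring(root, encoding="unicode")
print(result)  # <doc><em><b>one</b></em> and <em><b>two</b></em></doc>
```

Working on the tree means a wrapper can never end up straddling an element boundary, which is the main thing to watch for when doing the same job with regexps.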
