in reply to Preferred Methods (again)

Are you trying to parse real XML, or just to build a tool that can handle some small subset of it? The reason I ask is that actual XML can contain a lot of oddities that your code above will not handle properly. For instance, XML allows attribute-value pairs to have whitespace around the equals sign, like this:
<Root id = "456990">

It looks like your code would choke on that, though. Unless you are absolutely sure that your input data, now and forever, will not contain anything but the ultra-strict subset of XML that your code will support, I would urge you to use XML::Parser. (I believe its internals are written in C, so it's actually quite fast; have you benchmarked it on your specific documents to see if it will meet your needs?)
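
To make the whitespace point concrete, here is a minimal sketch comparing a strict pattern with a more tolerant one. Both patterns and the function names `root_id_strict` and `root_id_tolerant` are my own illustrations, not the code from your post. The tolerant version accepts whitespace around the equals sign and either quote style, but it is still not a parser: it would be fooled by a `>` or a stray ` id=` inside another attribute's value, for example.

```perl
use strict;
use warnings;

# Hypothetical strict pattern: no whitespace around '=', double quotes only,
# and id must be the first attribute.
sub root_id_strict {
    my ($tag) = @_;
    if ($tag =~ /<Root\s+id="([^"]*)"/) {
        return $1;
    }
    return undef;
}

# Hypothetical tolerant pattern: optional whitespace around '=', single or
# double quotes, and id may appear after other attributes.
sub root_id_tolerant {
    my ($tag) = @_;
    if ($tag =~ /<Root\b[^>]*?\sid\s*=\s*(["'])(.*?)\1/s) {
        return $2;
    }
    return undef;
}

for my $tag ('<Root id="456990">', '<Root id = "456990">', "<Root id='456990'>") {
    my $strict   = root_id_strict($tag);
    my $tolerant = root_id_tolerant($tag);
    printf "%-22s strict: %-8s tolerant: %s\n",
        $tag,
        defined $strict   ? $strict   : 'no match',
        defined $tolerant ? $tolerant : 'no match';
}
```

Only the first tag matches the strict pattern; all three match the tolerant one.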

I know you mentioned your criterion for efficiency is execution speed, and that you don't want to use a separate parser, so maybe I should just butt out. It's just that years of working with HTML and more recently XML have taught me to be extremely cautious. Building a parser that really respects the specs is a non-trivial task, and I'd hate to see fragile code go into production and then have to be torn out later for maintenance, when a perfectly good module is already available to do the task you intend.

Re: Re: Preferred Methods (again)
by vek (Prior) on Jan 17, 2002 at 03:33 UTC
    Just building a quick tool to handle a small subset.

    The XML in question is generated by an in-house Java app written by a co-worker. The Root attribute format is hardcoded so for this application only I'm confident that the format will not change. I'm well aware of the pitfalls of regex parsing and would not (and in fact do not in other code) dream of doing that when parsing XML from another source.

    In my reply to juerd I mentioned that this code runs on a 'gateway' box. That box just accepts XML from a socket connection, archives the XML and then forwards it on to the database box for real parsing via XML::Parser (including the handling of base64 encoded print images & other fun stuff). Therefore the only thing this code needs to do is identify the type of XML message, as specified by the 'Group' attribute of Root, so that the XML can be archived correctly.
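
    The gateway's job as described above could be sketched roughly as follows. This is an illustration under assumptions, not the code from the original post: the function name `message_group`, the sample message, and the 'Invoice' value are all made up, and the pattern relies on the generator never putting a '>' inside an attribute value of Root.

    ```perl
    use strict;
    use warnings;

    # Return the value of the Group attribute on the Root start tag,
    # or undef if no such attribute is found.
    sub message_group {
        my ($xml) = @_;
        if ($xml =~ /<Root\b[^>]*?\sGroup\s*=\s*(["'])(.*?)\1/s) {
            return $2;
        }
        return undef;
    }

    # Made-up sample message, for illustration only.
    my $message = '<?xml version="1.0"?><Root id="456990" Group="Invoice"><Body/></Root>';

    my $group = message_group($message);
    print defined $group
        ? "archive under: $group\n"
        : "unknown message type\n";
    ```

    Returning undef for an unrecognised message lets the caller decide where such messages get archived instead of failing silently.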

    The intent of my post (and I know I should have clarified it) was really to have people comment on the regex. I didn't mean to start a war over whether or not to use a parser.