in reply to Picking the best way....

This has come up before. Not all modules are as well written or as essential as CGI.pm.

As for dealing with XML or HTML using regular expressions... I did that just the other day. I had a well-defined set of HTML to deal with and finding the right HTML-parsing module would have probably taken more time than rolling my own regexen did (and then I'd have to learn how to use that module and then apply that to the problem at hand).
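
A minimal sketch of that kind of one-off, written in Python rather than Perl and assuming a hypothetical, tightly controlled table layout (the `PAGE` sample, the `ROW` pattern and `extract_rows` are all made up for illustration, not the code I actually wrote):

```python
import re

# Hypothetical input: every row is known in advance to have exactly this shape.
PAGE = """\
<tr><td class="name">widget</td><td class="price">4.50</td></tr>
<tr><td class="name">gadget</td><td class="price">12.00</td></tr>
"""

# The pattern hard-codes the attribute order, the quoting style and the
# absence of whitespace, which is only safe because the input is well defined.
ROW = re.compile(r'<td class="name">([^<]*)</td><td class="price">([^<]*)</td>')

def extract_rows(html):
    """Return (name, price) pairs for every row matching the known layout."""
    return [(name, float(price)) for name, price in ROW.findall(html)]

rows = extract_rows(PAGE)
```

The pattern is only as good as the guarantee on the input; the moment the markup comes from somewhere else, every hard-coded detail in `ROW` becomes a way to miss rows silently.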

If you are going to end up dealing with XML/HTML whose shape you don't know in advance, then I strongly recommend a module. Unfortunately, the module landscape in that area is still a bit rocky and undermapped. There are several modules to choose from, most of which have problems in at least some situations.

Also take a look at Why I like functional programming for another example of not using a module to parse HTML to excellent effect. It is one of many cases that remind me that we often deal with something that is nearly HTML or XML, which can make all the modules useless.

I'm not disagreeing with your recommendation to try to use modules. That is an excellent idea. I'm just advocating moderation. (:

        - tye (but my friends call me "Tye")

Re: (tye)Re: Picking the best way....
by mirod (Canon) on May 05, 2001 at 13:00 UTC

    I am not disagreeing with your recommendation to use moderation. That is an excellent idea. I am just advocating _extreme_ caution ;--)

    Especially when dealing with XML, which is a deceptively simple format.

    You can certainly use regexps to write a throw-away hack, which is going to be used only once, on very well known XML data, ideally generated by code you have also written yourself. That's about it! And it doesn't happen that often.

    Using regexps on anything else means that sooner or later you will come across something that is completely legal XML, but that completely breaks your code. And believe me, if it is legal XML (and most likely even if it is not) it is bound to pop up in your data. You can have a look at On XML Parsing for just a quick list of what can go wrong.
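
To make the failure mode concrete, here is a small sketch in Python (not Perl), using the standard library's `xml.etree.ElementTree` as a stand-in for "an XML module". The `NAIVE` pattern, the `<item>` element and the sample documents are hypothetical; the point is only that each variant is well-formed XML carrying the same data.

```python
import re
import xml.etree.ElementTree as ET

# A typical throw-away pattern: double quotes, no extra whitespace, plain text.
NAIVE = re.compile(r'<item id="(\d+)">([^<]*)</item>')

# Five well-formed documents that all describe one item, id 1, text "tea".
VARIANTS = [
    '<list><item id="1">tea</item></list>',                # the shape we expected
    "<list><item id='1'>tea</item></list>",                # single-quoted attribute
    '<list><item\n id = "1" >tea</item ></list>',          # legal extra whitespace
    '<list><item id="1"><![CDATA[tea]]></item></list>',    # CDATA section
    '<list><!-- <item id="9">old</item> --><item id="1">tea</item></list>',  # commented-out item
]

def with_regex(doc):
    """Extract (id, text) pairs with the naive pattern."""
    return NAIVE.findall(doc)

def with_parser(doc):
    """Extract (id, text) pairs with a real XML parser."""
    root = ET.fromstring(doc)
    return [(el.get("id"), el.text) for el in root.iter("item")]

for doc in VARIANTS:
    print(with_regex(doc), with_parser(doc))
```

The parser returns the same single item for all five inputs. The regexp gets only the first one right: it finds nothing in the next three and reports the commented-out item as live data in the last.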

    A last word: if you are dealing with something that is nearly (...) XML, do yourself a favor and use two steps: first get from the nearly-thingie to the real stuff, and then use an XML module. It would be even better if you could refuse the data altogether because it is not valid!
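
A rough sketch of the two steps, in Python with the standard library's `xml.etree.ElementTree`, assuming the only defect in the "nearly XML" is unescaped ampersands (that assumption, and the names `BARE_AMP`, `repair` and `parse_nearly_xml`, are made up for the example). Note that this parser checks well-formedness only, not validity against a DTD or schema.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does NOT start an entity or character reference.
BARE_AMP = re.compile(r'&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)')

def repair(text):
    """Step 1: turn the nearly-XML into real XML (here: escape bare '&')."""
    return BARE_AMP.sub('&amp;', text)

def parse_nearly_xml(text):
    """Step 2: hand the repaired text to a real parser, or refuse it."""
    try:
        return ET.fromstring(repair(text))
    except ET.ParseError as err:
        # Anything the repair step did not anticipate is rejected, not guessed at.
        raise ValueError(f"not well-formed even after repair: {err}") from err

doc = parse_nearly_xml('<shop><name>Fish & Chips</name><note>3 &lt; 5</note></shop>')
print(doc.findtext("name"), "|", doc.findtext("note"))
```

Keeping the repair step separate means the regexp only has to know about the one known defect, while everything else about XML stays the parser's problem.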