in reply to Regular expression needed (maybe)

A warning: if you're trying to create a semi-general-purpose solution for HTML or XML files, you should use a proper parser module rather than just a regex. Pattern matching against marked-up text is a very fragile approach: small variations in whitespace, quoting, attribute order or nesting will break it. For HTML, see HTML::TokeParser::Simple. Its documentation includes a bunch of useful examples, and chances are you can just lift one of them and modify it to suit your needs.
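
To make the fragility concrete, here is a minimal sketch. It is written in Python and uses the standard library's html.parser as a stand-in for a tokenizing parser; it is not the HTML::TokeParser::Simple API. The sample markup, the NAIVE pattern and the helper names are all made up for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup: the same kind of link written three ways.
SAMPLES = [
    '<a href="http://example.com/a">one</a>',
    "<A HREF='http://example.com/b'>two</A>",
    '<a\n   class="ext" href="http://example.com/c">three</a>',
]

# A typical quick regex. It silently assumes lowercase names, double
# quotes, a single space after the tag name, and href as the only attribute.
NAIVE = re.compile(r'<a href="([^"]*)">')


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser has already lowercased tag and attribute names and
        # removed whichever quote style the author happened to use.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def links_by_regex(html):
    """Return the href values the naive regex manages to find."""
    return NAIVE.findall(html)


def links_by_parser(html):
    """Return the href values found by tokenizing the markup."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    for sample in SAMPLES:
        print(links_by_regex(sample), links_by_parser(sample))
```

The regex only finds the link in the first sample, while the parser finds all three. The regex could be patched for each variation, but every patch makes it longer and harder to check, and that is work a parser module has already done.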

Makeshifts last the longest.

Re: Re: Regular expression needed (maybe)
by Anonymous Monk on Jan 15, 2003 at 13:39 UTC
    A warning: if you're trying to create a smei general purpose solution for HTML or XML files, you should use a proper parser module rather than just a regex.

    No offense, but you are beginning to sound like a broken record about this. Instead of repeating the same thing over and over again, why don't you just provide a link to one of your other posts about the subject?

    (By the way, I disagree with you. Most of the time, you would want to use a module, but what he is trying to do is simple enough for a regex to handle.)

      And, brother, I disagree with your disagreement. A module represents a black box that has been extensively tested. Equally important, it contains extensive error-checking and error-handling.

      The latter is crucial to the success of any serious development, because a single mis-typed character can stymie a developer for hours. Those are wasted hours. Wasted hours are wasted dollars.

      A lot of parsing is simple enough for a regex to handle. In fact, regexes are mini-parsers. But once you start parsing things that have to balance, such as nested tags, it's not simple at all. Much better to leave that kind of work to the experts, who are kind enough to give me stuff that works, free of charge.

      Be Lazy - let other people do the work for you.

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

      That's why I added my disclaimer "if you're trying to create a semi general purpose solution" (sorry about the typo). Very simple things can be done with a regex. Also, if he knows exactly what his data looks like and this is a one-off job, a regex is quite likely to suffice. I do assert, though, that unless you have enough experience to make a good call, it's safer to err on the side of using a parser for X?(HT)?ML, even where a regex might have sufficed.

      But in this case I have no idea whether he is even parsing markup at all, or just something that happens to look like it. So instead of giving a possibly ill-advised suggestion, I chose to just raise awareness of the issue and leave it at that. Sorry to sound like a broken record, but that's because I'm responding to the broken record of a question that keeps coming up.

      Makeshifts last the longest.