in reply to multiple line regex

When you try to parse HTML with a regex, you are in a state of sin. When you try to parse HTML tables with regexes, you are really riding the bad karma train.

If you insist on parsing HTML yourself, do yourself a favor and use HTML::Parser or HTML::TokeParser.
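For illustration, here is a minimal HTML::TokeParser sketch that collects the trimmed text of every `<td>` cell. The sub name `cell_texts` and the sample markup are hypothetical. The sketch assumes every cell is closed with `</td>` and does not try to handle nested tables.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the trimmed text of every <td> cell in $html.
sub cell_texts {
    my ($html) = @_;
    # new() accepts a reference to a scalar holding the document text.
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";
    my @cells;
    while ($p->get_tag('td')) {
        # Text up to the next </td>, with whitespace collapsed and trimmed;
        # any tags inside the cell are skipped.
        push @cells, $p->get_trimmed_text('/td');
    }
    return @cells;
}

my @cells = cell_texts('<table><tr><td> 2004-04-23 </td><td>10.17</td></tr></table>');
print join(',', @cells), "\n";
```

The parser does the tag recognition, so attributes, comments and odd whitespace in the markup don't break the extraction the way they can with a hand-written regex.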

Given that your data is in an HTML table, though, I strongly recommend HTML::TableExtract.
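A minimal HTML::TableExtract sketch, using the module's `new(headers => [...])`, `parse`, `tables` and `rows` calls. The table markup and the header names `Date` and `Price` are made up for illustration, since the original data isn't shown here.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample: the real table layout and header names are unknown.
my $html = <<'HTML';
<table>
  <tr><th>Date</th><th>Price</th></tr>
  <tr><td>2004-04-22</td><td>9.95</td></tr>
  <tr><td>2004-04-23</td><td>10.17</td></tr>
</table>
HTML

# Select tables by their header text; only the named columns are
# returned, in the order listed here.
my $te = HTML::TableExtract->new(headers => ['Date', 'Price']);
$te->parse($html);

my @rows;
for my $ts ($te->tables) {
    for my $row ($ts->rows) {
        push @rows, [@$row];
        # Missing cells come back as undef, so print them as empty strings.
        print join(',', map { defined $_ ? $_ : '' } @$row), "\n";
    }
}
```

Matching on header text rather than on column position means the extraction keeps working if extra columns are added or reordered later.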

Re: Re: multiple line regex
by zby (Vicar) on Apr 23, 2004 at 10:17 UTC
    I think he is not really parsing the file; he is just scanning it for particular values, so in this case I think using regexps is justified. Of course, that's just a guess.

      Parse

      4. Computer Science. To analyze or separate (input, for example) into more easily processed components.

      Granted, this is a log file with a (probably) fixed format, so it's only slightly evil to use a regex instead of a proper parser. But it is parsing nonetheless.

        That's a bit of hair splitting, but for me parsing is more about reconstructing the structure from a flattened representation. I even dug up a supporting online definition of parse:

        1. To determine the syntactic structure of a sentence or other utterance

        Here he does not need to think about the structure; all he needs to know is that the information he wants sits just after one specified string and just before another.

        Following the structure of the document is usually a more general way of finding the information you need, because that structure is more formally defined. Sometimes, though, you don't need to: you know other properties of the data that can lead you to your goal (and sometimes you don't know much about the structure of the document anyway).
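The "just after one string, just before another" approach can be sketched with a single multi-line regex. The helper name `between` and the `<<`/`>>` markers are hypothetical stand-ins for whatever fixed strings surround the values in the real log.

```perl
use strict;
use warnings;

# Hypothetical helper: return every substring found between $before and $after.
sub between {
    my ($text, $before, $after) = @_;
    # \Q...\E treats the markers as literal strings, /s lets . match
    # newlines so a value may span several lines, and .*? is non-greedy
    # so each match stops at the first following $after.
    # In list context, m//g returns the captured group of every match.
    return $text =~ /\Q$before\E(.*?)\Q$after\E/gs;
}

my $log = "alpha <<42>> beta\ngamma <<multi\nline>> delta\n";
print "[$_]\n" for between($log, '<<', '>>');
```

This relies only on the surrounding markers, not on any structure of the document, which is exactly why it is fragile if the markers can also appear inside a value.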