in reply to Re: Regular Expression
in thread Regular Expression

What will happen when a <body> tag is included in a comment? Your regex will break. I'm almost sure that whoever gave the instruction to find the <body> of the HTML code with a simple regex did not think about this.

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re^3: Regular Expression
by kwaping (Priest) on Jun 28, 2005 at 19:29 UTC
    Interesting observation - do you often see body tags enclosed in comments? You are assuming the poster doesn't want body tags enclosed in comments. ;) In any case, I think this pattern is better (added ^):
    my $body = $the_html_string;
    $body =~ s/^.*?(<body.*?>)/$1/sgi;
      One can put anything in comments. Sometimes I have several alternative opening tags, and I simply comment out the ones I don't need. A naive regex search for the opening <body> tag could then match the wrong (that is, commented-out) tag.
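
To illustrate the point, here is a minimal sketch, assuming a made-up sample document in which the first <body> tag is commented out. The variable names ($html, $naive, $stripped, $real) are hypothetical. Stripping comments with a regex before searching is itself only an approximation (it can misfire on comment-like text inside scripts, for instance), so a real HTML parser such as HTML::Parser would likely be more robust.

```perl
use strict;
use warnings;

# Hypothetical sample: the first <body> tag is commented out.
my $html = <<'HTML';
<html>
<head><title>t</title></head>
<!-- <body class="old"> -->
<body class="new">
<p>Hello</p>
</body>
</html>
HTML

# Naive search: the first match is the commented-out tag.
my ($naive) = $html =~ /(<body\b[^>]*>)/si;

# Strip comments first (an approximation, not a full HTML parse),
# then search the remaining text.
(my $stripped = $html) =~ s/<!--.*?-->//gs;
my ($real) = $stripped =~ /(<body\b[^>]*>)/si;

print "naive: $naive\n";
print "real:  $real\n";
```

On this sample, the naive search returns the class="old" tag, while the search on the comment-stripped copy returns the class="new" tag.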

      CountZero

      You will still end up with the trailing </html> tag and any comments that come after the closing </body> tag, because the substitution only removes what precedes the opening tag.
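
A small sketch of the difference, assuming a made-up one-line document; $html, $after and $inner are hypothetical names. The first statement applies the substitution from the parent post, and the second captures only the text between the opening and closing tags instead. The capture is still subject to the comment caveat above, since it stops at the first </body> it sees, even a commented-out one.

```perl
use strict;
use warnings;

# Hypothetical sample with a comment after the closing </body> tag.
my $html = '<html><head></head><body id="x"><p>Hi</p></body><!-- footer --></html>';

# The substitution from the parent post: removes only what precedes <body ...>.
(my $after = $html) =~ s/^.*?(<body.*?>)/$1/si;

# Alternative: capture only what lies between <body ...> and </body>.
my ($inner) = $html =~ m{<body\b[^>]*>(.*?)</body\s*>}si;

print "$after\n";
print "$inner\n";
```

Here $after still ends with the </body> tag, the trailing comment and </html>, while $inner holds only the paragraph.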

      CountZero