in reply to Regular Expression

my $body = $the_html_string; $body =~ s/(<body.*?>)/$1/sgi;

Re^2: Regular Expression
by CountZero (Bishop) on Jun 28, 2005 at 19:24 UTC
    What happens when a <body> tag appears inside a comment? Your regex will break. I'm almost sure that whoever gave the instruction to find the <body> of the HTML code with a simple regex did not think about this.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Interesting observation. Do you often see body tags enclosed in comments? You are assuming the poster doesn't want body tags that are enclosed in comments. ;) In any case, I think this pattern is better (added a leading ^.*? so that everything before the tag is dropped):
      my $body = $the_html_string; $body =~ s/^.*?(<body.*?>)/$1/sgi;
        One can put anything in comments. Sometimes I have several alternative opening tags, and I simply comment out the ones I don't need. A naive regex search for the opening <body> tag might then find the wrong (commented-out) tag.
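A minimal sketch of the problem described above, assuming a made-up sample document in which the first <body> tag is commented out. The variable names $naive, $stripped and $careful are hypothetical, and stripping comments with a regex is itself only an approximation (it does not understand scripts or CDATA sections); a real HTML parser would be more robust.

```perl
use strict;
use warnings;

# Hypothetical sample document: the first <body> tag is commented out.
my $the_html_string = <<'HTML';
<html>
<head><title>Demo</title></head>
<!-- <body class="old"> -->
<body class="new">
<p>Hello</p>
</body>
</html>
HTML

# Pattern from the thread: drop everything before the first "<body...>".
(my $naive = $the_html_string) =~ s/^.*?(<body.*?>)/$1/sgi;

# Sketch of a workaround (an assumption, not the poster's method):
# remove simple <!-- ... --> comments first, then apply the same pattern.
(my $stripped = $the_html_string) =~ s/<!--.*?-->//sg;
(my $careful = $stripped) =~ s/^.*?(<body.*?>)/$1/sgi;

print "naive starts with:   ", substr($naive, 0, 18), "\n";
print "careful starts with: ", substr($careful, 0, 18), "\n";
```

With this input, the naive result begins at the commented-out tag, while the comment-stripped version begins at the live one.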

        CountZero


        You will still end up with a trailing </html>-tag and any comments outside of the closing </body> tag.
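To illustrate the point about the trailing material, here is a rough sketch, again on a made-up sample document. The $tail and $body names and the capturing pattern are assumptions for illustration, not something proposed in the thread, and the pattern shares the comment weakness discussed earlier.

```perl
use strict;
use warnings;

# Hypothetical sample input; not from the original thread.
my $the_html_string = <<'HTML';
<html>
<body class="new">
<p>Hello</p>
</body>
<!-- trailing comment -->
</html>
HTML

# The thread's pattern keeps everything after the opening tag,
# so the closing </html> and the trailing comment survive.
(my $tail = $the_html_string) =~ s/^.*?(<body.*?>)/$1/sgi;

# Sketch: capture from the opening <body ...> through the last </body>.
my ($body) = $the_html_string =~ m{(<body\b[^>]*>.*</body\s*>)}si;
$body = '' unless defined $body;

print "tail still contains </html>: ", ($tail =~ m{</html>}i ? "yes" : "no"), "\n";
print "body contains </html>:       ", ($body =~ m{</html>}i ? "yes" : "no"), "\n";
```

The greedy .* in the second pattern runs to the last </body> in the string, which is usually what is wanted when only one body element exists.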

        CountZero


Re^2: Regular Expression
by Anonymous Monk on Jun 28, 2005 at 19:30 UTC
    That worked. Thanks!