in reply to Re^4: Regular Expression
in thread Regular Expression

Nope, not rhetorical, just a trap!

This is valid XHTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type" />
<title>test</title>
</head>
<body/>
</html>

No </body> tag, so a naive regex breaks again.
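Here is a minimal sketch of the trap, in Python rather than Perl. The pattern NAIVE_BODY and both helper names are made up for illustration, not taken from the OP's code, and the DOCTYPE and meta lines are left out of the sample to keep it short.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed version of the document above (DOCTYPE and <meta> omitted).
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>test</title></head>'
    '<body/>'
    '</html>'
)

# Hypothetical naive pattern: expects an explicit <body ...> ... </body> pair.
NAIVE_BODY = re.compile(r'<body[^>]*>(.*?)</body>', re.S | re.I)

def naive_body(text):
    """Return the text between <body> and </body>, or None if no pair is found."""
    match = NAIVE_BODY.search(text)
    return match.group(1) if match else None

def parsed_body(text):
    """Return the body element as seen by an XML parser, or None if absent."""
    root = ET.fromstring(text)
    return root.find('{http://www.w3.org/1999/xhtml}body')

if __name__ == '__main__':
    print('regex result: ', naive_body(XHTML))
    body = parsed_body(XHTML)
    print('parser found body:', body is not None, '| children:', len(body))
```

The regex finds nothing because there is no closing tag to anchor on, while the parser reports a body element with no children.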

Of course you are right: the OP only wanted to match the <body> tag, not the body element.

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re^6: Regular Expression
by Transient (Hermit) on Jun 28, 2005 at 19:58 UTC
    Funny you should mention that (we were discussing this with the <p> tag). It's not valid, because body isn't defined with an EMPTY content model (as br, hr, etc. are).

    W3C specs shown here
      Well, I'm not sure about that.

      The XHTML DTD defines the body element as <!ELEMENT body %Block;>, and %Block; is in turn defined as

      <!ENTITY % block "p | %heading; | div | %lists; | %blocktext; | fieldset | table">
      <!ENTITY % Block "(%block; | form | %misc;)*">

      Watch the * in the definition of %Block: zero or more times! So %Block can be empty, hence <body> can be empty and can be written as <body/>.
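The "zero or more" reading can be sketched by treating the content model as a regular expression over child element names. This is only a toy check in Python, not a real DTD validator. BLOCK_CHILDREN is a shortened, hypothetical stand-in for the full expansion of %Block;, and children_allowed is an invented helper.

```python
import re

# Shortened, hypothetical stand-in for the names that %Block; expands to.
BLOCK_CHILDREN = ["p", "div", "form", "table", "fieldset"]

# (%block; | form | %misc;)* read as: zero or more of these names,
# each followed by a single space used as a separator.
CONTENT_MODEL = re.compile(r"(?:(?:%s) )*" % "|".join(BLOCK_CHILDREN))

def children_allowed(children):
    """Check a list of child element names against the toy content model."""
    sequence = "".join(name + " " for name in children)
    return CONTENT_MODEL.fullmatch(sequence) is not None

if __name__ == "__main__":
    print(children_allowed([]))            # empty body, i.e. <body/>
    print(children_allowed(["p", "div"]))  # ordinary block content
    print(children_allowed(["title"]))     # not a block-level child
```

Because of the trailing *, the empty sequence matches, which is the same argument as above: a body with no children satisfies the model.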

      Also, the official W3C validator says it is valid XHTML 1.0.

      CountZero
