in reply to Re: Regular Expression Help
in thread Regular Expression Help
HTML (3.0, 4.0, etc.) is not a subset of XML, at least not until you get to the XHTML stage. HTML is an application of SGML and XML is a restricted subset of SGML, so the two are siblings rather than parent and child. The main reason I bring up this point is that HTML is - by and large - not well-formed in the XML sense. I'm willing to bet most XML parsers will choke on a common HTML page, simply because most HTML pages aren't structured the way XML requires. A <P> tag without a corresponding </P> tag would probably be the second most common offense, not to mention that <IMG SRC="blah.gif"> doesn't have a slash terminator; neither of these is smiled upon in XML.
Granted, it's a moot point if you hand-craft the HTML code going into your programs, but if you're analyzing other websites, assuming that they serve properly structured HTML is probably an unwise programming move, IMO.
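To make the point concrete, here is a minimal sketch in Python (chosen only for illustration; the same experiment works with any strict XML parser). The sample markup and the helper names `is_well_formed_xml`, `TagCollector` and `collect_tags` are made up for this example. It feeds the same unclosed <P> and unterminated <IMG> to a strict XML parser, which should reject it, and to a lenient HTML parser, which should read it anyway.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: typical real-world HTML, but not well-formed XML
# (unclosed <P> tags, <IMG> without a slash terminator).
SAMPLE_HTML = (
    '<html><body><P>First paragraph<P>Second paragraph'
    '<IMG SRC="blah.gif"></body></html>'
)


def is_well_formed_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML parser that records every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.tags.append(tag)


def collect_tags(text):
    """Return the start tags found by the lenient HTML parser."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print("Accepted as XML:", is_well_formed_xml(SAMPLE_HTML))
    print("Tags seen by the HTML parser:", collect_tags(SAMPLE_HTML))
```

The XML parser gives up at the first mismatched close tag, while the HTML parser still reports every element, which is why a tolerant HTML parser is the safer choice for pages you didn't write yourself.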
andre germain
"Wherever you go, there you are."
Re: HTML and XML
by mirod (Canon) on Sep 02, 2001 at 11:46 UTC