Re^4: Understanding this particular Regex.

"the validator came up short of fully satisfying the w3c 4.01 transitional spec and even farther short of the strict spec"

It is true that there are conformance requirements which the validator is unable to check. However, my example exploits none of these. I haven't tricked the validator; it's simply a valid HTML 4.01 Transitional document.

It would be valid HTML 4.01 Strict, except that the <hr size> attribute is presentational and Strict doesn't contain most of the presentational attributes.

If you prefer an example that passes Strict:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<title>Foo</title>
<hr class = size-1 >
```

"The validator, for example, blesses your code ('validates') without error (albeit, with warnings) despite the lack of <head>...</head>, <body>...</body> and <html>...</html> tags... and that's using the transitional spec which allows no such things."

The <html>, <head> and <body> start and end tags are all optional in every version of HTML that has ever been published by the W3C. (They are of course required in XHTML, but that's not what we're talking about.)

For example, see the section "The HTML element" in the HTML 4.01 spec, which says, "Start tag: optional, End tag: optional". You'll find the same under the definitions for HEAD, BODY and also TBODY. Many elements have optional end tags, but IIRC those are the only four with optional start tags.

"If you try it with strict, upload mode, and add: <table width = 17%> you'll see even the validator lets fly"

Indeed. As I said, attribute values do not need to be quoted if they consist only of letters, digits, hyphens, periods, underscores and colons, i.e. if they match the regexp /^[A-Za-z0-9_.:-]+$/. The percent sign is not in that set, so that attribute value needs quoting.
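A minimal Python sketch of that quoting rule, assuming the character set listed in section 3.2.2 of the HTML 4.01 spec (letters, digits, hyphens, periods, underscores and colons). The function name `needs_quotes` is hypothetical, and it checks only the character rule, not whether the value is otherwise acceptable for the attribute in question.

```python
import re

# Characters HTML 4.01 (section 3.2.2) permits in an unquoted attribute value:
# letters, digits, hyphen, period, underscore and colon.
UNQUOTED_OK = re.compile(r"[A-Za-z0-9_.:-]+")


def needs_quotes(value: str) -> bool:
    """Return True if `value` must be quoted to be valid HTML 4.01 (hypothetical helper)."""
    # fullmatch requires the whole string to match, like an anchored ^...$ pattern.
    return UNQUOTED_OK.fullmatch(value) is None


for v in ["size-1", "347", "17%", "a b", ""]:
    print(f"{v!r}: {'quote' if needs_quotes(v) else 'ok unquoted'}")
```

So `class = size-1` and `width = 347` may stay unquoted, while `width = 17%` needs `width="17%"`. An empty value also needs quotes, since the pattern requires at least one character.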

"Your regex and the accompanying statement are correct, as far as they go, but are most closely applicable to webmonkeys (yeah, been there; done that.) writing for NS or IE4 style browsers."

You think modern browsers don't support HTML 4.01? In most cases they support it better than those early browsers you mention did; and in most cases they support HTML better than they support full-blown XHTML.

package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name


Re^5: Understanding this particular Regex. by ww (Archbishop) on May 06, 2013 at 15:14 UTC
Generally, ++ tobyink, but we still disagree on more points than the few I'm inclined to answer with well-documented counterarguments. Your question (or rhetorical question), "You think modern browsers don't support HTML 4.01?", is the opposite of my intent. Of course they do. But when the cited browsers were "the latest and greatest", we saw an awful lot of utterly non-compliant markup, because devs were pushing out code that satisfied one particular browser (only). Think, also, of how commonly we used to see "`<table width = 347...>`", with only an implicit "px", i.e. code relying, mistakenly, on sometimes inconsistent calculations by various browsers.
Re^6: Understanding this particular Regex. by tobyink (Canon) on May 06, 2013 at 16:12 UTC
`<table width = 347>` is valid HTML. `<table width="347px">` is invalid. The `px` unit is part of CSS, not HTML. In HTML, lengths are expressed either as percentages or as a number which is implicitly in pixels (plus the relative `n*` form used for column and frameset widths). The exception is `<font size>`, where the number has its own special brand of craziness. I agree that there's a lot of invalid HTML out there, and certain older browsers encouraged it, but the OP's example is valid (albeit unidiomatic) HTML.
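A minimal Python sketch of the length rule above, assuming the HTML 4.01 length form: a bare integer (pixels) or an integer followed by `%`. The relative `n*` form is deliberately not handled, and `is_html_length` is a hypothetical name.

```python
import re

# HTML 4.01 length: an integer number of pixels, optionally followed by "%"
# for a percentage. No CSS units such as "px". [0-9] is used instead of \d
# because \d also matches non-ASCII digits in Python str patterns.
LENGTH = re.compile(r"[0-9]+%?")


def is_html_length(value: str) -> bool:
    """Return True if `value` is a pixel or percentage length (hypothetical helper)."""
    return LENGTH.fullmatch(value) is not None


for v in ["347", "17%", "347px", "100 %"]:
    print(f"{v!r}: {'valid' if is_html_length(v) else 'invalid'}")
```

Note that `17%` is a valid length yet still has to be quoted in the markup, because `%` is not allowed in an unquoted attribute value.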