I call your RFC2396 with RFC1738, for those people who want to be backwards compatible:
(from section 2.2)
Unsafe:
Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".
All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.
Now, I know, you're going to quote RFC2396 again (section G.2):
The tilde "~" character was added to those in the "unreserved" set,
since it is extensively used on the Internet in spite of the
difficulty to transcribe it with some keyboards.
But in the past, under RFC1738, it was classed as an unsafe character that always had to be encoded.
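For anyone who wants to follow the older rule, here is a minimal sketch in Python of encoding only the characters that RFC1738 section 2.2 lists as unsafe. The names `RFC1738_UNSAFE` and `encode_unsafe` are my own, not from any library, and the sketch assumes plain ASCII input that has not already been encoded (otherwise the "%" signs get encoded a second time).

```python
# Unsafe characters as listed in RFC 1738 section 2.2 (quoted above):
# space < > " # % { } | \ ^ ~ [ ] `
RFC1738_UNSAFE = set(' <>"#%{}|\\^~[]`')


def encode_unsafe(text: str) -> str:
    """Percent-encode only the RFC 1738 'unsafe' characters in text.

    Hypothetical helper for illustration; assumes unencoded ASCII input.
    """
    out = []
    for ch in text:
        if ch in RFC1738_UNSAFE:
            # Two uppercase hex digits of the character code, e.g. "~" -> "%7E".
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


print(encode_unsafe("/~user/my page.html"))  # prints /%7Euser/my%20page.html
```

Note that recent versions of Python's own `urllib.parse.quote` follow the newer rules and leave "~" alone, so a helper like this is only needed if you want the RFC1738 behaviour.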
Good point. I think that raises the question, "how far backwards is too far?" It's good to be backwards compatible within reason, but there's a point where it becomes unreasonable. The real question is, where is that point? Personally, I wouldn't worry about using an unescaped tilde, seeing as they've been around as long as I can remember (pre-Mosaic 1.0 maybe?).
It's a question of cost-benefit ratio, in my opinion. What's the benefit? Well, I don't really know... but I know the cost of encoding a ~ is rather small, so it's one of those things that I'm willing to do without thinking about it.
Now, I admit, I've done things that aren't backwards compatible with HTTP/0.9. (Because there was no concept of HTTP headers in 0.9, you're not supposed to return any headers unless the request line specifies HTTP/1.0 or later.) But as Apache chokes on that, I'd have to do all of the heavy lifting, so I don't think it's worth it.
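To show what that heavy lifting would look like, here is a rough sketch in Python of the HTTP/0.9 rule. The function names `wants_headers` and `build_response` are made up for illustration, and the fixed `200 OK` status and `text/html` content type are assumptions, not anything a real server would hard-code.

```python
def wants_headers(request_line: str) -> bool:
    """Return True if the request line names an HTTP version.

    An HTTP/0.9 request is just "GET <path>" with no version token;
    HTTP/1.0 and later add a third field such as "HTTP/1.0".
    """
    parts = request_line.strip().split()
    return len(parts) == 3 and parts[2].upper().startswith("HTTP/")


def build_response(request_line: str, body: str) -> str:
    """Send status line and headers only to clients that asked for them."""
    if wants_headers(request_line):
        # Assumed status and content type, for illustration only.
        return ("HTTP/1.0 200 OK\r\n"
                "Content-Type: text/html\r\n"
                "\r\n" + body)
    # HTTP/0.9 client: the response is the body alone.
    return body
```

With this, `build_response("GET /index.html", page)` returns the page by itself, while `build_response("GET /index.html HTTP/1.0", page)` puts the status line and headers in front of it.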
I think I supported browsers without table support until about 1999 or so. (That was mostly because we'd connect to a page over lynx to get modem debugging info when we were trying to set up problem connections at the ISP I worked for at the time.) In that case, the cost was insignificant (I had already been dealing with HTML backwards compatibility for 4 years at that point, so it came without thinking), and the benefits were measurable (it shaved minutes off our time debugging connections).