You sure about that? :)
2.3. Unreserved Characters
Data characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include upper and lower case
letters, decimal digits, and a limited set of punctuation marks and
symbols.
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
Unreserved characters can be escaped without changing the semantics
of the URI, but this should not be done unless the URI is being used
in a context that does not allow the unescaped character to appear.
Taken from http://www.faqs.org/rfcs/rfc2396.html. | [reply] |
I call your RFC2396 with RFC1738, for those people who want to be backwards compatable:
(from section 2.2)
Unsafe:
Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".
All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.
Now, I know, you're going to quote RFC2396 again (section G.2):
The tilde "~" character was added to those in the "unreserved" set,
since it is extensively used on the Internet in spite of the
difficulty to transcribe it with some keyboards.
But, in the past, it was a reserved character.
| [reply] [d/l] |
Good point. I think that begs the question, "how far backwards is too far?" It's good to be backwards compatible within reason, but there's a point where it becomes unreasonable. The real question is, where is that point? Personally, I wouldn't worry about using an unescaped tilde, seeing as they've been around as long as I can remember (pre-Mosaic 1.0 maybe?).
| [reply] |