in reply to http interpolation

Perhaps the real thing to note is that "~" is illegal in a URL, even though everyone uses it. {grin}

You're supposed to use "%7E" instead.
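As a rough illustration of why the two spellings are interchangeable, here is a minimal Python sketch that uses the standard library's `urllib.parse.unquote`. The `example.com` URLs and the helper name `same_after_unescape` are made up for this example and do not come from the thread.

```python
from urllib.parse import unquote

def same_after_unescape(a: str, b: str) -> bool:
    """Hypothetical helper: True if two URL strings match once %XX escapes are decoded."""
    return unquote(a) == unquote(b)

# Illustrative URLs only; "%7E" is the percent-encoding of "~" (0x7E).
escaped = "http://example.com/%7Euser/"
plain = "http://example.com/~user/"
print(same_after_unescape(escaped, plain))
```

Decoding the whole URL like this is only safe for comparing simple cases, since it also decodes escaped reserved characters such as "%2F".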

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Re^2: http interpolation
by kwaping (Priest) on Jul 01, 2005 at 19:38 UTC
    You sure about that? :)
    2.3. Unreserved Characters

    Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include upper and lower case letters, decimal digits, and a limited set of punctuation marks and symbols.

    unreserved = alphanum | mark

    mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

    Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.

    Taken from http://www.faqs.org/rfcs/rfc2396.html.
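The grammar quoted above is small enough to check mechanically. The following is a sketch, not a full URI parser: it builds the RFC 2396 "unreserved" set from the two productions quoted in this reply and tests single characters against it. The names `UNRESERVED_2396` and `is_unreserved_2396` are invented for the example.

```python
import string

# RFC 2396 section 2.3: unreserved = alphanum | mark
ALPHANUM = set(string.ascii_letters + string.digits)
MARK = set("-_.!~*'()")
UNRESERVED_2396 = ALPHANUM | MARK

def is_unreserved_2396(ch: str) -> bool:
    """Hypothetical helper: True if the single character ch is unreserved per RFC 2396."""
    return len(ch) == 1 and ch in UNRESERVED_2396

for ch in "~#{a":
    print(repr(ch), is_unreserved_2396(ch))
```

Under this definition the tilde passes, while "#" and "{" do not.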

      I call your RFC2396 with RFC1738, for those people who want to be backwards compatible:

      (from section 2.2)

      Unsafe:

      Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (""") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

      All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.

      Now, I know, you're going to quote RFC2396 again (section G.2):

      The tilde "~" character was added to those in the "unreserved" set, since it is extensively used on the Internet in spite of the difficulty to transcribe it with some keyboards.

      But, in the past, it was classed as an unsafe character.
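For anyone who does want the RFC 1738 behaviour, a minimal Python sketch might look like the following. It percent-encodes exactly the characters that the quoted section 2.2 names as unsafe and leaves everything else alone. The function name `encode_unsafe_1738` is hypothetical, and the sketch assumes ASCII input that has not already been percent-encoded (a literal "%" would be escaped a second time).

```python
# Characters RFC 1738 section 2.2 lists as unsafe:
# space, < > " # %, plus { } | \ ^ ~ [ ] `
UNSAFE_1738 = set(' <>"#%{}|\\^~[]`')

def encode_unsafe_1738(text: str) -> str:
    """Hypothetical helper: percent-encode only the RFC 1738 'unsafe' characters.

    Assumes ASCII input that is not already percent-encoded.
    """
    out = []
    for ch in text:
        if ch in UNSAFE_1738:
            out.append("%{:02X}".format(ord(ch)))  # e.g. "~" becomes "%7E"
        else:
            out.append(ch)
    return "".join(out)

print(encode_unsafe_1738("/~user/my file.html"))
```

Note that Python's own `urllib.parse.quote` follows the later RFCs and, from Python 3.7 on, leaves "~" unescaped, which is why this sketch spells out the older set by hand.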

        Good point. I think that raises the question, "how far backwards is too far?" It's good to be backwards compatible within reason, but there's a point where it becomes unreasonable. The real question is, where is that point? Personally, I wouldn't worry about using an unescaped tilde, seeing as they've been around as long as I can remember (pre-Mosaic 1.0 maybe?).