http://qs1969.pair.com?node_id=346440

water has asked for the wisdom of the Perl Monks concerning the following question:

If one seeks to pass a full URL to a web app via a parameter in a URL, what characters must be escaped to ensure the param URL makes it across intact?

I'm thinking '&', '/', '?', '=', and '%' -- but is there a spec somewhere with the official answer?

With a URL like this

http://www.mysite.com/mywebapp1/dosomething?url=http://www.myothersite.com/myotherwebapp2/foo.asp?param=1&param=3
the issue is making sure that webapp2 gets
url = http://www.myothersite.com/myotherwebapp2/foo.asp?param=1&param=3
and that the params of webapp2 aren't taken as params to webapp1.

Thanks for any leads or links

Replies are listed 'Best First'.
Re: encoding URLs in URLs
by Anonymous Monk on Apr 20, 2004 at 05:14 UTC

    You do indeed want the uri_escape() method from URI::Escape. The escapeHTML() method from CGI is what you want when you're outputting user-provided HTML in a page. So escapeHTML() escapes HTML, uri_escape() escapes special characters for use in a URI. Simple :)

      Um... not quite. That is, yes, judicious use of escapeHTML can help to avoid having users enter html code where you just expected them to enter text, and fubar'ing the resulting page. However, what you've said seems to imply that you'd never use escapeHTML on text that you generate yourself.

      You really want to apply escapeHTML() to anything that you're sending out as part of an HTML page that you want used "as is". That is, assuming that the original poster is going to take the output of this function and put it into an HTML page (instead of, for example, sending it out as the value of a Location: redirect header), he should make sure that he outputs the equivalent of:

      use CGI;
      use URI::Escape;

      # here put code that prints out the page header, etc.

      my $secondurl  = 'http://www.myothersite.com/myotherwebapp2/foo.asp?param=1&param=3';
      my $initialurl = 'http://www.mysite.com/mywebapp1/dosomething?'
                     . 'url=' . uri_escape($secondurl);

      print '<a href="', CGI::escapeHTML($initialurl), '">launch mywebapp</a>';

      # code that does the page footer

      In fact, I have a few times used something like this when formatting HTML output:

      sub queryToHTML {
          my ($uri, %param) = @_;
          my ($sepchar) = '?';
          if (!%param) {
              $sepchar = '';
          }
          elsif ($uri =~ /\?/) {
              $sepchar = '&';
          }
          return CGI::escapeHTML(
              $uri . $sepchar . join '&',
              map { uri_escape($_) . '=' . uri_escape($param{$_}) } keys(%param)
          );
      }

      If you can guarantee that your queries are going to and from web frameworks that understand ';' as a separator (like, for example, any vaguely modern CGI.pm), you can replace the references to '&' with ';' - the advantage of doing that is that the output html looks less ugly.
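To make the separator point concrete, here is a rough sketch of the same idea with the pair separator passed in. The helpers pct_escape, html_escape and query_to_html are hypothetical stand-ins written so the sketch runs on core Perl alone; in real code you would keep uri_escape() and CGI::escapeHTML() as in the post above. Keys are sorted only to make the output predictable.

```perl
use strict;
use warnings;

# Hypothetical stand-ins (assumptions, not library functions).
sub pct_escape {
    my $s = shift;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

sub html_escape {
    my $s = shift;
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $s =~ s/([&<>"])/$ent{$1}/g;
    return $s;
}

# Same shape as queryToHTML, but the separator between pairs is an argument.
sub query_to_html {
    my ($uri, $sep, %param) = @_;
    return html_escape($uri) unless %param;
    my $first = $uri =~ /\?/ ? $sep : '?';
    my $query = join $sep,
        map { pct_escape($_) . '=' . pct_escape($param{$_}) } sort keys %param;
    return html_escape($uri . $first . $query);
}

my $base      = 'http://www.mysite.com/mywebapp1/dosomething';
my $with_amp  = query_to_html($base, '&', a => 1, b => 2);
my $with_semi = query_to_html($base, ';', a => 1, b => 2);

print "$with_amp\n";    # pairs joined by &amp; in the HTML source
print "$with_semi\n";   # pairs joined by a plain ;
```

With '&' every separator has to appear as &amp; in the page source; with ';' nothing in the query string needs HTML escaping at all.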

Re: encoding URLs in URLs
by Anonymous Monk on Apr 19, 2004 at 20:22 UTC
    If one seeks to pass a full URL to a web app via a parameter in url, what characters must be escaped to ensure the param URL makes it across intact?
    It's called URI encoding and is used in CGI. You can use the CGI module or URI::Escape to encode a query string. To learn how CGI works, read Ovid's tutorial.
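A minimal sketch of the round trip, assuming the URLs from the question. The pct_encode and pct_decode helpers are hypothetical and use core Perl only; they do roughly the job that uri_escape() and uri_unescape() from URI::Escape do. The parsing loop is a simplification of what CGI does for you (a real parser also turns '+' into a space).

```perl
use strict;
use warnings;

# Hypothetical helpers (assumptions, not library functions).
sub pct_encode {
    my $s = shift;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

sub pct_decode {
    my $s = shift;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

my $inner = 'http://www.myothersite.com/myotherwebapp2/foo.asp?param=1&param=3';
my $outer = 'http://www.mysite.com/mywebapp1/dosomething?url=' . pct_encode($inner);

# Roughly what webapp1 does: split the query string into pairs, then decode.
my ($query) = $outer =~ /\?(.*)\z/s;
my %param;
for my $pair (split /[&;]/, $query) {
    my ($name, $value) = split /=/, $pair, 2;
    $param{ pct_decode($name) } = pct_decode($value);
}

print "$outer\n";
print "webapp1 sees url = $param{url}\n";
```

Because the inner '?', '&' and '=' are encoded before the outer URL is built, webapp1 sees exactly one parameter, and decoding it gives back the inner URL unchanged.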
Re: encoding URLs in URLs
by Ryszard (Priest) on Apr 20, 2004 at 07:52 UTC
    Coincidentally, I just had this problem two days ago. I used a quick unpack("H*", $url) to "escape" everything. The corresponding pack("H*", %q->param('url') ) brings it back to a usable form.

    Works well for me, and I can put the code in a base module to make a "transparent url munging" feature.. ;-) (albeit with a performance hit)

    Update: whoops, %q->param('url') should be $q->param('url')
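The hex approach can be sketched as below, using the URLs from the question. Here $hex stands in for what $q->param('url') would return on the receiving side; that substitution is an assumption made so the sketch runs without a CGI request.

```perl
use strict;
use warnings;

my $inner = 'http://www.myothersite.com/myotherwebapp2/foo.asp?param=1&param=3';

# Sender: every byte becomes two hex digits, so nothing is left to escape.
my $hex   = unpack 'H*', $inner;
my $outer = "http://www.mysite.com/mywebapp1/dosomething?url=$hex";

# Receiver: $hex stands in for $q->param('url').
my $restored = pack 'H*', $hex;

print "$outer\n";
print "$restored\n";
```

The trade-off is size and readability: the parameter is always twice as long as the original URL, whereas percent-encoding only expands the characters that need it.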

Re: encoding URLs in URLs
by asarih (Hermit) on Apr 20, 2004 at 10:27 UTC
    I'm guessing that RFC 2396 gives you the complete list. Section 2.2 gives you this list:
    reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
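As a rough sketch, that reserved set can be turned into a character class and escaped by hand. The escape_reserved helper is hypothetical, and adding '%' to the set is my assumption (otherwise an existing %XX sequence would be altered on decode). Note that this set alone is not everything: RFC 2396 also excludes characters such as space, '<', '>', '#' and '"', which this sketch leaves untouched, so uri_escape() remains the safer choice.

```perl
use strict;
use warnings;

# The reserved characters listed above, plus '%' (an assumption, see lead-in).
my $must_escape = qr{[;/?:\@&=+\$,%]};

# Hypothetical helper; not part of any module.
sub escape_reserved {
    my $s = shift;
    $s =~ s/($must_escape)/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

my $inner = 'http://www.myothersite.com/myotherwebapp2/foo.asp?param=1&param=3';

print escape_reserved($inner), "\n";
print escape_reserved('50%&more'), "\n";
```

For the URL in the question this produces the same string as the uri_escape() output shown elsewhere in the thread, since that URL contains no characters outside the reserved set that would need escaping.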
Re: encoding URLs in URLs
by Roy Johnson (Monsignor) on Apr 19, 2004 at 19:59 UTC
      What about URI::Escape? It uses % hex codes, not html entities.

        I suspect that URI::Escape is the right choice. Here's a bit of code to compare them:
        use CGI;
        use URI::Escape;

        my $url = 'http://www.myothersite.com/myotherwebapp2/foo.asp?param=1&param=3';

        print "Using CGI Escape:\n";
        my $esc_url = CGI::escapeHTML($url);
        print "$url\nbecomes\n$esc_url\n\n";

        print "Using URI Escape:\n";
        $esc_url = uri_escape($url);
        print "$url\nbecomes\n$esc_url\n";
        Output is:
        Using CGI Escape:
        http://www.myothersite.com/myotherwebapp2/foo.asp?param=1&param=3
        becomes
        http://www.myothersite.com/myotherwebapp2/foo.asp?param=1&amp;param=3

        Using URI Escape:
        http://www.myothersite.com/myotherwebapp2/foo.asp?param=1&param=3
        becomes
        http%3A%2F%2Fwww.myothersite.com%2Fmyotherwebapp2%2Ffoo.asp%3Fparam%3D1%26param%3D3

        The PerlMonk tr/// Advocate