asafp has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have a CGI program that receives a URL that looks like this:

http://example.mycgi.com:9999/service?func=search&institute=DEMO&calling_app=ABC&url=http://return.link.com:8997/F?local_base=admin&func=save&base=staff

Everything that comes after the "&url=" is a backlink URL which I need to extract.
I am trying to parse the part from the "&url=" to the end, but cannot find a way of doing it.
When using the "query_string" function, I get the following string:

func=search;func=save;institute=DEMO;calling_app=ABC;url=http%3A%2F%2Freturn.link.com%3A8997%2FF%3Flocal_base%3Dadmin;base=staff

Notice that "func=save" has been moved from its place to follow "func=search", so I can no longer find it when I rebuild the back URL.

Is there a way to simply get the full URL as is and parse it by myself?

Thanks

Replies are listed 'Best First'.
Re: How to parse URL in CGI.pm
by kennethk (Abbot) on Dec 05, 2010 at 16:02 UTC
    If local_base=admin&func=save&base=staff are supposed to be part of the url parameter, then the source of the problem is that you are dealing with a bad URL - see Percent_encoding. Again, that is not a legal URL. If a parameter must contain any of the reserved characters (in this case, it contains 5 different reserved characters), you must escape it. See URI::Escape.

    Are you generating this link in a different script, or is it coming from the outside? This should really be fixed where it is coming from - the URL should look like:

    http://example.mycgi.com:9999/service?func=search&institute=DEMO&calling_app=ABC&url=http%3A%2F%2Freturn.link.com%3A8997%2FF%3Flocal_base%3Dadmin%26func%3Dsave%26base%3Dstaff
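
    For illustration, here is a minimal sketch of how the sending side could build such a link with uri_escape from URI::Escape. The helper name build_service_url is hypothetical, and the fixed parameters are simply copied from the example URL in the question.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Hypothetical helper: builds the service link with the backlink
# percent-encoded so it survives as a single "url" parameter.
sub build_service_url {
    my ($backlink) = @_;
    return 'http://example.mycgi.com:9999/service'
         . '?func=search&institute=DEMO&calling_app=ABC'
         . '&url=' . uri_escape($backlink);
}

my $backlink = 'http://return.link.com:8997/F?local_base=admin&func=save&base=staff';
print build_service_url($backlink), "\n";
```

    With the backlink escaped this way, CGI's param('url') should return the whole backlink, including its own query part.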

    If this is coming from the outside and there is no way to get them to fix their code, you can check the environment variables for the request URL: $ENV{REQUEST_URI}.

    For some intro material on working with Perl CGI, check out Ovid's CGI Course - Resurrected and Updated!. In particular, part of your issue is discussed in lesson 2.

      that is not a legal URL

      You are mistaken. Not only is the url legal, it is parsed identically whether those characters are escaped or not. Only "#" must be escaped in the query component of HTTP urls since no other character "would conflict with a reserved character's purpose as a delimiter" in that part of the url. Other limitations are self-imposed.

      Where it makes a difference is how the query is parsed. In this case, "&" and ";" must be escaped in addition to "#" because CGI (the module) expects the query to be a url-encoded form (application/x-www-form-urlencoded) with the extension that ";" is equivalent to "&". (It also supports ISINDEX-style queries.)

      If he did his own query parsing, everything that comes after the "&url=" could be considered part of the backlink url. But since he's using CGI's parser, only the text after the "&url=" up to the next "&" or ";" is considered part of the backlink url.
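
      A rough sketch of that kind of do-it-yourself parsing, assuming the unescaped backlink is always the last parameter in the query. The helper name extract_backlink is hypothetical; it works on the raw, undecoded query string.

```perl
use strict;
use warnings;

# Hypothetical helper: returns everything after the first "url="
# parameter in a raw query string, or undef if there is none.
sub extract_backlink {
    my ($query) = @_;
    $query = '' unless defined $query;
    # "url=" must start the string or follow "&" or ";". The capture
    # then runs to the end of the string instead of stopping at the
    # next "&", which is where CGI's parser would cut it off.
    my ($backlink) = $query =~ /(?:\A|[&;])url=(.*)\z/s;
    return $backlink;
}

my $query = 'func=search&institute=DEMO&calling_app=ABC'
          . '&url=http://return.link.com:8997/F?local_base=admin&func=save&base=staff';
print extract_backlink($query), "\n";
```

      Anchoring on "&" or ";" keeps a parameter such as "myurl=" from being picked up by mistake.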

      Thanks for the help!

      The URL is coming from outside and I cannot ask the sender to encode it.

      So, I think I'll use the $ENV{REQUEST_URI} option.

      Is this safe? Can I count on it to always give me the correct URI?

        Is this safe? Can I count on it to always give me the correct URI?

        It's safe in the sense that it is only data.

        Since REQUEST_URI is not part of the CGI spec, it won't be available on every server, so it's better to rely on other variables (PATH_INFO/QUERY_STRING...)
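
        One possible way to follow that advice is to read QUERY_STRING first and only fall back to REQUEST_URI when it is empty. This is a sketch, not a tested recipe for any particular server; the helper name raw_query_string is hypothetical, and it takes the environment as a hash reference so it can be tried outside a web server.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the raw (undecoded) query string from a
# CGI-style environment hash, preferring QUERY_STRING over REQUEST_URI.
sub raw_query_string {
    my ($env) = @_;
    my $qs = $env->{QUERY_STRING};
    return $qs if defined $qs && length $qs;

    # Fallback: take everything after the first "?" in REQUEST_URI.
    my $uri = $env->{REQUEST_URI};
    return '' unless defined $uri;
    my ($query) = $uri =~ /\?(.*)\z/s;
    return defined $query ? $query : '';
}

# In a real CGI script the environment is %ENV.
print raw_query_string(\%ENV), "\n";
```

        The returned string still contains every "&" and "?" exactly as the client sent them, so the backlink can be cut out of it by hand.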