Encoding/compress CGI GET parameters

by snellm (Monk)
on Jan 17, 2001 at 17:19 UTC ( [id://52504]=perlquestion )

snellm has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I frequently write systems that generate URLs which are sent by email, for example:

http://www.server.com/cgi-bin/script/script.pl?Param1=Value&Param2=value&Param3=value

I would like to alter this to:

http://www.server.com/cgi-bin/script/script.pl/dflskjdfsdkf

where "dflskjdfsdkf" is the parameters encoded in some form. The advantages of doing this are:

- The URL is shorter (and therefore less likely to be split into two lines by an email package)
- The user isn't tempted to try modifying the parameters
- The URL can be easily checksummed

Any thoughts as to the best way to go about this?

PS: "pack" and "md5" come to mind.

regards,

-- Michael Snell
-- michael@snell.com

Replies are listed 'Best First'.
Re: Encoding/compress CGI GET parameters
by tadman (Prior) on Jan 17, 2001 at 20:16 UTC
    Encoded URL

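    First, define two routines that can do the grunt work of crushing and uncrushing your parameters, which for the sake of argument are being stored in a hash:

        use MIME::Base64;

        # Append the Base64-encoded, NUL-joined key/value list to a base URL.
        sub Crush {
            return shift(@_) . MIME::Base64::encode(join("\x00", @_), "");
        }

        # Recover the key/value list from the extra path info of a CGI request.
        sub Uncrush {
            my ($q) = shift;
            (my $encoded = $q->path_info()) =~ s{^/}{};   # drop the leading "/"
            return split(/\x00/, MIME::Base64::decode($encoded));
        }

    This is, of course, assuming you don't have any NUL (ASCII 0) characters in your data. If you do, this code will break, but it will work fine on normal ASCII text. Additionally, it uses "path_info()" from CGI.pm, which you should be using anyway. Note that standard Base64 output can contain "+", "/" and "=", so a URL-safe variant of the alphabet is the more robust choice inside a path.

    You could tie them into your program like so:

        my (%data) = ( 'x' => 'y' ); # etc.
        # The base URL ends in "/" so the encoded data arrives as path info.
        my $url = Crush("http://www.xyzco.com/foo.cgi/", %data);

        # Or, on the receiving end...
        my ($q) = CGI->new;
        my (%data) = Uncrush($q);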
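    It all ends up as a great big pile of goo as far as the user is concerned, but it isn't encrypted to any great degree. If you wanted, you could encrypt it with PGP, or whatever strikes your fancy, before MIME::Base64::encode(), with the opposite on the receiving end, of course. MD5 is a one-way digest rather than encryption, so it can't be reversed, but appending an MD5 checksum lets the receiving end detect tampering.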
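
A minimal sketch of the checksum idea, assuming a server-side secret: the payload is Base64-encoded with a URL-safe alphabet and followed by a short keyed MD5 check. The names `$SECRET`, `make_token` and `read_token` are hypothetical, and a proper HMAC would be the sturdier construction; this only illustrates the shape.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);
use Digest::MD5 qw(md5_hex);

my $SECRET = 'change-me';    # hypothetical secret known only to the server

# Base64 with "+/" swapped for "-_" and the "=" padding removed.
sub b64url_encode {
    my $s = encode_base64($_[0], '');
    $s =~ tr{+/}{-_};
    $s =~ s/=+\z//;
    return $s;
}

sub b64url_decode {
    my $s = $_[0];
    $s =~ tr{-_}{+/};
    $s .= '=' while length($s) % 4;    # restore the padding
    return decode_base64($s);
}

# Build "payload.check" from a hash of parameters.
sub make_token {
    my (%data) = @_;
    my $payload = b64url_encode(join "\x00", map { ($_, $data{$_}) } sort keys %data);
    my $check   = substr(md5_hex($SECRET . $payload), 0, 8);
    return "$payload.$check";
}

# Return the key/value list, or an empty list if the check does not match.
sub read_token {
    my ($token) = @_;
    my ($payload, $check) = split /\./, $token, 2;
    return unless defined $check
        && $check eq substr(md5_hex($SECRET . $payload), 0, 8);
    return split /\x00/, b64url_decode($payload), -1;
}
```

Truncating the digest to eight hex characters keeps the URL short at the cost of a weaker check; how much to keep is a judgment call.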

    Server Side Data

    An intelligent alternative to this "encoding" is to keep the data on the server. As you mentioned, long URLs are a problem for some e-mail programs, and certainly for more than a few users. To keep the URL to an absolute minimum, you could store all of the data in a database on the server side and pass only a key to the client.

    Basically, your URL would contain a text key like "AxZLkFlG" which is a randomly created string that the server would use to identify that session. You could then store all of your data server side.

    The downside to this approach is that the data has to be preserved for extended periods of time, because if the server data is "expired", the URL becomes virtually useless. If you expect the users to re-visit six or eight months from now, that would translate to a six or eight month history of data, which can get quite large, depending on your application.

    Additionally, if a user sends a copy of the URL to five friends, they will all be modifying the same database entry, which can lead to some unsavory variable "bleed" between their sessions. This can be very dangerous, especially for e-commerce applications.

    If you have no idea when the user is going to re-visit, and you want to preserve the state of the program indefinitely, you have to pack all the data into the URL. Base64 expands the content by about a third, so the URLs will always be longer using this method, but this can be minimized if you compress the data before encoding (e.g. with DEFLATE, the algorithm used in gzip).
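
A rough sketch of compress-then-encode, assuming Compress::Zlib is available. Very short strings usually grow when compressed, so this version keeps the raw text in that case and records the choice in a one-character flag. The function names and the "z"/"r" flag are made up for illustration.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);
use MIME::Base64 qw(encode_base64 decode_base64);

# Returns a URL-safe token: a flag character followed by Base64 data.
sub crush_compressed {
    my ($text) = @_;
    my $packed = compress($text);
    # Keep whichever form is shorter; 'z' = zlib-compressed, 'r' = raw.
    my ($flag, $body) = length($packed) < length($text)
        ? ('z', $packed)
        : ('r', $text);
    my $b64 = encode_base64($body, '');
    $b64 =~ tr{+/}{-_};
    $b64 =~ s/=+\z//;
    return $flag . $b64;
}

sub uncrush_compressed {
    my ($token) = @_;
    my $flag = substr($token, 0, 1);
    my $b64  = substr($token, 1);
    $b64 =~ tr{-_}{+/};
    $b64 .= '=' while length($b64) % 4;
    my $body = decode_base64($b64);
    return $flag eq 'z' ? uncompress($body) : $body;
}
```

Whether this pays off depends on the data: repetitive parameter strings shrink a lot, while a handful of short values will not.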

      Hi tadman,

      That's a pretty good summary, but it leaves my original question:

      What is the best way to Crush (to use your sub name) CGI parameters? MIME::Base64 is not suitable because it increases the length of the URL, and I want to decrease it. I think a solution would have to take advantage of the format of CGI parameters.

      Something that just popped into my head:

      A scheme as mentioned by Dave, but instead of storing the whole parameter string against a unique ID, store the parameter names, order and format (string/integer). Then encode the URL as the ID, followed by the parameter values encoded depending on format.

      For example, if the hash contains

      Key: 1 Value: Action=<string>,Area=<int>,SubArea=<int>

      Then the URL:

      http://www.server.com/cgi-bin/script/script.pl?Action=view&Area=12345&SubArea=12345

      Could be encoded to:

      http://www.server.com/cgi-bin/script/script.pl/SKLJSD

      where "SKLJSD" can be decoded to 1,view,12345,12345

      Comments? This avoids the problem of having to expire hash entries, because the hash contains only formats, which are likely to be a fairly small set.
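
One way the format-table scheme might look, as a sketch only. The `%FORMATS` table, the `encode_params`/`decode_params` names and the choice of pack templates ("w" for non-negative integers, "C/a*" for strings up to 255 bytes) are all assumptions made for illustration.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical format table: format ID => ordered [name, type] pairs.
my %FORMATS = (
    1 => [ [ Action => 'string' ], [ Area => 'int' ], [ SubArea => 'int' ] ],
);

# Build a pack template: one byte for the format ID, then one item per field.
sub template_for {
    my ($id) = @_;
    my $fmt = $FORMATS{$id} or die "unknown format $id";
    my $template = 'C';
    $template .= ($_->[1] eq 'int' ? 'w' : 'C/a*') for @$fmt;
    return $template;
}

sub encode_params {
    my ($id, %params) = @_;
    my $fmt = $FORMATS{$id} or die "unknown format $id";
    my @values = map { $params{ $_->[0] } } @$fmt;
    my $b64 = encode_base64(pack(template_for($id), $id, @values), '');
    $b64 =~ tr{+/}{-_};    # URL-safe alphabet
    $b64 =~ s/=+\z//;      # drop padding
    return $b64;
}

sub decode_params {
    my ($token) = @_;
    (my $b64 = $token) =~ tr{-_}{+/};
    $b64 .= '=' while length($b64) % 4;
    my $bytes = decode_base64($b64);
    my $id = unpack('C', $bytes);                    # first byte picks the format
    my ($same_id, @values) = unpack(template_for($id), $bytes);
    my %params;
    @params{ map { $_->[0] } @{ $FORMATS{$id} } } = @values;
    return %params;
}
```

With the example values (view, 12345, 12345) the packed data is 10 bytes, which comes to 14 characters after unpadded Base64, against 36 for the plain query string.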

      -- Michael Snell
      -- michael@snell.com

        If you're feeling ambitious, which it sounds like you are, you can always compact your data before sending it. Consider using pack() on your data to reduce the size, and then possibly MIME encoding it to handle the encoding for the URL. Base64 is good for your application since it is fully e-mail compatible.

        UTF-5 is also a possibility. It was proposed as a way to "encode" Unicode for DNS purposes, mapping multi-byte characters into the very limited DNS space of A-Za-z0-9 and "-". Fortunately, there is a little more "bitwidth" in the URL specification, something that could be better exploited with careful analysis and testing.

        Instead of having a parameter like "mode=view" or "mode=edit", consider using an ENUM() type parameter, where you have a table of modes and their associated "tiny" values. As long as you have a small number of variations, there is no need to report the entire thing verbatim. A single byte can carry a lot of information, as long as the context of this byte is understood.

            my (@possible_values) = qw(view edit modify delete nuke);
            # Map each value to its index in the list above.
            my (%possible_values) = do {
                my $n = 0;
                map { $_, $n++ } @possible_values;
            };

            my $encoded_param = $possible_values{'view'};          # value -> small integer
            my $decoded_param = $possible_values[$encoded_param];  # small integer -> value


        Numbers, likewise, can be squished into "packed binary", which can reduce 10-digit numbers into 4-byte values, or about 6 characters after Base64 once the padding is dropped, which is a moderate but valuable decrease.
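
The arithmetic can be checked with a few lines. This is just an illustration using an arbitrary 10-digit number; "N" packs a 32-bit unsigned integer in big-endian order.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

my $number = 1234567890;                  # arbitrary 10-digit example
my $packed = pack('N', $number);          # 4 bytes
my $b64    = encode_base64($packed, '');  # 8 characters, including "==" padding
(my $unpadded = $b64) =~ s/=+\z//;        # 6 characters without padding

printf "%d digits -> %d bytes -> %d chars (%d unpadded)\n",
    length($number), length($packed), length($b64), length($unpadded);
# prints "10 digits -> 4 bytes -> 8 chars (6 unpadded)"
```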

        Here's a compactor that I just sketched out. Use for entertainment purposes only, as it is untested. It takes in a SCALAR and returns a squished up version with a type identification byte which can be used to desquish it properly later.

            sub Squish {
                my ($what) = @_;
                if ($what =~ /^-?[0-9]+$/) {
                    if ($what >= 0) {
                        # Unsigned integers: type byte, then 1, 2 or 4 bytes (big-endian).
                        if    ($what <= 255)        { return pack("CC", 0x01, $what); }
                        elsif ($what <= 65535)      { return pack("Cn", 0x02, $what); }
                        elsif ($what <= 4294967295) { return pack("CN", 0x04, $what); }
                    }
                    # Negative integers: type byte, then signed 1, 2 or 4 bytes (native order).
                    elsif ($what >= -128)        { return pack("Cc", 0x09, $what); }
                    elsif ($what >= -32768)      { return pack("Cs", 0x0A, $what); }
                    elsif ($what >= -2147483648) { return pack("Cl", 0x0B, $what); }
                    # Anything outside the 32-bit range falls through to the final return.
                }
                elsif ($what =~ /^-?[0-9]+(?:\.[0-9]+)?(?:e[+-]\d+)?$/) {
                    # Floating point: type byte, then a native double.
                    return pack("Cd", 0x0C, $what);
                }
                elsif (length($what) < 16) {
                    # Short string: length in the high nibble, 0x0F in the low nibble
                    # (0x0C would collide with the double tag above).
                    return pack("C", 0x0F | (length($what) << 4)) . $what;
                }
                elsif (length($what) <= 255) {
                    return pack("CC", 0x0D, length($what)) . $what;
                }
                # Long string: type byte, then a 2-byte big-endian length.
                return pack("Cn", 0x0E, length($what)) . $what;
            }
Re: Encoding/compress CGI GET parameters
by Beatnik (Parson) on Jan 17, 2001 at 18:15 UTC
    You can handle the /script.pl/dflskjdfsdkf part with $ENV{PATH_INFO} (if I'm not mistaken). Most pack() & unpack() operations will make it longer (hex 2 for 1, binary 8 for 1). CRC modules and compression modules are listed on CPAN. MD5 is more about checksumming than encryption: it produces a one-way digest (which is pretty useful too, in this case).

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.

      Hi Beatnik,

      My line of thought for integers was: There are approx 71 characters valid in a URI, which means any integer could be converted to base 71 and the resulting bytes remapped into that range of characters. With a stop byte, that means any integer greater than 5041 would save at least a byte.
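
A sketch of the base-71 idea, assuming the 71 characters are the "unreserved" set from RFC 2396 (letters, digits and the marks -_.!~*'()). Some mail clients may still trip over a few of those marks, so the alphabet is worth testing before relying on it. The function names are illustrative.

```perl
use strict;
use warnings;

# Assumed alphabet: 62 alphanumerics plus 9 "mark" characters = 71 symbols.
my @ALPHABET = ('A' .. 'Z', 'a' .. 'z', '0' .. '9', split //, q{-_.!~*'()});
my %VALUE;
@VALUE{@ALPHABET} = 0 .. $#ALPHABET;

# Convert a non-negative integer to a string of alphabet characters.
sub to_base_n {
    my ($n) = @_;
    my $base = @ALPHABET;
    my $out  = '';
    do {
        $out = $ALPHABET[ $n % $base ] . $out;
        $n   = int($n / $base);
    } while ($n > 0);
    return $out;
}

# Convert such a string back to an integer.
sub from_base_n {
    my ($s) = @_;
    my $n = 0;
    $n = $n * @ALPHABET + $VALUE{$_} for split //, $s;
    return $n;
}
```

For example, a 10-digit number such as 1234567890 needs only five characters in this alphabet, since 71**5 is about 1.8 billion.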

      Having said that, I expect most compression to come by using a scheme where the first byte indicates the format of the data, meaning that parameter names are not necessary.

      regards,

      -- Michael Snell
      -- michael@snell.com

Re: Encoding/compress CGI GET parameters
by davorg (Chancellor) on Jan 17, 2001 at 18:36 UTC

    Do you actually need to encode the parameters? Couldn't you just store the parameters on the server side and send a unique identifier to the client (like a session id)?

    --
    <http://www.dave.org.uk>

    "Perl makes the fun jobs fun
    and the boring jobs bearable" - me

      Hi Dave,

      As far as I can see, sessions per se would not work, because:

      - These URLs are typically sent in email messages (so the session is not initiated by logging into a site)
      - There is no one-to-one mapping from users to sessions (the same URL may be sent to many users, and users are not required to authenticate on the site)

      In a general sense, I suppose it would be possible to set up a persistent hash where the key is a unique ID and the value is the parameters. When generating the email, the parameters would be stripped off the URL, and stored in the hash against a new unique key. The key would then be added to the URL. When the CGI is run, it would look up the values against the key.

      My objection to this is it's not possible to tell when a given hash entry can be safely deleted.

      regards,

      -- Michael Snell
      -- michael@snell.com

        In a general sense, I suppose it would be possible to set up a persistent hash where the key is a unique ID and the value is the parameters. When generating the email, the parameters would be stripped off the URL, and stored in the hash against a new unique key. The key would then be added to the URL. When the CGI is run, it would look up the values against the key.

        That's pretty much what I envisaged, but why not have two hashes going both ways? The keys of the other hash are the params and the value is the unique id. Then when you generate the email, look to see if the id for this combination of params already exists, and if it doesn't, generate one and insert a record in both hashes.
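
A small sketch of the two-hash scheme. Plain in-memory hashes stand in for whatever persistent store (a tied DBM file, a database table) would be used in practice, and the names `id_for_params` and `params_for_id` are invented for the example. It also assumes the values contain no "&" or "=" characters.

```perl
use strict;
use warnings;

my (%params_by_id, %id_by_params);    # stand-ins for persistent hashes
my $next_id = 1;

# Return the existing ID for this combination of parameters, or allocate one.
sub id_for_params {
    my (%params) = @_;
    # Sort the keys so the same parameters always give the same string.
    my $canonical = join '&', map { $_ . '=' . $params{$_} } sort keys %params;
    unless (exists $id_by_params{$canonical}) {
        my $id = $next_id++;
        $id_by_params{$canonical} = $id;
        $params_by_id{$id}        = $canonical;
    }
    return $id_by_params{$canonical};
}

# Look up the stored parameter string for an ID (undef if unknown).
sub params_for_id {
    my ($id) = @_;
    return $params_by_id{$id};
}
```

Because identical parameter sets reuse one ID, the store grows with the number of distinct combinations rather than with the number of emails sent.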

        My objection to this is it's not possible to tell when a given hash entry can be safely deleted.

        I think that under my scheme, as you're reusing the same id for the same combination of parameters you won't need to delete entries from the hashes.

        I may be misunderstanding the problem tho'.

        --
        <http://www.dave.org.uk>

        "Perl makes the fun jobs fun
        and the boring jobs bearable" - me

        You do not say how many options there are for each parameter. If there are only a small number of responses then the call could be replaced by a one or two character sequence. A number of these could be concatenated into a simple string.

        I'm a little confused at this point. I think davorg's idea is a pretty good one, as it makes it possible to pass around the session ID and reconstruct the data on the server side. I suppose you could also construct the session-ID using a two-way algorithm rather than MD5 (i.e. use a method you can decode with the proper key ... the RSA algorithm isn't patented any more =)

        What do you want to consider a "safe" deletion? After a certain time? (store an expiration date in the database for each hash) When a certain percentage of (unique) respondents have used the session ID? (Ditto, you update the dB for each unique respondent).
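
The time-based option could be sketched like this, again with an in-memory hash standing in for the database. The function names and the optional `$now` argument (there only to make the behaviour easy to check) are assumptions for illustration.

```perl
use strict;
use warnings;

my %store;    # key => { params => ..., expires => epoch seconds }

# Store parameters under a key with a time-to-live in seconds.
sub store_params {
    my ($key, $params, $ttl, $now) = @_;
    $now = time unless defined $now;
    $store{$key} = { params => $params, expires => $now + $ttl };
    return;
}

# Fetch parameters, deleting the entry if it has expired.
sub fetch_params {
    my ($key, $now) = @_;
    $now = time unless defined $now;
    my $entry = $store{$key} or return;
    if ($entry->{expires} <= $now) {
        delete $store{$key};
        return;
    }
    return $entry->{params};
}
```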

        Philosophy can be made out of anything. Or less -- Jerry A. Fodor

Re: Encoding/compress CGI GET parameters
by jdgamache (Novice) on Jan 17, 2001 at 19:44 UTC
    Try URL forwarding.

        # read your parameters
        # do some stuff
        # forward to a URL (don't forget the \n\n)
        print "Location: http://www.server.com/cgi-bin/script/script.pl/dflskjdfsdkf\n\n";
Re: Encoding/compress CGI GET parameters
by em (Scribe) on Jan 18, 2001 at 04:10 UTC
    How about a non perl solution?

    A lot of mail clients will do the right thing if a URL is enclosed in < >.

    Meaning you email the URL as:
    <http://www.server.com/cgi-bin/script/script.pl?Param1=Value&Param2=value&Param3=value>

    If you still need to cut down the length of the URL, I would still look at just passing a session key with the URL and then doing a database lookup based on the key.

    Also, rather than building the key lookup into the main CGI script, I suggest that you write a generic redirect script that does the lookup.

    Meaning, you send <http://www.server.com/cgi-bin/redir.pl?KKDIS47> and the script does a database lookup on KKDIS47 and then redirects to http://www.server.com/cgi-bin/script/script.pl?Param1=Value&Param2=value&Param3=value
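
A bare-bones sketch of what such a redir.pl might do. The `%TARGETS` hash is a placeholder for the database lookup, and `redirect_response` is a made-up helper name. The key is read straight from the query string and checked against a strict pattern before use.

```perl
use strict;
use warnings;

# Placeholder for the database: key => full URL.
my %TARGETS = (
    KKDIS47 => 'http://www.server.com/cgi-bin/script/script.pl'
             . '?Param1=Value&Param2=value&Param3=value',
);

# Build the CGI response for a given key.
sub redirect_response {
    my ($key) = @_;
    if (defined $key && $key =~ /\A[A-Za-z0-9]+\z/ && exists $TARGETS{$key}) {
        return "Status: 302 Found\r\nLocation: $TARGETS{$key}\r\n\r\n";
    }
    return "Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\n"
         . "Unknown or expired link.\n";
}

# For redir.pl?KKDIS47 the whole query string is the key.
print redirect_response($ENV{QUERY_STRING});
```

Keeping the lookup in one small script means the main CGI program never needs to know that shortened links exist.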
