in reply to Re: Encoding/compress CGI GET parameters
in thread Encoding/compress CGI GET parameters

Hi tadman,

That's a pretty good summary, but it leaves my original question:

What is the best way to Crush (to use your sub name) CGI parameters? MIME::Base64 is not suitable on its own because it increases the length of the URL, and I want to decrease it. I think a solution would have to take advantage of the format of CGI parameters.

Something that just popped into my head:

A scheme as mentioned by Dave, but instead of storing the whole parameter string against a unique ID, store the parameter names, order and format (string/integer). Then encode the URL as the ID, followed by the parameter values encoded depending on format.

For example, if the hash contains

Key: 1 Value: Action=<string>,Area=<int>,SubArea=<int>

Then the URL:

http://www.server.com/cgi-bin/script/script.pl?Action=view&Area=12345&SubArea=12345

Could be encoded to:

http://www.server.com/cgi-bin/script/script.pl/SKLJSD

where "SKLJSD" can be decoded to 1,view,12345,12345

Comments? This avoids the problem of having to expire hash entries, because the hash contains only formats, which are likely to be a fairly small set.
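To make the idea concrete, here is a minimal, untested-in-production sketch of the format-table scheme. The names %FORMATS, encode_params and decode_params are made up for illustration, and the choices of a one-byte format ID, 32-bit big-endian integers and length-prefixed strings are assumptions rather than anything settled in this thread.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical format table: ID => ordered list of [name, type].
my %FORMATS = (
    1 => [ [ Action => 'string' ], [ Area => 'int' ], [ SubArea => 'int' ] ],
);

# Pack the format ID, then each value according to its declared type,
# and turn the bytes into a URL-safe token.
sub encode_params {
    my ($id, %param) = @_;
    my $format = $FORMATS{$id} or die "unknown format ID $id\n";
    my $packed = pack('C', $id);
    for my $field (@$format) {
        my ($name, $type) = @$field;
        $packed .= $type eq 'int'
            ? pack('N', $param{$name})        # 32-bit big-endian unsigned
            : pack('C/a*', $param{$name});    # length byte, then the bytes
    }
    (my $token = encode_base64($packed, '')) =~ tr{+/}{-_};
    $token =~ s/=+\z//;                       # padding can be restored later
    return $token;
}

# Reverse the steps: restore padding, decode, and unpack with a template
# built from the format table.
sub decode_params {
    my ($token) = @_;
    $token =~ tr{-_}{+/};
    $token .= '=' x (-length($token) % 4);
    my $packed = decode_base64($token);
    my $id     = unpack('C', $packed);
    my $format = $FORMATS{$id} or die "unknown format ID $id\n";
    my $template = 'x' . join '', map { $_->[1] eq 'int' ? 'N' : 'C/a*' } @$format;
    my %param;
    @param{ map { $_->[0] } @$format } = unpack($template, $packed);
    return ($id, %param);
}

my $token = encode_params(1, Action => 'view', Area => 12345, SubArea => 12345);
my ($id, %param) = decode_params($token);
print "$token => $id, $param{Action}, $param{Area}, $param{SubArea}\n";
```

With these assumptions the 36-character query string above becomes a 19-character token. That is longer than the "SKLJSD" in my example, which was only a placeholder, but the parameter names no longer travel in the URL at all.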

-- Michael Snell
-- michael@snell.com

Re^3: Encoding/compress CGI GET parameters
by tadman (Prior) on Jan 18, 2001 at 12:17 UTC
    If you're feeling ambitious, which it sounds like you are, you can always compact your data before sending it. Consider using pack() on your data to reduce the size, and then possibly MIME encoding it to handle the encoding for the URL. Base64 is good for your application since it is fully e-mail compatible, though its "+", "/" and "=" characters still need escaping or substituting before they go into a URL.

    UTF-5 is also a possibility. It is used to "encode" Unicode for DNS purposes, mapping two-byte characters into the very limited DNS space A-Za-z0-9-. Fortunately, there is a little more "bitwidth" in the URL specification, something that could be better exploited with careful analysis and testing.

    Instead of having a parameter like "mode=view" or "mode=edit", consider using an ENUM() type parameter, where you have a table of modes and their associated "tiny" values. As long as you have a small number of variations, there is no need to report the entire thing verbatim. A single byte can carry a lot of information, as long as the context of this byte is understood.
        my @possible_values = qw(view edit modify delete nuke);

        # Reverse lookup: mode name => index in @possible_values.
        my %possible_values = do {
            my $n = 0;
            map { $_ => $n++ } @possible_values;
        };

        my $encoded_param = $possible_values{'edit'};            # name  -> tiny value
        my $decoded_param = $possible_values[$encoded_param];    # value -> name again


    Numbers, likewise, can be squished into "packed binary": any integer up to 4294967295 (ten digits) fits in a 4-byte value, which comes to 6 characters after Base64 once the padding is dropped. That is a moderate but valuable decrease.
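    As a quick check of that arithmetic, here is a small sketch. The choice of the "N" (32-bit big-endian) template and of stripping the "=" padding are assumptions made for the illustration.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

my $number = 4294967295;                  # largest value that fits in 32 bits
my $packed = pack('N', $number);          # 4 bytes, big-endian
(my $token = encode_base64($packed, '')) =~ s/=+\z//;   # drop the padding

printf "%d digits -> %d bytes -> %d Base64 characters\n",
    length($number), length($packed), length($token);
```

    Keep in mind that plain Base64 output can contain "+" and "/", so a real URL would need those escaped or swapped for URL-safe characters.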

    Here's a compactor that I just sketched out. Use for entertainment purposes only, as it is untested. It takes in a SCALAR and returns a squished up version with a type identification byte which can be used to desquish it properly later.
        sub Squish {
            my ($what) = @_;

            if ($what =~ /^-?[0-9]+$/) {
                # Integers: pick the narrowest width that holds the value.
                if ($what >= 0) {
                    if    ($what <= 255)        { return pack("CC", 0x01, $what); }
                    elsif ($what <= 65535)      { return pack("CS", 0x02, $what); }
                    elsif ($what <= 4294967295) { return pack("CL", 0x04, $what); }
                }
                elsif ($what >= -128 && $what <= 127)               { return pack("Cc", 0x09, $what); }
                elsif ($what >= -32768 && $what <= 32767)           { return pack("Cs", 0x0A, $what); }
                elsif ($what >= -2147483648 && $what <= 2147483647) { return pack("Cl", 0x0B, $what); }
                # Anything wider drops to the final return and is stored as a string.
            }
            elsif ($what =~ /^-?[0-9]+(?:\.[0-9]+)?(?:e[+-]\d+)?$/) {
                # Floating point: native double.
                return pack("Cd", 0x0C, $what);
            }
            elsif (length($what) < 16) {
                # Short string: low nibble 0x0F is the type, high nibble the length.
                return pack("C", 0x0F | (length($what) << 4)) . $what;
            }
            elsif (length($what) <= 255) {
                # Medium string: one length byte.
                return pack("CC", 0x0D, length($what)) . $what;
            }

            # Long string: 16-bit length, so at most 65535 bytes.
            # S, L, s, l and d are native-endian; pack and unpack on the same platform.
            return pack("CS", 0x0E, length($what)) . $what;
        }
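    For completeness, a matching decoder might look something like the sketch below. Unsquish and %NUMERIC are names invented here, and this is just as untested in anger as the compactor. It assumes the type bytes above (0x01, 0x02, 0x04 for unsigned, 0x09 to 0x0B for signed, 0x0C for doubles, 0x0D and 0x0E for length-prefixed strings) and the same native-endian templates. The short-string case is deliberately left out, since its type byte has to be told apart from the double's 0x0C before it can be decoded safely.

```perl
use strict;
use warnings;

# Type byte => unpack template for the fixed-width numeric cases.
my %NUMERIC = (
    0x01 => 'C', 0x02 => 'S', 0x04 => 'L',   # unsigned 8/16/32-bit
    0x09 => 'c', 0x0A => 's', 0x0B => 'l',   # signed 8/16/32-bit
    0x0C => 'd',                             # native double
);

sub Unsquish {
    my ($data) = @_;
    my $type = unpack('C', $data);           # first byte says how to read the rest

    if (my $template = $NUMERIC{$type}) {
        return unpack("x $template", $data); # skip the type byte, read the value
    }
    return unpack('x C/a*', $data) if $type == 0x0D;   # 1-byte length + string
    return unpack('x S/a*', $data) if $type == 0x0E;   # 2-byte length + string

    die sprintf "Unsquish: unknown type byte 0x%02X\n", $type;
}

print Unsquish(pack('CS', 0x02, 12345)), "\n";
```

    Dying on an unknown type byte is a design choice for the sketch: a mangled URL should fail loudly rather than decode to garbage.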