I need to compress and encode a string of characters in a way that will be easy to reverse later. I'm not trying to obfuscate the string, just make it easier to handle later. Turning the string into a series of numerals would be ideal, but alphanumeric would also work.

The string I'm dealing with is a series of param() values coming back from an HTML form. One example of the string looks like this:

/h/plkr/3/www.plkr.org/rss.pl

This breaks down into:

/h = Scheme (http in this case)
/plkr = Format (output format)
/3 = Fetch limit
/www.plkr.org/rss.pl = Feed URL

I tried IO::Zlib, Compress::Zlib, Digest::MD4, Digest::MD5 and others, in the hope that I could compress the string and then encode it, but the result is still an ASCII string that is longer than the original input (there are not enough redundant characters to make compression worthwhile).

Is there a way to do something like this? The shortest encoded string I can get is 45 characters from a 29-character input string, using this code:

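use Compress::Zlib;   # provides compress()
use MIME::Base64;     # provides encode_base64()
my $string = '/h/plkr/3/www.plkr.org/rss.pl';
# A limit of 5 keeps the feed URL, which itself contains '/', in one piece
my ($type, $format, $limit, $feed) = (split '/', $string, 5)[1..4];
my @tokens = split('/', $string);
my $compressed = compress($string);
my $encoded = encode_base64($compressed);   # ends with a newline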
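To see where the extra length comes from, and to check that the steps reverse cleanly, the code above can be wrapped in a pair of helpers. This is only a sketch: pack_token and unpack_token are hypothetical names, not part of any module. A zlib stream carries a 2-byte header and a 4-byte Adler-32 checksum before any data is stored, and Base64 then emits 4 characters for every 3 bytes, so a short string with little repetition is likely to grow at both stages.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: compress, then Base64-encode.
# The empty second argument stops encode_base64 from appending a newline.
sub pack_token {
    my ($plain) = @_;
    return encode_base64(compress($plain), '');
}

# Hypothetical helper: undo the two steps in the opposite order.
sub unpack_token {
    my ($token) = @_;
    return uncompress(decode_base64($token));
}

my $string = '/h/plkr/3/www.plkr.org/rss.pl';
my $token  = pack_token($string);

# Report the size at each stage so the overhead is visible.
printf "original: %d chars\n", length $string;
printf "deflated: %d bytes\n", length compress($string);
printf "encoded:  %d chars\n", length $token;
print  "round trip ok\n" if unpack_token($token) eq $string;
```

Standard Base64 output can contain '+', '/' and '=', so a token used in a query string may still need URL escaping.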

The reason I'm trying to do this is so that I can pass the encoded string as a value in a URL handed to my application later on, such as: index.pl?eJzTz9AvyMku0jfWLy8v1wMx9fKL0vWLiouBHACWdQpk

I'm trying to make these URLs easier for users to use and bookmark for later use with my application. I'm storing each unique feed URL in a database, but I can't store the depth and output format there: hundreds of users could use the same feed URL with different depths or output formats, and because these are random, anonymous users, I can't keep their settings in a user table either.

I could build a set of hashes that have numeric lookups for output format and scheme, but that doesn't really gain me much in terms of making the value sitting in the URI field any smaller.
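A minimal sketch of that lookup idea, to show how little it saves. The tables here are hypothetical: only 'h' (http) and 'plkr' come from the example above, while the numeric codes and the 's' and 'html' entries are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical lookup tables mapping names to numeric codes.
my %scheme_code = (h => 1, s => 2);          # 's' is an assumed second scheme
my %format_code = (plkr => 1, html => 2);    # 'html' is an assumed second format

# Reverse tables for decoding (works because the codes are unique).
my %scheme_name = reverse %scheme_code;
my %format_name = reverse %format_code;

# Replace the scheme and format names with their numeric codes.
sub encode_path {
    my ($path) = @_;
    my ($scheme, $format, $limit, $feed) = (split '/', $path, 5)[1..4];
    return join '/', '', $scheme_code{$scheme}, $format_code{$format}, $limit, $feed;
}

# Map the numeric codes back to the original names.
sub decode_path {
    my ($short) = @_;
    my ($scheme, $format, $limit, $feed) = (split '/', $short, 5)[1..4];
    return join '/', '', $scheme_name{$scheme}, $format_name{$format}, $limit, $feed;
}

my $path  = '/h/plkr/3/www.plkr.org/rss.pl';
my $short = encode_path($path);
printf "%s (%d chars) -> %s (%d chars)\n", $path, length $path, $short, length $short;
```

For the example string this only replaces 'plkr' with a single digit, because the scheme is already one character and the feed URL, which makes up most of the string, is untouched.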

Any useful hints or tips on how I can optimize this further?


In reply to Compressing a string, but leaving it as a string by hacker
