If you're already storing the unique URL in a database, why not use a unique key from the db for the URL portion? That would let you turn
my $string = '/h/plkr/3/www.plkr.org/rss.pl';
into
my $string = '/h/plkr/3/426933';
where 426933 would be the unique ID for retrieving that URI from the database. It has the added bonus of letting the end user tweak things like depth and output format without having to decode/re-encode the string.

If you really want to encode the string as-is, without having to store anything in a DB, you might be able to use something like Algorithm::Huffman in CPAN to generate a short bit vector for your data, and then use some other method (encode_base64 or somesuch) to make it safe for use in a URL. You'll need to grab a good sample of the sort of strings you want to encode, and do a frequency count on them so that Algorithm::Huffman can build up a good dictionary. Here's an untested example:

my $token_counts = { '/' => 10000, '.' => 3000, 'w' => 2342, 'e' => 2000, 't' => 2000, # ... etc, for any character that might appear in the input strin +g }; my $string = '/h/plkr/3/www.plkr.org/rss.pl'; my $huff = Algorithm::Huffman->new($token_counts); my $huff_encoded = $huff->encode($string); my $printable = encode_base64($huff_encoded);
I have no idea what sort of size savings this will get you, though. Base-64 encoding costs you one character of output for every 6 bits of input, so as long as your Huffman encoding saves you more bits than Base 64 encoding wastes, you should end up with a URL-safe encoding that's shorter than the input.

In reply to Database lookup, or Huffman coding by dave0
in thread Compressing a string, but leaving it as a string by hacker

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.