You can get something closer to 30% compression using Huffman coding.

Code at http://www.brettsbsd.net/~estrabd/SCHOOL/CS638/Huffman.pm.
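
The module itself is not reproduced in this post. As a rough illustration of the technique only, here is a self-contained sketch that builds a Huffman table from sample text and encodes a string with it. The names `huffman_table` and `huffman_encode` are hypothetical and are not Huffman.pm's API, and the example.com strings are made up for the demo.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Huffman.pm): build a table mapping each
# character of $text to a string of '0'/'1' characters.
sub huffman_table {
    my ($text) = @_;
    my (%freq, %code);
    $freq{$_}++ for split //, $text;

    # A node is [weight, symbol, left, right]; symbol is undef for inner nodes.
    my @nodes = map { [ $freq{$_}, $_ ] } sort keys %freq;
    return \%code unless @nodes;          # empty input, empty table
    if (@nodes == 1) {                    # a lone symbol still needs one bit
        $code{ $nodes[0][1] } = '0';
        return \%code;
    }

    # Repeatedly merge the two lightest nodes until one root remains.
    while (@nodes > 1) {
        @nodes = sort { $a->[0] <=> $b->[0] } @nodes;
        my ($left, $right) = splice @nodes, 0, 2;
        push @nodes, [ $left->[0] + $right->[0], undef, $left, $right ];
    }

    # Walk the tree: going left appends '0', going right appends '1'.
    my @stack = ([ $nodes[0], '' ]);
    while (my $entry = pop @stack) {
        my ($node, $prefix) = @$entry;
        if (defined $node->[1]) {
            $code{ $node->[1] } = $prefix;
        }
        else {
            push @stack, [ $node->[2], $prefix . '0' ],
                         [ $node->[3], $prefix . '1' ];
        }
    }
    return \%code;
}

# Hypothetical helper: encode $text with a table from huffman_table().
sub huffman_encode {
    my ($code, $text) = @_;
    my $bits = '';
    for my $char (split //, $text) {
        die "No encoding for $char\n" unless exists $code->{$char};
        $bits .= $code->{$char};
    }
    return $bits;
}

# Demo: train on one string, encode another that uses the same characters.
my $table   = huffman_table('http://www.example.com/index.html');
my $input   = 'http://www.example.com/';
my $encoded = huffman_encode($table, $input);
printf "%d bits instead of %d\n", length($encoded), 8 * length($input);
```

Frequent characters end up with short codes and rare ones with long codes, which is where the savings over a fixed 8 bits per character come from. A character that never appears in the training text gets no code at all, so the training input has to cover the whole alphabet you expect to see.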

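Using only four URLs (plus an alphanumeric string, so that every letter and digit gets a code) as training input, I was able to get between roughly 34% and 41% compression on three other URLs.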

use strict;
use lib qw(./);
use Huffman qw(:All);
use Data::Dumper;

# Training input: four URLs plus every letter and digit,
# so that each of those characters is guaranteed a code.
my @word = (
    'http://www.perlmonks.org/index.pl?parent=43933;node_id=3333',
    'http://www.coasthome.com/things.html',
    'http://validator.w3.org/check',
    'http://www.google.com/search?q=emdash+html&sourceid=mozilla-search&start=&start=&ie=utf-8&oe=utF-8&client=firefox&rls=org.mozilla:en-US:unofficial',
    'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
);

# build tree
my @paths     = build_encodings(join('', @word));
my $path_hash = build_encodings_hash(join('', @word));
my $tree      = build_tree(join('', @word));

print Dumper $path_hash;

for my $string (qw(
    http://lawyers.findlaw.com/lawyer/firm/Estate-Planning/Fort-Bragg/North-Carolina
    http://www.seacrestproperties.com/
    http://www.fortbragg.com/businesses_all.lasso
)) {
    # Encode one character at a time; the result is a string of '0'/'1' characters.
    my $encoded_str = '';
    foreach my $item (split('', $string)) {
        if (my $ele = $path_hash->{$item}) {
            $encoded_str .= $ele->{ENCODING};
        }
        else {
            die "No encoding for $item\n";
        }
    }
    print "String: $string\n";
    print "Encoded:\n$encoded_str\n";

    # converting this to a bit vector is left as an exercise.
    my $decoded = decode_str($encoded_str, @paths);
    print "Decoded:\n" . $decoded . "\n";

    print "Stats: \n";
    print "Original chars: " . length($string) . "\n";
    print "Original size (in bits): " . (8 * length($string)) . "\n";
    print "Encoded size (in bits): " . length($encoded_str) . "\n";
    print "Size reduction: %" . ((1 - (length($encoded_str) / (8 * length($string)))) * 100) . "\n";
}

String: http://lawyers.findlaw.com/lawyer/firm/Estate-Planning/Fort-Bragg/North-Carolina
Encoded:
110111...0100101111100
Decoded:
http://lawyers.findlaw.com/lawyer/firm/Estate-Planning/Fort-Bragg/North-Carolina
Stats: 
Original chars: 80
Original size (in bits): 640
Encoded size (in bits): 420
Size reduction: %34.375
String: http://www.seacrestproperties.com/
Encoded:
1101110...1011000
Decoded:
http://www.seacrestproperties.com/
Stats: 
Original chars: 34
Original size (in bits): 272
Encoded size (in bits): 161
Size reduction: %40.8088235294118
String: http://www.fortbragg.com/businesses_all.lasso
Encoded:
1101110...0001100010110
Decoded:
http://www.fortDragg.com/Dusinesses_all.lasso
Stats: 
Original chars: 45
Original size (in bits): 360
Encoded size (in bits): 231
Size reduction: %35.8333333333333
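
The script above leaves the encoded result as a string of '0' and '1' characters, so the size figures only hold once those characters are packed into real bits. A minimal sketch of that step, using Perl's built-in `pack` and `unpack`, might look like the following. The helper names and the 4-byte length header are assumptions made for this example; they are not part of Huffman.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: pack a string of '0'/'1' characters into bytes.
# The bit count goes into a 4-byte big-endian header so that the zero
# padding in the last byte can be discarded again when unpacking.
sub bits_to_bytes {
    my ($bits) = @_;
    return pack('N B*', length($bits), $bits);
}

# Hypothetical helper: reverse of bits_to_bytes().
sub bytes_to_bits {
    my ($bytes) = @_;
    my ($nbits, $bits) = unpack('N B*', $bytes);
    return substr($bits, 0, $nbits);
}

# Round trip on a short, made-up bit string.
my $packed = bits_to_bytes('110111010');
print "Packed length in bytes: ", length($packed), "\n";
print "Round trip ok\n" if bytes_to_bits($packed) eq '110111010';
```

With this layout, the 420-bit encoding of the first URL would take 53 bytes of payload plus the 4-byte header, compared with 80 bytes for the original string. For short URLs the header eats a noticeable share of the savings, so a 1- or 2-byte length field may be the better trade if the inputs are known to be small.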

A picture is worth a thousand words, but takes 200K.

In reply to Re: Efficient 7bit compression by gam3
in thread Efficient 7bit compression by Limbic~Region
