monsieur_champs has asked for the wisdom of the Perl Monks concerning the following question:

Fellows

I was playing with pack() together with Compress::Zlib some time ago, so I could generate some compressed, eval()-able Perl code for a programming toy.

While thinking about the problem, I figured out that I need to represent the bytes generated by Compress::Zlib as a sequence of hexadecimal digits. I figured out how to do this, but I can't get the original back. Reading the manual, I wasn't able to determine what I'm doing The Wrong Way(tm). My code follows:

use Compress::Zlib;
$code = q#print "Here I will put some nice code to be uncompressed and eval()ed later";#;
$code = unpack 'H*', Compress::Zlib::memGzip( $code );
print pack 'H*', Compress::Zlib::memGunzip( $code ); # Here I get something very strange...

I'm almost sure that I'm doing something very stupid, but I can't see it right now. Can anybody please point me to an easy way to code this?

As a side question, I would like to know your opinion: is this technique a good way to reduce the bandwidth needed to send Large Programs(tm) through a heavily used, low-bandwidth serial communications channel between two machines? What would you do in my place?

Please note that this is just a sketch of an initial approach, not a complete solution for the problem.

May the gods bless you.

UPDATE: No, this is not homework! It's just a toy for my low cost service serial point-to-point home-made network laboratory.


"In few words, translating PerlMonks documentation and best articles to other languages is like building a bridge to join other Perl communities into PerlMonks family. This makes the family bigger, the knowledge greater, the parties better and the life easier." -- monsieur_champs

Replies are listed 'Best First'.
Re: reversible pack()?
by belg4mit (Prior) on Feb 10, 2004 at 21:25 UTC
    Order of operations Zip => Pack => Unpack => Unzip

    --
    I'm not belgian but I play one on TV.

Re: reversible pack()?
by tachyon (Chancellor) on Feb 10, 2004 at 22:01 UTC

    If you convert ASCII text with, say, 80 used chars into gzipped hex, you effectively drop yourself down to a 16-char alphabet while still spending a full 8-bit byte on every char. Each gzipped byte becomes two hex digits, so you need 2x compression just to break even, and you give away half of whatever gzip gains on text, so this is kinda pointless. Normally you pack your ~80-char alphabet into a 256-slot 8-bit binary space. This is where a significant part of the compression comes from: using all the available bits efficiently. Hex is not the go.

    [root@devel3 root]# cat test.pl
    #!/usr/bin/perl
    use Compress::Zlib;
    my $str = "Hello World!";
    my $gzip = Compress::Zlib::memGzip( $str );
    my $hex_enc = unpack 'H*', $gzip;
    my $hex_dec_gzip = pack 'H*', $hex_enc;
    my $str_dec = Compress::Zlib::memGunzip( $hex_dec_gzip );
    print "
    $str
    $hex_enc
    $str_dec
    ";
    [root@devel3 root]# ./test.pl

    Hello World!
    1f8b0800000000000003f348cdc9c95708cf2fca49510400a31c291c0c000000
    Hello World!

    [root@devel3 root]#

    cheers

    tachyon

      If you convert ASCII text with, say, 80 used chars into gzipped hex, you effectively drop yourself down to a 16-char alphabet while still spending a full 8-bit byte on every char. Each gzipped byte becomes two hex digits, so you need 2x compression just to break even.

      If the reasoning behind the hex representation is to make it "safe" to treat this data in certain ways (such as transmitting on usenet) that might lose control characters and anything over seven bits, then it's possible to do better by using a base larger than 16, but smaller than 256. For example, if you use base 64 with 33 added to each digit when mapping it back into ASCII, you get all nice safe printable characters but manage to store six usable bits in every byte, which is not altogether bad. If you can get better than 25% compression, you'll have a net gain (though perhaps not a large one). On English text of any significant size, better than 25% gain is very achievable with a simple Huffman tree, much less gzip.
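A minimal sketch of the "base 64 plus 33" idea described above, assuming a simple bit-string approach. The helper names encode_offset64 and decode_offset64 are hypothetical (they come from no module), and the length prefix is an added assumption so that the padding bits can be dropped on decode. The output uses only characters 33 to 96, which still includes quotes, backslashes and sigils, so it is not automatically safe inside a Perl string literal.

```perl
use strict;
use warnings;

# Hypothetical helpers, not from any module: store 6 bits in each
# output character, then add 33 so every character falls in the
# printable range '!' (33) .. '`' (96).
sub encode_offset64 {
    my ($bytes) = @_;
    my $bits = unpack 'B*', $bytes;              # bit string, most significant bit first
    $bits .= '0' x ( -length($bits) % 6 );       # pad to a multiple of 6 bits
    my $text = join '', map { chr( 33 + oct("0b$_") ) } $bits =~ /(.{6})/g;
    return length($bytes) . ':' . $text;         # byte count lets the decoder drop the padding
}

sub decode_offset64 {
    my ($encoded) = @_;
    my ( $len, $text ) = split /:/, $encoded, 2; # split at the first colon only
    my $bits = join '', map { sprintf '%06b', ord($_) - 33 } split //, $text;
    return substr( pack( 'B*', $bits ), 0, $len );
}

my $data    = join '', map { chr } 0 .. 255;     # every possible byte value
my $encoded = encode_offset64($data);
print length($data), " bytes -> ", length($encoded), " characters\n";
print decode_offset64($encoded) eq $data ? "round trip ok\n" : "round trip FAILED\n";
```

In practice MIME::Base64 gives the same six-bits-per-character density with a more conservative alphabet, so this is mainly useful for seeing where the 4/3 expansion comes from.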


      $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/

      So I need a better approach to achieve enough compression to get past the break-even point. Any suggestions? Maybe I should leave the gzipped string as is, and hope no distortion happens while editing the uncompressed part of the program?

      Can you point me to a nice website or a good book on this subject?

      Thank you very much again!


      "In few words, translating PerlMonks documentation and best articles to other languages is like building a bridge to join other Perl communities into PerlMonks family. This makes the family bigger, the knowledge greater, the parties better and the life easier." -- monsieur_champs

        gzip first, then base64 encode. To decode, undo the steps in the opposite order, i.e. base64_decode then gunzip. Still a lot fatter than binary but a lot thinner than hex.
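The encode and decode order just described can be sketched as a round trip. This is only an illustration under stated assumptions: the sample $code string and the variable names are made up for the example, and the empty second argument to encode_base64 simply suppresses line breaks in the output.

```perl
use strict;
use warnings;
use Compress::Zlib;
use MIME::Base64;

# Made-up sample payload; any Perl source text would do.
my $code = 'print "Hello from the unpacked code\n";' x 20;

# Encode: gzip first, then base64 ('' means no line breaks in the output).
my $wire = encode_base64( Compress::Zlib::memGzip($code), '' );

# Decode: undo the steps in the opposite order.
my $back = Compress::Zlib::memGunzip( decode_base64($wire) );

printf "plain: %d bytes, gzip+base64: %d characters\n", length($code), length($wire);
print $back eq $code ? "round trip ok\n" : "round trip FAILED\n";
```

The receiving side could then hand $back to eval, which is what the original question was after.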

        If you want to know something try Google. There are hundreds of websites dealing with compression. Try say 'tutorial data compression theory' and find sites like......why don't you have a look yourself.

        It is trivial to test this. Just compress and encode a representative string, and check its LENGTH. Repeat with another encoding. Compare to the length of the gzip string and you will see how much you are losing.

        use Compress::Zlib;
        use MIME::Base64;
        my $str = "Hello World! " x 3;
        my $gzip = Compress::Zlib::memGzip( $str );
        my $hex = unpack 'H*', $gzip;
        my $base64 = encode_base64( $gzip );   # encode the gzipped bytes, not a sample string
        my $str_len = length($str);
        my $gzip_len = length($gzip);
        my $hex_len = length($hex);
        my $base64_len = length($base64);      # includes the newline appended by encode_base64
        # make binary printable ;-)
        $gzip = '#' x $gzip_len;
        printf "%3d: %s\n%3d: %s\n%3d: %s\n%3d: %s\n", $str_len, $str, $gzip_len, $gzip, $hex_len, $hex, $base64_len, $base64;
        __DATA__
        39: Hello World! Hello World! Hello World!
        36: ####################################
        72: 1f8b0800000000000003f348cdc9c95708cf2fca495154f0c0c90100b9a8ae3827000000
        49: H4sIAAAAAAAAA/NIzcnJVwjPL8pJUVTwwMkBALmorjgnAAAA

        You will see the value of compression as you increase the string length.

        cheers

        tachyon

Re: reversible pack()?
by bmann (Priest) on Feb 10, 2004 at 21:58 UTC
    Your code tries to memGunzip $code before packing it. Pack it before uncompressing it. Replace your final line of code like so...

    print Compress::Zlib::memGunzip( pack 'H*', $code );

    and you are unzipping the packed $code.

    HTH

Re: reversible pack()?
by elwarren (Priest) on Feb 11, 2004 at 07:02 UTC
    Huh, learn something around here every day. I always suspected that's how gzip compression worked, never bothered to learn. Thanks perlmonks! So I wonder where the break-even point is? His example is short, but he refers to moving Large Programs(tm) around.
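One rough way to look for the break-even point is to grow the input and print the sizes side by side. This is a sketch only: the repeated "Hello World! " sample is an assumption carried over from the earlier examples in this thread, and such repetitive text compresses far better than real program source, so the crossover for actual code will come later than it does here.

```perl
use strict;
use warnings;
use Compress::Zlib;
use MIME::Base64;

# Sample text is an assumption; real programs will compress differently.
for my $copies ( 1, 2, 3, 5, 10, 100 ) {
    my $str    = "Hello World! " x $copies;
    my $gzip   = Compress::Zlib::memGzip($str);
    my $hex    = unpack 'H*', $gzip;              # two characters per gzipped byte
    my $base64 = encode_base64( $gzip, '' );      # four characters per three gzipped bytes
    printf "plain %5d  gzip %4d  base64 %4d  hex %4d\n",
        length($str), length($gzip), length($base64), length($hex);
}
```

An encoding pays off on the rows where its column is smaller than the plain column.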

    I thought there was an Apache module that compressed URL parameters with gzip and hashed the output, and I remember thinking that the strings were so short it seemed like its only use was obfuscation. Poking around CPAN there are so many Apache modules now (vs then) that I got sidetracked looking at PerlIO::gzip. Take a look, maybe it'll save you some time.

    Depending on that serial link you might think about using PPP compression. PPP is too much overhead (but you did say the link was busy) so you might be running straight comms. What was the compression built into the modems called, v42?

    I'm sure you've already explored XON/XOFF vs RTS/CTS to save a few bits. 7bit vs 8bit to save on checksums. Oh, checksums, yuck, lose a bit on a noisy serial link and your data won't decompress, but then again your code wouldn't work (correctly) either.

    You could send the compiled bytecode. Too much work for too little benefit. Guess I've been reading too much Java serialization late at night. Maybe you could run it through Acme::Morse or something that generates code that would compress better?