in reply to Re: How good is gzip data as digest?
in thread How good is gzip data as digest?

A "bidirectional transformation" - exactly what I thought! But I did not think about that compressing of short strings might not buy me much! So thank you for that!
In regards to using a digest: I think I will have to benchmark some possible solutions (sigh) and measure how well compression on my type/length of strings actually is. I will post an update with my results if I find the time to do that... Anyway, thank you all for the quick thoughts!

Update:
It seems my strings reach a break-even on lengths of around 120. Strings shorter than that are compressed actually longer than the original. (Exactly what Fletch observed) I think I will have to look somewhere else for a gain...

Replies are listed 'Best First'.
Re^3: How good is gzip data as digest?
by roboticus (Chancellor) on Apr 02, 2009 at 20:26 UTC
    isync:

    If you're currently using an ASCII version of an MD5 digest, you could easily compress 1/3 of the space out by converting it to binary and then using BASE64 to convert to a string. (Or, even better, do it directly.) The ASCII version uses a byte to hold 4 bits of information, while BASE64 will store 6 bits of info in a byte. If you don't mind using a binary string, that would let you compress to 50% of the original size.

    Other than that, I wouldn't expect much. Compression routines (at least the ones I'm familiar with) simply squash repetitive patterns out of the data. A digest tends to spread values evenly through the result space, so there shouldn't be much repetition for a compression algorithm to take advantage of.

    ...roboticus