in reply to How good is gzip data as digest?

If you use the binary MD5 digests as your keys, you can expect a saving of around 33% in memory requirements. That assumes ASCII keys; maybe more if they are UTF.

Ignore the "two CRCs" version. I'm not sure how reliable it is with respect to collisions, and it doesn't buy you much extra space. The figures below are from a 64-bit Perl, so YMWV if you're using 32-bit.

#! perl -sw
use strict;
use 5.010;
use Digest::MD5 qw[ md5 ];
use String::CRC32;
use Devel::Size qw[ total_size ];

open IN, '<', 'randStr-1M(64-254).dat' or die $!;

my %asc;
chomp, ++$asc{ $_ } while <IN>;
printf "%07d Ascii keys: %.f\n", scalar keys( %asc ), total_size( \%asc );
undef %asc;

seek IN, 0, 0;
my %md5;
chomp, undef( $md5{ md5( $_ ) } ) while <IN>;
printf "%07d binary MD5 keys: %.f\n", scalar keys( %md5 ), total_size( \%md5 );
undef %md5;

seek IN, 0, 0;
my %crc;
chomp, undef( $crc{ pack 'VV', crc32( $_ ), crc32( scalar reverse $_ ) } ) while <IN>;
printf "%07d binary CRC keys: %.f\n", scalar keys( %crc ), total_size( \%crc );

__END__
c:\test>bigHash.pl
1000000 Ascii keys: 53879053
1000000 binary MD5 keys: 35766510
1000000 binary CRC keys: 34756892
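
For what it's worth, here is a minimal sketch of how the binary MD5 keys might be used as a "seen" lookup. The sample strings, and the names @lines, %seen and $dups, are made up for illustration; in practice the strings would come from your own file. Note that md5() croaks on strings containing wide characters, so UTF data would need encoding to bytes first.

```perl
#! perl
use strict;
use warnings;
use Digest::MD5 qw[ md5 ];

# Hypothetical sample data; substitute the lines of your own file.
my @lines = ( 'alpha' x 20, 'beta' x 30, 'alpha' x 20 );

my %seen;
my $dups = 0;
for my $line ( @lines ) {
    my $key = md5( $line );           # 16 raw bytes, whatever the input length
    $dups++ if exists $seen{ $key };  # digest already present: treat as a repeat
    undef $seen{ $key };              # store the key only; no value needed
}

printf "%d unique, %d duplicate(s), key length %d bytes\n",
    scalar keys( %seen ), $dups, length( ( keys %seen )[0] );
```

The saving comes from every key being a fixed 16 bytes however long the original string is. The price is that you can no longer recover the original strings from the hash, and two different strings with the same digest would be counted as one.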

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."