If you use the binary MD5 digests as your hash keys, you can expect to save around 33% of the memory. That assumes ASCII keys; the saving may be larger if your keys are UTF-8.

Ignore the "two CRCs" version. I'm not sure how reliably it avoids collisions, and it doesn't buy you much extra space. The figures below come from a 64-bit Perl, so your mileage will vary if you're using a 32-bit build.
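For a rough feel of the collision risk at each key width, here is a back-of-envelope sketch using the usual birthday-bound approximation. It assumes an ideal, uniformly distributed hash, which is a big assumption for two CRC32s of the same string (forward and reversed), so treat the 64-bit figure as a best case rather than a measurement. The name collision_estimate is mine, purely for illustration.

```perl
use strict;
use warnings;

# Birthday-bound approximation: with n keys and an ideal b-bit hash,
# P(at least one collision) is roughly n*(n-1) / 2**(b+1) while that value is small.
# ASSUMPTION: the hash is uniformly distributed; paired CRC32s may well do worse.
sub collision_estimate {
    my ( $n, $bits ) = @_;
    return $n * ( $n - 1 ) / 2 ** ( $bits + 1 );
}

my $n = 1_000_000;    # key count used in the benchmark
printf "%3d-bit key: ~%.2e\n", $_, collision_estimate( $n, $_ ) for 64, 128;
```

Even in the ideal case, the 64-bit key is many orders of magnitude more collision-prone than the 128-bit MD5, for a saving of only about 1MB per million keys.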

#! perl -sw
use strict;
use 5.010;
use Digest::MD5 qw[ md5 ];
use String::CRC32;
use Devel::Size qw[ total_size ];

open IN, '<', 'randStr-1M(64-254).dat' or die $!;

my %asc;
chomp, ++$asc{ $_ } while <IN>;
printf "%07d Ascii keys: %.f\n", scalar keys( %asc ), total_size( \%asc );

undef %asc;
seek IN, 0, 0;
my %md5;
chomp, undef( $md5{ md5( $_ ) } ) while <IN>;
printf "%07d binary MD5 keys: %.f\n", scalar keys( %md5 ), total_size( \%md5 );

undef %md5;
seek IN, 0, 0;
my %crc;
chomp, undef( $crc{ pack 'VV', crc32( $_ ), crc32( scalar reverse $_ ) } ) while <IN>;
printf "%07d binary CRC keys: %.f\n", scalar keys( %crc ), total_size( \%crc );

__END__
c:\test>bigHash.pl
1000000 Ascii keys: 53879053
1000000 binary MD5 keys: 35766510
1000000 binary CRC keys: 34756892
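If the goal is just a "have I seen this line before?" lookup, the binary-MD5 hash from the benchmark can be used directly as a set. The following is a minimal sketch of that idea, not a drop-in solution; %seen and seen_before are names I made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw[ md5 ];

my %seen;    # keys are 16-byte binary digests; values stay undef to save space

# Returns true if the line was already recorded, false the first time it is seen.
sub seen_before {
    my ( $line ) = @_;
    my $key = md5( $line );              # 16 raw bytes, not the 32-char hex form
    return 1 if exists $seen{ $key };
    undef $seen{ $key };                 # create the key without storing a value
    return 0;
}

for my $line ( qw[ alpha beta alpha ] ) {
    printf "%-5s %s\n", $line, seen_before( $line ) ? 'duplicate' : 'new';
}
```

Note that md5() works on bytes and croaks on wide characters, so decoded UTF-8 strings would need encoding (for example with Encode's encode_utf8) before being digested.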

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re: How good is gzip data as digest? by BrowserUk
in thread How good is gzip data as digest? by isync
