in reply to Bloom Filter or other mehod to store URL's?

If you use the (binary) md5 of the URL as the hash key and don't assign anything as the value (i.e. use undef $hash{ md5( $url ) }; to autovivify the key), then storing 10 million URLs will require around 1 GB of RAM.

If you preallocate enough buckets (keys %hash = 2**24;), then it runs pretty quickly too.
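
As a rough sketch of the above (not a tuned implementation): the hash name %seen and the helper seen_before() are made up for illustration, and md5() here is the binary-digest function exported by the core Digest::MD5 module. It assumes the URLs are plain byte strings; md5() croaks on wide characters, so encode those first.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);    # md5() returns the 16-byte binary digest

my %seen;
keys(%seen) = 2**24;        # preallocate buckets so the hash never has to grow

# Hypothetical helper: returns true if the URL was already recorded;
# otherwise records it and returns false.
sub seen_before {
    my ($url) = @_;
    my $key = md5($url);
    return 1 if exists $seen{$key};
    undef $seen{$key};      # creates the key without storing a value
    return 0;
}
```

Calling seen_before( $url ) for each URL as you read it gives you a de-duplicating filter; exists is used for the lookup because the stored values are all undef.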

There is a very small chance of a false positive, where two different URLs hash to the same md5, but that chance is lower than with a Bloom filter. And if you are using md5 inside your Bloom::Filter solution, the same collision would be possible there anyway.
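
To put a rough number on "rare", here is a back-of-envelope sketch using the standard birthday approximation p ≈ n(n-1)/2 / 2**128. It assumes md5 digests behave as uniformly random 128-bit values and that nobody is deliberately crafting colliding URLs; collision_estimate() is a name invented for this example.

```perl
use strict;
use warnings;

# Birthday-bound estimate of the chance that any two of $n items
# share the same digest of $bits bits (valid while the result is tiny).
sub collision_estimate {
    my ($n, $bits) = @_;
    return $n * ($n - 1) / 2 / 2**$bits;
}

printf "%.2g\n", collision_estimate(10_000_000, 128);
```

For 10 million URLs this comes out on the order of 1e-25, far below the false-positive rates a Bloom filter is normally configured for.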

