in reply to Bloom Filter or other mehod to store URL's?

So, you have 10 million web pages, but you don't have 10 GB of disk space to spare? I dunno how much you are costing your company, but I'd be surprised if you could come up with a good enough Bloom filter quickly enough that implementing it costs less than buying an extra disk. (It's just going to be scratch space and doesn't need to be backed up, so the cost of the extra disk space isn't much more than the disk itself.)

Re^2: Bloom Filter or other mehod to store URL's?
by Jaap (Curate) on Apr 14, 2005 at 16:16 UTC
    I know I should not feed the trolls, but I cannot resist, so here it comes:
    You assume the following:
    • The solution that takes 10 GB is done in zero time, or in much less time than the Bloom filter
    • I do this for a boss who pays me to do it and wants me to work as cheaply as possible
    • It will take a lot of time to make the Bloom Filter solution
    These assumptions are incorrect because of the following:
    • Typing use DBI; takes about as much time as typing use Bloom::Filter;
    • As stated in the OP, I do this for fun & exercise
    • Using Bloom::Filter, implementing this will be a snap
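
For readers who have not met the data structure, here is a rough sketch in Python of what a Bloom filter does when it is used to remember which URLs have been seen. It is not the Bloom::Filter module's implementation: the class name, the parameter names, the SHA-256 double-hashing scheme, and the example URLs are all assumptions made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter for "have I seen this URL?" checks."""

    def __init__(self, capacity, error_rate):
        # Textbook sizing: m bits and k hash functions for `capacity`
        # items at the target false-positive rate `error_rate`.
        self.num_bits = math.ceil(
            -capacity * math.log(error_rate) / math.log(2) ** 2
        )
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Assumption: derive k bit positions from two 64-bit halves of a
        # single SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        # Set every bit position derived from the key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True means "probably seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


# Hypothetical usage: remember two URLs, then query the filter.
seen = BloomFilter(capacity=10_000, error_rate=0.001)
for url in ("http://example.com/", "http://example.com/a"):
    seen.add(url)

print("http://example.com/" in seen)
```

With these sizing formulas, 10 million URLs at a 0.1% false-positive rate need about 144 million bits, roughly 18 MB, compared with the 10 GB discussed above. The trade-off is that the filter can occasionally report an unseen URL as seen, though it never reports a seen URL as unseen.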