in reply to Re: Re: Bloom::Filter Usage
in thread Bloom::Filter Usage

That sounds interesting!

I see the following possibilities:
• If your DB happens to be Oracle, you could load your 30 million records with a pretty fast tool (SQL*Loader) and let the tool write the duplicate-key records to a separate rejected-records file. In a second step you could then walk through only those exceptions and, if necessary, update your DB (see the control-file sketch after this list).
• Sort your file before you read through it sequentially. Sorting a file that big will take its time, but afterwards all entries for one account are grouped together, so you only need to compare each key against the previous one. That cuts your memory consumption down to a single key (see the Perl sketch below).
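
For the first option, here is a minimal SQL*Loader control-file sketch; the file names, table name, and column names are made up for illustration. One hedge on the terminology: rows that Oracle rejects because of a unique-key violation land in the BAD file, while the DISCARD file only catches rows filtered out by a WHEN clause, so the duplicates would show up in accounts.bad below.

    -- load.ctl (hypothetical names throughout)
    LOAD DATA
    INFILE      'accounts.dat'
    BADFILE     'accounts.bad'   -- rows rejected by Oracle, e.g. unique-key violations
    DISCARDFILE 'accounts.dsc'   -- rows filtered out by a WHEN clause, if any
    APPEND INTO TABLE accounts
    FIELDS TERMINATED BY ','
    ( account_no, amount )

You would then run something like: sqlldr userid=scott/tiger control=load.ctl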
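
And a small Perl sketch of the sorted-file approach; it assumes the account number is the first comma-separated field of each line and that the file has already been sorted with the system sort, which copes with files larger than RAM:

    # First, outside Perl:  sort -t, -k1,1 accounts.dat > accounts.sorted
    use strict;
    use warnings;

    open my $fh, '<', 'accounts.sorted' or die "can't open accounts.sorted: $!";

    my $prev = '';                          # key of the previous line
    while ( my $line = <$fh> ) {
        my ($acct) = split /,/, $line, 2;   # first field = account number
        print "duplicate account: $acct\n" if $acct eq $prev;
        $prev = $acct;
    }
    close $fh;

Since only the previous key is kept in memory, this runs in constant space no matter how big the file gets.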

pelagic