With a million keys, you should go for the database.
Why? The OP was concerned with speed.
I see this as another (merlyn-style) "bad meme". There are plenty of very good reasons for using a database, but *speed* is not one of them!
Using DB_File, it takes over 5 minutes to do what this code does in under 10 seconds. And that is once you've worked out how: the sample code from DB_File does not even compile as printed.
It may be possible to improve on those 5 minutes if you hunt around the internet to locate, read, and understand the Berkeley DB optimisation and configuration advice, but in this application you will never get anywhere near the performance of direct file access.
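For comparison, here is a minimal Perl sketch of the plain in-memory approach: an ordinary hash used as a "seen" set. It is not the code referred to above, which isn't shown here. It assumes one key per line and enough RAM to hold every distinct key, and the sub names `dedup_keys` and `dedup_stream` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the unique keys from a list, keeping the
# first occurrence of each key in its original order.
sub dedup_keys {
    my @keys = @_;
    my %seen;
    # $seen{$_}++ returns 0 (false) the first time a key is seen,
    # so !0 is true and grep keeps that first occurrence only.
    return grep { !$seen{$_}++ } @keys;
}

# Hypothetical helper: read one key per line from $in and write each
# key to $out the first time it appears. Only the distinct keys are
# held in memory, not the whole input.
sub dedup_stream {
    my ( $in, $out ) = @_;
    my %seen;
    while ( my $key = <$in> ) {
        chomp $key;
        print {$out} "$key\n" unless $seen{$key}++;
    }
    return;
}

# Usage: filter a key file from the command line, e.g.
#   perl -e 'require "dedup.pl"; dedup_stream(\*STDIN, \*STDOUT)' < keys.txt
1;
```

Tying `%seen` to DB_File would move the same set onto disk without changing these lookups in the code, but every lookup then goes through Berkeley DB rather than Perl's in-memory hash, which is where the extra time goes.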