Whether you have your million records in memory (fast) or on disk in a database (slow), you have to take the time to insert your new data. Looking up existing data is different - as explained, looking up in a hash is O(1): you take the key, perform a calculation on it (one that depends on the length of the key, not the size of the hash), and go straight to that entry in the (associative) array. A database lookup cannot be any faster than O(1), and it can be as bad as O(log N) (I can't imagine any database doing an index lookup slower than a binary search), which depends on the number of data points you're comparing against.
The only way a database could be faster is if it's running on a big honkin' box with lots of RAM, and that's a different box from your Perl client.
This problem is one of the primary reasons to use a hash. (Not the only one, but one of them nonetheless.)
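To make that concrete, here's a minimal sketch of the usual %seen idiom for weeding out duplicate keys with a hash; each key is checked with a single O(1) hash lookup. Reading the keys from STDIN is just an assumption for illustration, not something from the original thread:

#!/usr/bin/perl
use strict;
use warnings;

# %seen records each key the first time we meet it; every later
# occurrence finds the key already in the hash (an O(1) lookup)
# and gets skipped.
my %seen;
my @unique;

while ( my $key = <STDIN> ) {
    chomp $key;
    next if $seen{$key}++;    # already seen? skip the duplicate
    push @unique, $key;
}

print "$_\n" for @unique;

The post-increment in the boolean test is the standard trick: it returns the old value (false on first sight), then bumps the count, so the check and the insert happen in one pass.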
In reply to Re^3: How to remove duplicates from a large set of keys
by Tanktalus
in thread How to remove duplicates from a large set of keys
by nite_man