Re: efficiency of exists()

If/when you get to the point of having many millions of keys in your database, the start-up time to load all of them into memory for your "fast check", and (eventually) the memory consumed by the hash itself, could push you past the point of diminishing returns.

Depending on how big the DB table gets, how many look-ups you actually do in one run of your script, and what else is going on besides look-ups, you might cross a threshold where the script runs faster if you just query the database for each key as needed, rather than loading all keys into a hash up front.
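The thread is about Perl, but the trade-off is language-neutral. Here is a minimal Python sketch, assuming an in-memory SQLite table stands in for the real MySQL one; the table `items` and the helper names are hypothetical. `check_with_preload` pays the full load cost once and then does hash look-ups, while `check_with_queries` pays nothing up front but issues one indexed query per key checked.

```python
import sqlite3

def make_db(n_keys):
    """Build a throwaway in-memory table with n_keys primary keys (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (k INTEGER PRIMARY KEY, v TEXT)")
    conn.executemany("INSERT INTO items (k, v) VALUES (?, ?)",
                     ((i, f"val{i}") for i in range(n_keys)))
    conn.commit()
    return conn

def check_with_preload(conn, wanted):
    # Start-up cost paid once: pull every key into memory (a Python set,
    # the analogue of a Perl hash), then each test is a cheap hash look-up.
    keys = {k for (k,) in conn.execute("SELECT k FROM items")}
    return [k in keys for k in wanted]

def check_with_queries(conn, wanted):
    # No start-up cost: one primary-key query per key actually looked up.
    cur = conn.cursor()
    return [cur.execute("SELECT 1 FROM items WHERE k = ?", (k,)).fetchone() is not None
            for k in wanted]
```

With few look-ups against a huge table, the per-query version avoids loading keys it never needs; with many look-ups against a modest table, the preload usually wins. Where the crossover sits depends on your data and server, which is why measuring matters.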

BTW, if the reason for checking the existence of a key is to decide whether you should do an insert vs. an update (or insert vs. nothing), you might want to check out the "INSERT ... ON DUPLICATE KEY UPDATE ..." syntax in MySQL.
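As a rough illustration of letting the database decide insert-vs-update in one statement, here is a Python sketch. It assumes SQLite 3.24 or later, whose `INSERT ... ON CONFLICT(...) DO UPDATE` plays the same role as MySQL's `ON DUPLICATE KEY UPDATE`; the MySQL spelling is shown in a comment. The `counts` table and `bump` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (k TEXT PRIMARY KEY, n INTEGER NOT NULL)")

def bump(conn, key):
    # One statement, no separate existence check: insert a new row, or, if the
    # primary key already exists, update that row instead.
    # MySQL equivalent:
    #   INSERT INTO counts (k, n) VALUES (?, 1) ON DUPLICATE KEY UPDATE n = n + 1
    conn.execute(
        "INSERT INTO counts (k, n) VALUES (?, 1) "
        "ON CONFLICT(k) DO UPDATE SET n = n + 1",
        (key,),
    )

for word in ["apple", "pear", "apple"]:
    bump(conn, word)
```

After the loop, "apple" has a count of 2 and "pear" a count of 1, and the script never had to ask whether a key existed.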

In any case, if speed is really an issue, you'll want a benchmark for testing the alternatives. Use the Benchmark module if you like, or just write two versions of a job that give both approaches a fair test. You'll want a setup that compares the timing now and can repeat an equivalent comparison at any time in the future, to see whether table size affects one approach differently from the other.
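A minimal sketch of such a repeatable comparison, using Python's `timeit` in place of Perl's Benchmark and an in-memory SQLite table as a hypothetical stand-in for the real one. It first checks that both approaches give the same answer (so the test is fair), then times each at several table sizes so the trend with table size is visible. All names here are illustrative.

```python
import sqlite3
import timeit

def make_db(n_keys):
    # Hypothetical stand-in table; swap in your real connection and schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (k INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO items (k) VALUES (?)", ((i,) for i in range(n_keys)))
    conn.commit()
    return conn

def preload_run(conn, wanted):
    # Load all keys into a set, then count how many wanted keys exist.
    keys = {k for (k,) in conn.execute("SELECT k FROM items")}
    return sum(k in keys for k in wanted)

def query_run(conn, wanted):
    # One indexed query per wanted key.
    cur = conn.cursor()
    return sum(cur.execute("SELECT 1 FROM items WHERE k = ?", (k,)).fetchone() is not None
               for k in wanted)

def compare(table_sizes, n_lookups, repeat=3):
    """Return {size: (best_preload_secs, best_query_secs)} for each table size."""
    results = {}
    for size in table_sizes:
        conn = make_db(size)
        # Mix of hits and misses: keys range over [0, 2 * size).
        wanted = [(i * 7919) % (2 * size) for i in range(n_lookups)]
        # Fairness check: both approaches must agree before we time them.
        assert preload_run(conn, wanted) == query_run(conn, wanted)
        t_pre = min(timeit.repeat(lambda: preload_run(conn, wanted), number=1, repeat=repeat))
        t_qry = min(timeit.repeat(lambda: query_run(conn, wanted), number=1, repeat=repeat))
        results[size] = (t_pre, t_qry)
        conn.close()
    return results
```

Calling something like `compare([10_000, 100_000, 1_000_000], n_lookups=500)` and saving the output gives you a baseline to rerun later as the real table grows.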
