in reply to Efficent Text Checking

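Using the text itself as the hash key is probably very memory-inefficient. Have you tried running md5sum() on the text to get smaller keys? It will cost some extra CPU time, but each key shrinks to a fixed 128-bit digest (32 hex characters) no matter how long the text is.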
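Here is a rough sketch of the idea, written in Python since the thread doesn't show your code. The names text_key, is_duplicate and seen are made up for illustration, and it assumes the text is UTF-8 and that an MD5 collision between two different texts is unlikely enough to ignore for this purpose.

```python
import hashlib

def text_key(text):
    """Return a fixed-size key (32 hex characters) for text of any length."""
    # Assumption: the text can be encoded as UTF-8.
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Hypothetical lookup table: digest -> where the text was first seen.
seen = {}

def is_duplicate(text, source):
    """Record the text under its digest; report whether it was seen before."""
    key = text_key(text)
    if key in seen:
        return True
    seen[key] = source
    return False
```

The table then holds one short digest per text instead of the text itself, so the saving grows with the length of the texts. For texts shorter than 32 characters the digest is actually the larger key.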

Also, if you run this repeatedly and the files don't change very much, you could store your hash on disk. Then you'd just need to read in the md5sums and filenames rather than compute them again and again and again...
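Something along these lines might work for the on-disk part, again as a Python sketch rather than a drop-in solution. The JSON cache file and the function names are hypothetical, and using size plus modification time to decide whether a file has changed is my own assumption; it can miss an edit that preserves both.

```python
import hashlib
import json
import os

def load_cache(cache_path):
    """Read the filename -> digest table from disk; empty on the first run."""
    try:
        with open(cache_path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}

def save_cache(cache_path, cache):
    """Write the table back so the next run can reuse it."""
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(cache, fh)

def file_digest(path, cache):
    """Return the file's MD5, recomputing only if the file looks changed."""
    st = os.stat(path)
    entry = cache.get(path)
    # Assumption: same size and mtime means the content is unchanged.
    if entry and entry["mtime_ns"] == st.st_mtime_ns and entry["size"] == st.st_size:
        return entry["md5"]
    with open(path, "rb") as fh:
        digest = hashlib.md5(fh.read()).hexdigest()
    cache[path] = {"mtime_ns": st.st_mtime_ns, "size": st.st_size, "md5": digest}
    return digest
```

A run would call load_cache once at the start, file_digest for each file, and save_cache at the end, so only new or modified files get read in full.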