in reply to Comparing text documents

I am going to assume the documents differ wildly: that you have Excel sheets, HTML files, PDFs, images, and simple text documents.

I would suggest, and this is a hack, first weeding out the obvious non-matches with much less specific and less CPU-intensive methods. How about:

Like I said, this is a total hack overall. If all your documents *were* similar, it would greatly slow down the whole process, since the extra checks would eliminate almost nothing. However, if some of these kinds of simple conditions *can* be deemed authoritative for your document archive, then it could be what you need to do.
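
To make the idea concrete, here is a rough sketch of such a cheap pre-filter in Python. It assumes, purely for illustration, that file extension and file size are conditions you can treat as authoritative in your archive; the function name `worth_comparing` and the `max_size_ratio` threshold are made up for this example and would need tuning against real data.

```python
import os


def worth_comparing(path_a, path_b, max_size_ratio=2.0):
    """Return True if two files pass the cheap checks and deserve a full comparison.

    Both checks are illustrative assumptions, not a rule that holds for every archive:
      1. the file extensions must match (case-insensitive);
      2. the larger file must be at most max_size_ratio times the smaller one.
    """
    ext_a = os.path.splitext(path_a)[1].lower()
    ext_b = os.path.splitext(path_b)[1].lower()
    if ext_a != ext_b:
        return False

    small, large = sorted((os.path.getsize(path_a), os.path.getsize(path_b)))
    if small == 0:
        # An empty file is only considered close to another empty file.
        return large == 0
    return large / small <= max_size_ratio
```

You would call this on each pair first and only hand the pairs that return true to the expensive text comparison. If the archive turns out to be mostly same-type, same-size files, almost everything passes and the check is pure overhead, which is the slowdown mentioned above.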