in reply to statistics of a large text
Always remember this: virtual memory is “a disk file.” Real RAM acts as an excellent and intelligent “buffer” for it, but that only works because of the peculiar property known as locality of reference (http://en.wikipedia.org/wiki/Locality_of_reference).
The appropriate method to use here is ... writing to files, sorting those files, and comparing the sorted streams. It worked beautifully in COBOL (and even with punched cards, before computers were invented), and it delivers predictable performance for arbitrary quantities of data.
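As a rough illustration of that write, sort, and compare approach, here is a minimal Python sketch. It is only one way to do it, not the original poster's code: the function names (sorted_runs, merged, compare_sorted) and the chunk_size parameter are made up for this example, and it assumes the records are plain strings with no embedded newlines. Only one chunk of records is ever held in memory at a time; everything else lives in temporary files.

```python
import heapq
import itertools
import os
import tempfile


def sorted_runs(records, chunk_size):
    """Sort records in chunks of at most chunk_size and write each chunk
    to its own temporary file. Returns the list of file paths."""
    paths = []
    it = iter(records)
    while True:
        chunk = sorted(itertools.islice(it, chunk_size))
        if not chunk:
            break
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(rec + "\n" for rec in chunk)
        paths.append(path)
    return paths


def merged(paths):
    """Yield the records of all run files as one sorted stream.
    The run files are closed and deleted once the stream is exhausted."""
    files = [open(p) for p in paths]
    try:
        # Strip the newline before comparing so the merge order matches
        # the order used when each chunk was sorted.
        streams = [(line.rstrip("\n") for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        for f in files:
            f.close()
        for p in paths:
            os.remove(p)


def _counted(stream):
    """Collapse a sorted stream into (key, occurrences) pairs."""
    for key, group in itertools.groupby(stream):
        yield key, sum(1 for _ in group)


def compare_sorted(a, b):
    """Walk two sorted streams in step and yield
    (key, count_in_a, count_in_b) for every key seen in either one."""
    ga, gb = _counted(a), _counted(b)
    xa, xb = next(ga, None), next(gb, None)
    while xa is not None or xb is not None:
        if xb is None or (xa is not None and xa[0] < xb[0]):
            yield xa[0], xa[1], 0          # key only in a
            xa = next(ga, None)
        elif xa is None or xb[0] < xa[0]:
            yield xb[0], 0, xb[1]          # key only in b
            xb = next(gb, None)
        else:
            yield xa[0], xa[1], xb[1]      # key in both
            xa, xb = next(ga, None), next(gb, None)
```

To use it, feed each input through merged(sorted_runs(records, chunk_size)) and pass the two resulting streams to compare_sorted. Pick chunk_size so that one chunk fits comfortably in real RAM; the files are read and written sequentially, which is what keeps the performance predictable.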
When you put “gigabytes of” and “memory” in the same sentence, I stopped reading ... as did everyone else. It was not necessary to know the details.