in reply to statistics of a large text

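Always remember this: virtual memory is “a disk file.” Real RAM acts as an excellent and intelligent “buffer” in front of it, but that only works because of the peculiar property known as locality of reference (http://en.wikipedia.org/wiki/Locality_of_reference). A program that touches gigabytes of memory in no particular order has no such locality, so it spends most of its time waiting on page faults.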

The appropriate method for you to use here is writing to files, sorting those files, and comparing the sorted streams. It worked beautifully in COBOL (and on punched-card equipment before electronic computers existed), and it gives predictable performance for arbitrary quantities of data, because every pass reads and writes sequentially.
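To make that concrete, here is a minimal sketch in Python of the write, sort, and merge approach. It assumes the statistic wanted is a count per word, which is my guess at the original question, and the names sorted_runs, count_sorted, and run_size are made up for illustration. Only one run of words is ever held in memory at a time.

```python
import heapq
import itertools
import os
import tempfile


def sorted_runs(words, run_size=100_000):
    """Write the words to sorted temporary files, run_size words per file.

    Assumes no word contains a newline. Returns the list of file paths.
    """
    paths = []
    words = iter(words)
    while True:
        # Sort the lines as they will appear on disk, so that each file
        # is ordered exactly the way the merge step compares them.
        chunk = sorted(w + "\n" for w in itertools.islice(words, run_size))
        if not chunk:
            break
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.writelines(chunk)
        paths.append(path)
    return paths


def count_sorted(paths):
    """Merge the sorted runs and yield (word, count) pairs in sorted order.

    Opens every run at once; with a very large number of runs you would
    merge in several passes to stay under the open-file limit.
    """
    files = [open(p, encoding="utf-8") for p in paths]
    try:
        merged = heapq.merge(*files)  # reads each file sequentially
        # Equal words are adjacent in the merged stream, so one pass counts them.
        for line, group in itertools.groupby(merged):
            yield line.rstrip("\n"), sum(1 for _ in group)
    finally:
        for f in files:
            f.close()
        for p in paths:
            os.remove(p)
```

For a real text you would feed sorted_runs a generator that reads the input line by line and yields its words, then iterate over count_sorted(paths). Memory use is bounded by run_size however large the input is.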

When you put “gigabytes of” and “memory” in the same sentence, I stopped reading ... as did everyone else. It was not necessary to know the details.