in reply to Huge files manipulation

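You need some sort of divide-and-conquer approach. For example, split the data into a set of temporary files according to the first 2-3 fields of each line. Because duplicate lines share those fields, every duplicate lands in the same temporary file, so each file can be uniquified independently (ideally in memory). Then cat the uniquified files back together into one big file.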
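Here is a minimal Python sketch of that idea. It assumes tab-separated fields and keys on the first two. The function name `dedup_large_file`, its parameters, and the choice to hash the key into a fixed number of bucket files (rather than one file per distinct key, which can exhaust file handles) are illustrative assumptions, not part of the suggestion above.

```python
import os
import tempfile
import zlib


def dedup_large_file(in_path, out_path, num_buckets=64, key_fields=2, sep="\t"):
    """Remove duplicate lines from a file too big to dedupe in memory.

    Hypothetical helper (illustration only).
    Divide: route each line to a bucket chosen from its first `key_fields`
    fields, so identical lines always land in the same bucket.
    Conquer: dedupe each bucket in memory, then concatenate the results.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        bucket_paths = [os.path.join(tmpdir, f"bucket_{i}.txt")
                        for i in range(num_buckets)]
        buckets = [open(p, "w", encoding="utf-8") for p in bucket_paths]
        try:
            with open(in_path, encoding="utf-8") as src:
                for line in src:
                    if not line.endswith("\n"):
                        line += "\n"  # normalise a missing final newline
                    # Key = first `key_fields` fields (assumes `sep`-separated data).
                    key = sep.join(line.rstrip("\n").split(sep)[:key_fields])
                    # crc32 is deterministic, unlike Python's randomized str hash.
                    idx = zlib.crc32(key.encode("utf-8")) % num_buckets
                    buckets[idx].write(line)
        finally:
            for f in buckets:
                f.close()

        # Uniquify each bucket on its own and append it to the output ("cat").
        with open(out_path, "w", encoding="utf-8") as dst:
            for path in bucket_paths:
                seen = set()
                with open(path, encoding="utf-8") as f:
                    for line in f:
                        if line not in seen:
                            seen.add(line)
                            dst.write(line)
```

For example, `dedup_large_file("huge.txt", "huge.uniq.txt", num_buckets=256)`. Only one bucket needs to fit in memory at a time; if a bucket is still too large, raise `num_buckets`, or increase `key_fields` if a single key prefix dominates. The output is grouped by bucket, so the original line order is not preserved.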