Hello All,
I am looking for a way to delete lines from a file once I have read them.
I have several large files which I want to combine, deleting any non-unique lines. I am using a hash to combine them for uniqueness, but I do not have enough memory to put the whole thing into one hash (the files combined are over 100GB). The files are sorted, so I just want to take out, say, the first million lines of each and combine those. After reading the lines, I want to delete them from the file and close it.
What's the best way to do this? Thanks.
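
For reference, here is a rough sketch of an alternative I have been considering, in case it helps frame the question. Since the files are already sorted, it might be possible to merge them as streams and skip repeated lines, without a hash and without deleting anything from the inputs. This is only a sketch in Python under some assumptions: every file is sorted in plain byte order (for example with LC_ALL=C), lines end in "\n", and the name merge_unique and the file names in the usage note are made up for illustration.

```python
import heapq
from contextlib import ExitStack


def _lines(handle):
    """Yield each line of a binary file without its trailing newline."""
    for raw in handle:
        yield raw.rstrip(b"\n")


def merge_unique(in_paths, out_path):
    """Merge already-sorted files into out_path, writing each distinct line once.

    Assumes every input is sorted in byte order. Returns the number of
    lines written. (Hypothetical helper, not from any library.)
    """
    with ExitStack() as stack:
        inputs = [stack.enter_context(open(p, "rb")) for p in in_paths]
        out = stack.enter_context(open(out_path, "wb"))
        previous = None
        written = 0
        # heapq.merge reads lazily, so only one pending line per input
        # file is held in memory at any time.
        for line in heapq.merge(*(_lines(f) for f in inputs)):
            # In a sorted stream, duplicates are always adjacent, so
            # comparing with the previous line is enough.
            if line != previous:
                out.write(line + b"\n")
                previous = line
                written += 1
    return written
```

Something like merge_unique(["part1.txt", "part2.txt"], "combined.txt") would then write the combined file in one pass. If the files were sorted under a different collation than byte order, the merge would not see duplicates next to each other, so that assumption would need checking first.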