in reply to Help for finding duplicates in huge files

With only about 3 million unique ids, a memory-resident hash table is completely feasible. An easy approach is to read all of the input data twice and output a subset of the input data once.

    my %hash;
    open file1 ...
    while (<$file1>) {
        ... do something to get the id
        $hash{$id}++;
    }
    close file1

    open file2 ...
    while (<$file2>) {
        ... do something to get the id
        $hash{$id} += 3;
    }
    close file2

    open 3 output files: the "both" file, "file1 only" and "file2 only"
    loop back through file1 and file2; for each line, decide where it goes:
        if the hash value of the id == 1  ->  the "file1 only" file
        if the hash value of the id == 3  ->  the "file2 only" file
        if the hash value of the id == 4  ->  the "both" file
    then negate the hash value to signal that this id has already been
    dealt with - each id should appear in only one of the 3 output files.
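
A fuller, runnable sketch of that two-pass outline might look like the following. The file names (file1.txt, file2.txt, both.txt, file1_only.txt, file2_only.txt) are placeholders, and the id extraction just takes the first whitespace-separated field on each line - adjust that to your actual record format. It also assumes each id occurs at most once per input file, which the 1/3/4 counting scheme already relies on.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %hash;

    # Pass 1: tally the ids. An id seen only in file1 ends up with the
    # value 1, only in file2 with 3, and in both files with 4.
    open my $f1, '<', 'file1.txt' or die "file1.txt: $!";
    while (<$f1>) {
        my ($id) = split ' ', $_;   # placeholder: id = first field on the line
        $hash{$id}++;
    }
    close $f1;

    open my $f2, '<', 'file2.txt' or die "file2.txt: $!";
    while (<$f2>) {
        my ($id) = split ' ', $_;
        $hash{$id} += 3;
    }
    close $f2;

    # Pass 2: route every line to exactly one output file, then negate
    # the hash value so the same id is never written twice.
    open my $both,  '>', 'both.txt'       or die "both.txt: $!";
    open my $only1, '>', 'file1_only.txt' or die "file1_only.txt: $!";
    open my $only2, '>', 'file2_only.txt' or die "file2_only.txt: $!";

    for my $in ('file1.txt', 'file2.txt') {
        open my $fh, '<', $in or die "$in: $!";
        while (<$fh>) {
            my ($id) = split ' ', $_;
            my $n = $hash{$id};
            next if $n < 0;              # this id has already been dealt with
            print {$only1} $_ if $n == 1;
            print {$only2} $_ if $n == 3;
            print {$both}  $_ if $n == 4;
            $hash{$id} = -$n;            # mark it as done
        }
        close $fh;
    }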
Update:
To get just the ids of the lines into the 3 different files, it is not necessary to read the input files a second time - just dump the hash according to the "rules" above. In the above, I first thought that each id would be associated with some non-trivial amount of data; that's not the case here. This algorithm will run very fast - no sorting or merging is required.
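
A minimal sketch of that shortcut, reusing %hash and the counting scheme from the example above (the output file names are again placeholders):

    # Instead of a second pass over the inputs, dump the ids straight
    # from the hash into the three output files.
    open my $both,  '>', 'both_ids.txt'       or die "both_ids.txt: $!";
    open my $only1, '>', 'file1_only_ids.txt' or die "file1_only_ids.txt: $!";
    open my $only2, '>', 'file2_only_ids.txt' or die "file2_only_ids.txt: $!";

    while (my ($id, $count) = each %hash) {
        if    ($count == 1) { print {$only1} "$id\n" }
        elsif ($count == 3) { print {$only2} "$id\n" }
        elsif ($count == 4) { print {$both}  "$id\n" }
    }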