If both files are already in the same order (sorted by email address, if I remember correctly), you can use a file-merge style solution.
- Open your output file (O)
- Open your input file (I) and your deletion file (D)
- Read the first record of I and of D
- While there is a current record from I:
  - while D < I and D is not at the end of file, read the next record from D
  - send I -> O unless I == D
  - read the next record from I
- Close all files
Conversion into Perl is left as an exercise to the reader.
sort (OS level), sort (Perl level), open, close, eof, and perlop are all potentially helpful in this task.
Be aware that you are dealing with 1 billion records, so the sort or filter step could take a while, depending on the complexity of the records and of the comparison.
Benefits: only one record from each of the input and deletion files is in memory at a time.
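
For anyone who wants something to check their own attempt against, here is a minimal sketch of the merge steps above. It assumes one record per line, that the whole line is the sort key, and that both files were sorted bytewise (for example with LC_ALL=C set for an OS-level sort) so that the order agrees with Perl's lt and eq. The sub name filter_sorted and its three path arguments are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy every line of $in_path to $out_path except lines that also appear
# in $del_path. Both inputs must already be sorted in the order used by
# Perl's string comparison (lt / eq), i.e. bytewise.
# The sub name and its arguments are illustrative, not from the post.
sub filter_sorted {
    my ($in_path, $del_path, $out_path) = @_;

    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $del, '<', $del_path or die "Cannot read $del_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    # Current deletion record; undef once D is exhausted.
    my $d = <$del>;
    chomp $d if defined $d;

    while (defined(my $i = <$in>)) {
        chomp $i;

        # Advance D until its record is no longer less than the I record.
        while (defined $d and $d lt $i) {
            $d = <$del>;
            chomp $d if defined $d;
        }

        # Send I -> O unless it matches the current deletion record.
        print {$out} "$i\n" unless defined $d and $d eq $i;
    }

    close $in;
    close $del;
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Runs only when called with three paths: input, deletions, output.
filter_sorted(@ARGV) if @ARGV == 3;
```

Reading I in the loop condition rather than testing eof up front means the last record is handled like any other. Duplicate records in I that match a deletion are all dropped, because D only advances while it is strictly less than I.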