Re: How to process two files of over a million lines for changes

Unix has a nice utility called "comm" that is perfectly suited for your needs.
It takes 2 sorted input files and returns the output in 3 columns:

column 1: lines unique to input file 1
column 2: lines unique to input file 2
column 3: lines common to both files

You can read the manpage for it, but this is how you would use it:

comm -3 yesterday.file today.file > difs.txt

The -3 switch turns off the output of column 3 (which you don't need).
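
One caveat: comm only gives correct results when both files are sorted the same way. Here is a minimal sketch, assuming your files are not already sorted. The .sorted file names and the tiny sample data are made up for illustration, so try it in an empty scratch directory:

```shell
# Sample data, invented for illustration only (deliberately unsorted).
printf 'cherry\napple\nbanana\n' > yesterday.file
printf 'banana\ndate\ncherry\n' > today.file

# LC_ALL=C forces plain byte-order collation so sort and comm agree.
LC_ALL=C sort yesterday.file > yesterday.sorted
LC_ALL=C sort today.file > today.sorted

# Same comm call as above, but run against the sorted copies.
LC_ALL=C comm -3 yesterday.sorted today.sorted > difs.txt
```

With this sample data, "apple" is the only record unique to yesterday and "date" is the only record unique to today.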
The file difs.txt now contains 2 columns of data.
Records unique to yesterday start at the beginning of the line.
Records unique to today are indented with a single leading tab.
That leading tab makes the file easy to parse into its 2 columns.

The records in column 1 are the ones that need to be deleted from the database.
The records in column 2 are the ones that need to be added to the database.
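
If you would rather not parse difs.txt at all, comm can write each column to its own file by suppressing the other two, which also sidesteps any ambiguity if a record itself begins with a tab. This is only a sketch, assuming both inputs are already sorted. The output names to_delete.txt and to_add.txt and the sample data are made up for illustration:

```shell
# Sample data (already sorted), invented for illustration only.
printf 'apple\nbanana\ncherry\n' > yesterday.file
printf 'banana\ncherry\ndate\n' > today.file

# -23 suppresses columns 2 and 3: only lines unique to yesterday remain.
LC_ALL=C comm -23 yesterday.file today.file > to_delete.txt

# -13 suppresses columns 1 and 3: only lines unique to today remain.
LC_ALL=C comm -13 yesterday.file today.file > to_add.txt
```

Each output file then holds one record per line with no leading tabs, ready to feed to whatever does the deletes and inserts.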

hope this helps,
davidj
