Re: file delta detection

You don't state the capabilities of the machine you are running on or the width of the records. For example, if they are 10K wide, you would need a monstrous machine to load the files into Perl memory.

So, to provide a reasonably general solution that should work on most machines, I would do most of the work with Unix power tools, starting with:

awk -F"'" '{ print $3 "|" $0 }' < file1 | sort > mod1.sor
awk -F"'" '{ print $3 }' < file1 | sort > keys1.sor
awk -F"'" '{ print $3 "|" $0 }' < file2 | sort > mod2.sor
awk -F"'" '{ print $3 }' < file2 | sort > keys2.sor
comm -12 keys1.sor keys2.sor > xk.sor
comm -13 mod1.sor mod2.sor > t.sor
comm -23 mod1.sor mod2.sor > d.sor

The xk.sor file holds the keys common to both files. t.sor (with the key column now prepended) contains the lines found only in file2: a mixture of records destined to be either T or U (each must be one or the other). d.sor contains the lines found only in file1, the older of the two input files: a mixture of D and U records, also with the key prepended.
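To make the contents of the three files concrete, here is a toy run of the same steps. It is only a sketch: the sample records are invented, the single-quote separator and third-field key are taken from the awk commands in this post, and LC_ALL=C is my addition so that sort and comm agree on ordering.

```sh
#!/bin/sh
# Toy run of the pipeline. All sample records are invented for illustration.
set -e
export LC_ALL=C            # assumption: force the same collation for sort and comm
cd "$(mktemp -d)"          # work in a scratch directory

# file1 = older data, file2 = newer data; fields separated by ', key in field 3
printf '%s\n' "a'b'k1'old one" "a'b'k2'same" "a'b'k3'gone"  > file1
printf '%s\n' "a'b'k1'new one" "a'b'k2'same" "a'b'k4'added" > file2

for n in 1 2; do
    # prepend the key so a plain sort orders the lines by key
    awk -F"'" '{ print $3 "|" $0 }' < "file$n" | sort > "mod$n.sor"
    awk -F"'" '{ print $3 }'        < "file$n" | sort > "keys$n.sor"
done

comm -12 keys1.sor keys2.sor > xk.sor   # keys present in both files
comm -13 mod1.sor  mod2.sor  > t.sor    # lines found only in file2
comm -23 mod1.sor  mod2.sor  > d.sor    # lines found only in file1

for f in xk.sor t.sor d.sor; do
    echo "== $f"
    cat "$f"
done
```

With this data, xk.sor lists k1 and k2; t.sor holds the new version of k1 plus k4; d.sor holds the old version of k1 plus k3. The unchanged k2 record is a common key but appears in neither t.sor nor d.sor, because its whole line is identical in both files.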

Now you can load xk.sor into Perl as hash keys to identify the 'U' records in t.sor (all others being T). The same hash can be used to eliminate the 'U' records from d.sor (all others being D). You can remove the key column we prepended at output time.

As for the Perl language elements needed: hashes to hold the keys; the split function to split delimited records into arrays; shift to remove the first element from an array; open to open files for input or output; and the <> operator to read from files. Need anything more for this? (Update: apart from print, to output the lines with their appendages, minus the key on the front. We added that key to make Unix sort work without difficulty: sort does have a key-definition option, but it is too awkward with delimiters. On the other hand, Unix sort has built-in disk-swapping facilities for processing huge files.)

One world, one people
