in reply to file delta detection
So to provide a reasonably general solution that should work on most machines, I would do much of the work with Unix power tools, starting with:
    awk -F"'" '{ print $3 "|" $0 }' < file1 | sort > mod1.sor
    awk -F"'" '{ print $3 }' < file1 | sort > keys1.sor
    awk -F"'" '{ print $3 "|" $0 }' < file2 | sort > mod2.sor
    awk -F"'" '{ print $3 }' < file2 | sort > keys2.sor
    comm -12 keys1.sor keys2.sor > xk.sor
    comm -13 mod1.sor mod2.sor > t.sor
    comm -23 mod1.sor mod2.sor > d.sor

The xk.sor file holds the keys common to both files. t.sor (with the key column now prepended) contains the mixture of records destined to be either T or U (each must be one or the other). d.sor contains the mixture of D and U records from the older file (that is, the file holding the older data of the two inputs to this process, here file1), also with the key prepended.
Now you can load xk.sor into Perl as hash keys to identify the 'U' records in t.sor (all the others being T). The same hash can be used to eliminate the 'U' records from d.sor (all the others being D). The key column we prepended can be removed at output time.
As for the Perl language elements needed: hashes to hold the keys; the split function to split delimited records into arrays; shift to remove the first element from an array; open to open files for input or output; and the <> operator to read from files. Need anything more for this? (Update: apart from print to output the lines with their appendages, minus the key on the front, which we added so that Unix sort works without difficulty. Unix sort does have a key-definition option, but it is too awkward with delimiters. On the other hand, it has built-in disk-swapping facilities for processing huge files.)
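For what it's worth, here is a minimal sketch of the Perl step, using only the elements listed above. Treat it as an illustration under assumptions, not a finished script. The sub names (load_keys, split_keyed, tag_file), the argument order on the command line, and the output layout (the tag appended after one more apostrophe delimiter) are all my own choices for the example. It also assumes the key itself never contains a "|".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load the common keys (one per line, as in xk.sor) into a hash.
sub load_keys {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %common;
    while (my $key = <$fh>) {
        chomp $key;
        $common{$key} = 1;
    }
    close $fh;
    return \%common;
}

# Split "key|record" into its two parts. The limit of 2 keeps any
# later "|" characters inside the record intact.
sub split_keyed {
    my ($line) = @_;
    chomp $line;
    my @parts  = split /\|/, $line, 2;
    my $key    = shift @parts;
    my $record = @parts ? $parts[0] : '';
    return ($key, $record);
}

# Tag every record in a keyed file. Records whose key is in %$common
# get $tag_common, the rest get $tag_other. An undefined tag means
# "drop this record".
sub tag_file {
    my ($path, $common, $tag_common, $tag_other, $out) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$in>) {
        next unless $line =~ /\S/;          # skip blank lines
        my ($key, $record) = split_keyed($line);
        my $tag = exists $common->{$key} ? $tag_common : $tag_other;
        next unless defined $tag;
        print {$out} "$record'$tag\n";      # assumed output layout
    }
    close $in;
}

sub main {
    my ($keys_path, $t_path, $d_path) = @_;
    my $common = load_keys($keys_path);
    tag_file($t_path, $common, 'U',   'T', \*STDOUT);  # t.sor: U or T
    tag_file($d_path, $common, undef, 'D', \*STDOUT);  # d.sor: drop U, keep D
}

# Only runs when given the three file names.
main(@ARGV) if @ARGV == 3;
```

Saved under a name of your choosing (say delta.pl), it would be run as: perl delta.pl xk.sor t.sor d.sor > tagged.out. Only the key list is held in memory; t.sor and d.sor are streamed line by line, so the huge-file work stays with sort and comm.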
One world, one people