in reply to Verifying data in large number of textfiles

You could put the data in a database, adding a linenum and a filenum field if necessary. Then, all you'd have to do is:

- foreach line $linenum
  - Compare the number of records returned by "SELECT * WHERE LINENUM=$linenum" to the number of records returned by "SELECT DISTINCT * WHERE LINENUM=$linenum". If they're different, there are duplicate records.
- end
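
For illustration, here is a minimal sketch of the database route using DBI with SQLite; the database file, the `lines` table, and its columns (`filenum`, `linenum`, `content`) are assumptions, not anything from the original post. It also does the COUNT(*) vs. COUNT(DISTINCT) comparison in a single grouped query rather than looping over line numbers:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumed schema: lines(filenum INTEGER, linenum INTEGER, content TEXT),
# one row per line of every original file.
my $dbh = DBI->connect('dbi:SQLite:dbname=lines.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# For each line number, compare the total record count with the distinct
# record count; a mismatch means that line number has duplicate records.
my $sth = $dbh->prepare(q{
    SELECT linenum, COUNT(*), COUNT(DISTINCT content)
    FROM lines
    GROUP BY linenum
    HAVING COUNT(*) <> COUNT(DISTINCT content)
});
$sth->execute;

while (my ($linenum, $total, $distinct) = $sth->fetchrow_array) {
    print "line $linenum: $total records, only $distinct distinct\n";
}
$dbh->disconnect;
```

Any row it prints is a line number for which at least two files carry identical records.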

The same approach can be taken without a database. It involves regrouping all the files so that line1.dat contains the first line of every original file, line2.dat contains the second line of every original file, etc. Pseudo-code:

- foreach original file
  - $linenum = 1;
  - while not eof
    - append the line to file "line${linenum}.dat"
    - $linenum++;
  - end
- end
- foreach line###.dat file
  - Compare the number of lines returned by 'cat line###.dat | sort | uniq' with the number of lines in line###.dat. If they're different, there are duplicate records.
- end
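
A self-contained Perl version of that pseudo-code might look like the sketch below; the original files are assumed to be named on the command line, the line${linenum}.dat files are written to the current directory, and the `sort | uniq` comparison is done in memory with a hash instead of shelling out:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $max_linenum = 0;

# Regroup: lineN.dat collects the Nth line of every original file.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Can't read $file: $!";
    my $linenum = 1;
    while (my $line = <$in>) {
        open my $out, '>>', "line$linenum.dat"
            or die "Can't append to line$linenum.dat: $!";
        print {$out} $line;
        close $out;
        $max_linenum = $linenum if $linenum > $max_linenum;
        $linenum++;
    }
    close $in;
}

# Compare the total line count to the unique line count in each
# regrouped file (the same check as `sort | uniq`, done in memory).
for my $linenum (1 .. $max_linenum) {
    my $dat = "line$linenum.dat";
    open my $in, '<', $dat or die "Can't read $dat: $!";
    my $total = 0;
    my %seen;
    while (my $line = <$in>) {
        $total++;
        $seen{$line}++;
    }
    close $in;
    my $unique = keys %seen;
    print "$dat has duplicates ($total lines, $unique unique)\n"
        if $total != $unique;
}
```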

A completely different approach is to convert your CSV files to fixed-length field files. Then you can easily compare an arbitrary line in one file to the same line in another file by using seek().
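
As a rough sketch of the seek() idea, assuming every record has been padded to a known fixed length during the CSV conversion (the 80 bytes below is a made-up value), you could read record N of any file directly and compare it against the same record in another file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical fixed record length, including the trailing newline.
my $record_len = 80;

# Read record number $linenum (1-based) from a fixed-length file
# without scanning the lines before it.
sub read_record {
    my ($filename, $linenum) = @_;
    open my $fh, '<', $filename or die "Can't read $filename: $!";
    seek $fh, ($linenum - 1) * $record_len, 0
        or die "Can't seek in $filename: $!";
    read $fh, my $record, $record_len;
    close $fh;
    return $record;
}

# Compare line 42 of two files directly.
my $a = read_record('file1.dat', 42);
my $b = read_record('file2.dat', 42);
print "line 42 differs\n" if $a ne $b;
```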