in reply to Verifying data in large number of textfiles

Better would be to get an MD5 digest of each file. That's very easy with Perl 5.8 and PerlIO::via::MD5:

    use PerlIO::via::MD5;

    my %digested;
    for (glob '/path/to/*.csv') {
        open my $fh, '<:via(MD5)', $_ or warn $! and next;
        my $sum = <$fh>;    # reading through the :via(MD5) layer yields the file's digest
        exists $digested{$sum} and unlink($_), next;    # digest seen before: delete this copy
        $digested{$sum} = $_;    # remember the first file with this digest
    }
That takes care of deleting duplicates as you go; among duplicates, the first file seen is the one that survives.
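
If PerlIO::via::MD5 isn't installed, here's a minimal sketch of the same idea using the core Digest::MD5 module instead; the glob pattern is the same placeholder path as above, and the per-file error handling is my own addition:

    use Digest::MD5;

    my %digested;
    for my $file (glob '/path/to/*.csv') {
        open my $fh, '<', $file or warn "$file: $!" and next;
        binmode $fh;    # digest the raw bytes, unmangled by line-ending translation
        my $sum = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        exists $digested{$sum} and unlink($file), next;    # duplicate: drop it
        $digested{$sum} = $file;    # first of its kind survives
    }

Same logic, just with the digest computed explicitly rather than through an I/O layer.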

After Compline,
Zaxo