In reply to: Removing Duplicate Files
Dupseek is a pretty good Perl implementation of what you're after, and it has been around for a while now. I've never tested it on a data set of this size, though.
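For anyone who wants to see the general idea before trying a tool, here is a minimal sketch of one common approach: group files by size first, then hash only the files that share a size. This is not Dupseek's code and may not match how it works internally. The function names `file_digest` and `find_duplicates` are made up for illustration, and SHA-256 is just an assumed choice of digest.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (sorted lists of paths) of files with identical content.

    Hypothetical helper for illustration only, not part of any existing tool.
    """
    # Pass 1: bucket regular files by size, which needs no file reads.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files whose size is shared with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates` on a directory returns the duplicate groups and deletes nothing, so you can review the list before removing anything. On a large data set the size pass should skip most of the hashing, but I have not measured that.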
Tim