Re: Re: Removing duplicates in large files (a hash, or divide-and-conquer)

If you're doing that, you may as well do it in one pass:

while (<>) {
  # print each line only the first time it is seen
  print unless $seen{$_}++;
}

You could also shrink the memory usage by computing your own hash value (a fixed-size digest of each line) and using that as the %seen key instead of the line itself -- but I don't think I'm going to get into any more details unless the original poster swears that this has nothing to do with harvesting addresses for spammers.
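
For what it's worth, here is a minimal sketch of that idea. It assumes the core Digest::MD5 module is an acceptable way to "compute your own hash value"; the sub name dedup_by_digest is made up for illustration. It only saves memory when lines are typically longer than the 16-byte digest, and two different lines with the same digest would be treated as duplicates (very unlikely for ordinary input, but not impossible).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical helper (not from the original post): copy lines from the
# filehandle $in to the filehandle $out, skipping any line whose MD5
# digest has already been seen.
sub dedup_by_digest {
    my ($in, $out) = @_;
    my %seen;
    while (my $line = <$in>) {
        # md5() returns the raw 16-byte digest, so each %seen key is
        # 16 bytes no matter how long the line is.
        print {$out} $line unless $seen{ md5($line) }++;
    }
    # Number of distinct digests, i.e. lines written to $out.
    return scalar keys %seen;
}
```

Calling dedup_by_digest(\*STDIN, \*STDOUT) should behave like the loop above for input on STDIN, except that %seen holds digests rather than whole lines.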
