in reply to Re: Removing duplicates in large files (a hash, or divide-and-conquer)
in thread Removing duplicates in large files

If you're doing that, you may as well do it in one pass:
while (<>) { print unless $seen{$_}++; }
You could also shrink the memory usage by computing your own hash value for each line (a fixed-size digest, say) and using that as the %seen key instead of the line itself. I don't think I'm going to get into any more details, though, unless the original poster swears that this has nothing to do with harvesting addresses for spammers.
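For anyone wanting a concrete picture of the digest idea, here is a minimal sketch. It assumes MD5 (via the core Digest::MD5 module) as the hash function, which the post does not specify, and the helper name dedupe_lines is made up for illustration. It only saves memory when lines are typically longer than the 16-byte digest, and a digest collision would, in principle, cause a distinct line to be dropped.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical helper: return the given lines with duplicates removed,
# keeping the first occurrence of each and preserving order.
# %seen is keyed by the 16-byte binary MD5 digest of each line
# (an assumed choice of hash) rather than by the line itself.
sub dedupe_lines {
    my @lines = @_;
    my %seen;
    return grep { !$seen{ md5($_) }++ } @lines;
}

# Streaming equivalent of the one-liner above, same assumption:
#   my %seen;
#   while (my $line = <>) { print $line unless $seen{ md5($line) }++; }
```

The binary md5 is used rather than md5_hex because the hex form is twice as long, which would give back part of the saving.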