in reply to Removing duplicates in large files
I suppose that if you must use Perl for this, you could use DB_File (or one of the other *DB_File modules) and just keep chugging the email addresses into a tied hash on disk. Since the tied file behaves like a hash, it weeds out duplicates for you.
Some code fragments (the filenames are just placeholders):

use strict;
use warnings;
use Fcntl;
use DB_File;

# Tie the hash to a file on disk so it survives very large inputs
tie my %seen, 'DB_File', 'addresses.db', O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot tie addresses.db: $!";

open my $fh, '<', 'emails.txt' or die "Cannot open emails.txt: $!";
while (my $addr = <$fh>) {
    chomp $addr;
    $seen{lc $addr} = 1;   # hash keys are unique, so dupes vanish
}
close $fh;
untie %seen;
Then you can reopen the DB file that DB_File created and dump its keys to a plain file, whatever.
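A self-contained round-trip sketch of that dump step (filenames and addresses are made up for the demo):

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;

my $db_file = 'addresses.db';   # hypothetical filename
unlink $db_file;                # start fresh for this demo

# Pass 1: shove every address into the tied hash; dupes collapse.
tie my %seen, 'DB_File', $db_file, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot tie $db_file: $!";
$seen{lc $_} = 1 for qw(Foo@example.com foo@example.com bar@example.com);
untie %seen;

# Pass 2: reopen read-only and dump the unique keys.
tie my %uniq, 'DB_File', $db_file, O_RDONLY, 0666, $DB_HASH
    or die "Cannot tie $db_file: $!";
print "$_\n" for sort keys %uniq;
untie %uniq;
```

Because the tie is re-established from the file, the dump works in a separate run from the load; you never hold the full list in RAM.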
However, if you've got that much data I would use SQL ;)
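For the SQL route, a minimal sketch via DBI and an in-memory SQLite database (assuming DBD::SQLite is installed; table and column names are invented): a UNIQUE/PRIMARY KEY constraint plus INSERT OR IGNORE does the de-duplication inside the database.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE addrs (addr TEXT PRIMARY KEY)');

# Duplicate keys are silently skipped by INSERT OR IGNORE
my $ins = $dbh->prepare('INSERT OR IGNORE INTO addrs (addr) VALUES (?)');
$ins->execute(lc $_) for qw(Foo@example.com foo@example.com bar@example.com);

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM addrs');
print "$count unique addresses\n";   # prints "2 unique addresses"
```

For a real dataset you would point dbname= at a file instead of :memory: and wrap the inserts in a transaction for speed.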