in reply to Removing duplicate lines from files (was 'Files')

We'll need more information to (ahem!) address this properly. How large are the files you need to manipulate, and what is their format? If the files are small, the data can simply be read into memory. If not, you'll have to explore alternatives.

For the sake of argument, we'll assume each line of the file(s) has one email address and nothing else. Use a hash to track the addresses (untested code follows):

#!/usr/bin/perl -w
use strict;

my $in_file  = 'email.log';
my $out_file = 'new_email.log';

# Note that this is written to a *different* file
# so we can go back if we screw up.
# If that's not good, back up the email log.
open IN,  "< $in_file"  or die "Can't open $in_file for reading: $!";
open OUT, "> $out_file" or die "Can't open $out_file for writing: $!";

my %address;
while (<IN>) {
    # Print each line only the first time we see it
    print OUT $_ if ! $address{ $_ }++;
}

close OUT or die "Can't close $out_file: $!";
close IN  or die "Can't close $in_file: $!";
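If the file is too large for the hash to fit in memory, one alternative (my own suggestion, not part of the original advice) is to let the system sort do the work. `sort(1)` spills to temporary files on disk, so it handles files far larger than memory, though the output comes back sorted rather than in the original line order:

```shell
# Deduplicate a large file with sort's -u (unique) flag.
# Output is sorted, not in original order; write to a
# different file so the original survives any mistakes.
sort -u email.log > new_email.log
```

If the original ordering matters, the hash approach above is the easier path.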

You also might want to check out the Perl Cookbook. There's not a line of original code above. All of it was shamelessly stolen from many hours of enjoying this tome.

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.