Re^3: removing non-duplicates

It really depends on what you want to do in the end. Do you want to create a new file with all the duplicate entries removed? Do you want to keep one instance of each unique entry? Or do you simply need a report about the entries?

From what you told us, I'd say there's no need to read the entire file into memory in one go; reading it line by line, as in the following, should do.

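#!/usr/bin/perl
use strict;
use warnings;

my %seen;
my $file = "/path/to/your/file";

# Count how many times each line occurs, reading one line at a time.
open(my $fh, "<", $file) or die "Could not open '$file' for reading: $!";
while (my $line = <$fh>) {
    chomp $line;    # drop the newline so it is not printed twice below
    $seen{$line}++;
}
close $fh;

# Every entry that occurs exactly once now has a value of 1 in %seen.
# Note that keys() returns them in no particular order.
print "Unique entries:\n";
print "$_\n" for grep { $seen{$_} == 1 } keys %seen;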
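
The questions at the top (keep one instance of each entry, or only the entries that occur once?) lead to slightly different filters. Here is a minimal sketch of both that also preserves input order, which `keys %seen` does not. The helper names `lines_seen_once` and `first_instances` and the sample data are made up for illustration, and for simplicity the helpers take a list of lines rather than a filehandle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): count every line, then
# return the lines that occur exactly once, in their original input order.
sub lines_seen_once {
    my @lines = @_;
    my %seen;
    $seen{$_}++ for @lines;
    return grep { $seen{$_} == 1 } @lines;
}

# Hypothetical helper: keep only the first instance of every distinct line.
sub first_instances {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Made-up sample data standing in for the lines of a file.
my @sample = ("apple\n", "pear\n", "apple\n", "plum\n");

print "Seen once:\n",       lines_seen_once(@sample);
print "First instances:\n", first_instances(@sample);
```

For a file too large to hold as a list, the same idea should work with two passes over the file: one to fill `%seen`, one to print the lines that pass the filter.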

Re^4: removing non-duplicates by radiantmatrix (Parson) on Jul 11, 2005 at 20:30 UTC
In the spirit of TMTOWTDI: `$ perl -ne '$seen{$_}++; $seen{$_} == 1 and print' old.txt > new.txt`

Update: the node title "removing non-duplicates" suggests that lines should be printed only when we know it's a duplicate. To do that: `$ perl -ne '$seen{$_}++; $seen{$_} > 1 and print' old.txt > new.txt`

In either case, the result ends up in `new.txt`.

Larry Wall is Yoda: there is no `try{}`
The Code that can be seen is not the true Code
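
To make the difference between the two one-liners concrete, here is a small sketch that applies the same two tests to an in-memory list. The sample data and variable names are made up. The third filter is an extra variant, not from the reply above, that reports each duplicated line only once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample data: "a" occurs three times, "b" and "c" once each.
my @lines = ("a\n", "b\n", "a\n", "a\n", "c\n");

my (%first, %repeat, %once);

# Same test as the first one-liner: true on the first occurrence of a line.
my @first_occurrence = grep { ++$first{$_} == 1 } @lines;

# Same test as the second one-liner: true on every occurrence after the first.
my @every_repeat = grep { ++$repeat{$_} > 1 } @lines;

# Variant (an assumption about what may be wanted): true only on the second
# occurrence, so each duplicated line is reported once.
my @duplicated_once = grep { $once{$_}++ == 1 } @lines;

print "First occurrences:\n",           @first_occurrence;
print "Every repeat:\n",                @every_repeat;
print "Each duplicated line, once:\n",  @duplicated_once;
```

With this input, the first filter keeps `a`, `b`, `c`; the second keeps `a` twice; the third keeps `a` once.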
Re^4: removing non-duplicates by Anonymous Monk on Jul 11, 2005 at 19:58 UTC
Thank you, and jd below; both seem to work the way I needed. Thanks for your time, guys.