
It really depends on what you want to do in the end. Do you want to create a new file with all the duplicate entries removed? Do you want to keep one instance of each unique entry? Or do you simply need a report about the entries?

From what you told us, I'd say there's no need to read the entire file into memory. Reading it line by line and counting each line in a hash, something like the following, should do.

#!/usr/bin/perl
use strict;
use warnings;

my %seen;
my $file = "/path/to/your/file";

open(my $fh, "<", $file)
    or die "Could not open '$file' for reading: $!";
while (my $line = <$fh>) {
    chomp $line;        # drop the newline so it is not printed twice below
    $seen{$line}++;     # count how often each line occurs
}
close $fh;

# Now every unique entry has a value of 1 in the hash %seen
print "Unique entries:\n";
print "$_\n" for grep { $seen{$_} == 1 } keys %seen;
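The script above prints its result in hash order, which is effectively random. If the original line order matters, a minimal sketch along these lines might help. The helper name `lines_seen_once` and the in-memory sample data are made up for illustration and are not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): returns the lines that
# occur exactly once, in the order they first appeared in the input.
sub lines_seen_once {
    my ($fh) = @_;
    my (%count, @order);
    while (my $line = <$fh>) {
        chomp $line;
        # Remember a line's position only the first time it is seen.
        push @order, $line unless $count{$line}++;
    }
    return grep { $count{$_} == 1 } @order;
}

# Demo on an in-memory filehandle instead of a real file.
my $data = "a\nb\na\nc\n";
open(my $fh, '<', \$data) or die "Could not open in-memory file: $!";
print "$_\n" for lines_seen_once($fh);   # prints "b" and "c"
close $fh;
```

To run it against a real file, open that file instead of `\$data`. The extra `@order` array costs one more copy of each distinct line, which seems a fair price for stable output.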

Re^4: removing non-duplicates
by radiantmatrix (Parson) on Jul 11, 2005 at 20:30 UTC

    In the spirit of TMTOWTDI, a one-liner that prints only the first occurrence of each line:

    $ perl -ne '$seen{$_}++; $seen{$_} == 1 and print' old.txt > new.txt

    Update: the node title "removing non-duplicates" suggests that a line should be printed only once we know it is a duplicate. To do that (note that a line occurring n times is printed n-1 times):

    $ perl -ne '$seen{$_}++; $seen{$_} > 1 and print' old.txt > new.txt
    In either case, the result ends up in new.txt.
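If each repeated line should appear only once in the output, however often it occurs, one option is to print a line exactly when its count reaches 2. The following is only a sketch of that idea, run on made-up in-memory sample data rather than old.txt:

```perl
use strict;
use warnings;

# Sketch: collect each repeated line exactly once, at its second occurrence.
# The sample data is made up for illustration.
my $data = "a\nb\na\na\nc\nb\n";
open(my $fh, '<', \$data) or die "Could not open in-memory file: $!";

my %seen;
my @dups;
while (my $line = <$fh>) {
    # The pre-increment returns the new count; it equals 2 only once per line.
    push @dups, $line if ++$seen{$line} == 2;
}
close $fh;

print @dups;   # prints "a" then "b", each once
```

The one-liner form of the same test would be `perl -ne 'print if ++$seen{$_} == 2' old.txt > new.txt`.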
    Larry Wall is Yoda: there is no try{}
    The Code that can be seen is not the true Code
Re^4: removing non-duplicates
by Anonymous Monk on Jul 11, 2005 at 19:58 UTC
    Thank you, and jd below. Both seem to work the way I needed. Thanks for your time, guys.