in reply to Re: removing non-duplicates
in thread removing non-duplicates

Is it best to read the file into an array and, for each element, ignore it if it matches another? Or to use a hash, with key and value taken from parts of each line?

Replies are listed 'Best First'.
Re^3: removing non-duplicates
by Fang (Pilgrim) on Jul 11, 2005 at 19:43 UTC

    It really depends on what you want to do in the end. Do you want to create a new file with all the duplicate entries removed? Do you want to keep one instance of each unique entry? Or do you simply need a report about the entries?

    From what you told us, I'd say there's no need for reading up the entire file in memory, something like the following should do.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %seen;
    my $file = "/path/to/your/file";

    open(MYFILE, "<", $file) or die "Could not open '$file' for reading: $!";
    while (<MYFILE>) {
        chomp;          # drop the trailing newline so it doesn't end up in the keys
        $seen{$_}++;
    }
    close MYFILE;

    # Now every unique entry has a value of 1 in the hash %seen
    print "Unique entries:\n";
    print "$_\n" for (grep { $seen{$_} == 1 } keys %seen);

      In the spirit of TMTOWTDI:

      $ perl -ne '$seen{$_}++; $seen{$_} == 1 and print' old.txt > new.txt

      update: the node title "removing non-duplicates" suggests that lines should only be printed once we know they're duplicates. To do that (note that a line appearing three times will be printed twice):

      $ perl -ne '$seen{$_}++; $seen{$_} > 1 and print' old.txt > new.txt
      In either case, the result ends up in new.txt
      Larry Wall is Yoda: there is no try{}
      The Code that can be seen is not the true Code
      thank you and jd below, both seem to work the way I needed. Thanks for your time, guys.