in reply to Re: removing non-duplicates
in thread removing non-duplicates

Is it best to read the file into an array and, for each element, ignore it if it matches another? Or to use a hash, with key and value taken from parts of each line?

Replies are listed 'Best First'.
Re^3: removing non-duplicates
by Fang (Pilgrim) on Jul 11, 2005 at 19:43 UTC

    It really depends on what you want to do in the end. Do you want to create a new file with all the duplicate entries removed? Do you want to keep one instance of each unique entry? Or do you simply need a report about the entries?

    From what you told us, I'd say there's no need for reading up the entire file in memory, something like the following should do.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %seen;
    my $file = "/path/to/your/file";

    open(MYFILE, "<", $file) or die "Could not open '$file' for reading: $!";
    while (<MYFILE>) {
        chomp;          # drop the trailing newline so it doesn't end up in the keys
        $seen{$_}++;
    }
    close MYFILE;

    # Now every unique entry has a value of 1 in the hash %seen
    print "Unique entries:\n";
    print "$_\n" for (grep { $seen{$_} == 1 } keys %seen);

      In the spirit of TMTOWTDI:

      $ perl -ne '$seen{$_}++; $seen{$_} == 1 and print' old.txt > new.txt

      update: the node title "removing non-duplicates" suggests that lines should only be printed once we know they're duplicates. To do that (note that a line appearing three times will be printed twice):

      $ perl -ne '$seen{$_}++; $seen{$_} > 1 and print' old.txt > new.txt
      In either case, the result ends up in new.txt
      Larry Wall is Yoda: there is no try{}
      The Code that can be seen is not the true Code
      thank you and jd below, both seem to work the way I needed. Thanks for your time, guys.