New Novice has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I would like to clean up a file containing some numbers by eliminating duplicates. The numbers are separated by newlines, one number per line.

My idea is to read the file into an array and then write each element (each number) to a new file unless it has already been processed. To do that, however, I would need to check whether a scalar (the new number) is identical to any element of an array (all numbers already processed). I did not find any such "contained in" function for arrays. Note that the operation involves an array (the checklist) and a scalar (the new number).
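
In rough, untested code, the plan looks like this, with contains() written by hand for now (the file names are placeholders):

my @seen;   # the checklist of numbers already processed
open my $in,  '<', 'numbers.txt'      or die $!;
open my $out, '>', 'numbers_uniq.txt' or die $!;
while (my $num = <$in>) {
    chomp $num;
    next if contains($num, @seen);   # the "contained in" test
    print $out "$num\n";
    push @seen, $num;
}

sub contains {   # hand-rolled for now
    my ($value, @list) = @_;
    for (@list) { return 1 if $_ eq $value }
    return 0;
}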

Presumably something like this exists (as it is a fundamental concept in mathematics).

Hints would be greatly appreciated!

Re: Getting rid of duplicates
by davorg (Chancellor) on Sep 29, 2004 at 14:34 UTC
      The author's problem was to remove duplicate elements from an input file, not an array. While reading the elements from the input file into an array and then applying the FAQ-given solution is one way of solving the author's problem, there are other solutions available that are simpler and more efficient (see my follow-up post for one of them).

      I agree that the FAQ is the first place to look for answers and that we ought to point to it in responses to questions like this. But I also think that we ought to read questions carefully and answer the whole of the question actually asked, even if the FAQ answers a similar question for us. While it might be easy for some to convert the FAQ's answer into an answer to the original question, that task might be beyond some readers.

      Updated: I removed the last half of this post because it was overly preachy. All I really wanted to say was: when we post pointers to the FAQ, let's also spend a few moments relating them back to the context of the original problem. I'm leaving this big Updated notice here as a humbling reminder to myself to stay off the soapbox.

      Cheers,
      Tom

Re: Getting rid of duplicates
by rlb3 (Deacon) on Sep 29, 2004 at 14:12 UTC
    Hello,

    You may want to use hashes.

    This is untested.
    my %store;
    foreach (<DATA>) {
        chomp;
        $store{$_} = 1;
    }
    print join ",", keys %store;

    __DATA__
    1236
    3232
    1236
    4323
    4323
    Something like that may work for you.

    rlb3

      I have a utility function I often use for this sort of thing. See below:

      my(@numbers) = <DATA>;
      chomp @numbers;
      print join("\n", uniqStrings(@numbers));

      sub uniqStrings {
          my %temp = ();
          @temp{@_} = ();   # hash slice: one key per input value, duplicates collapse
          return keys %temp;
      }

      __DATA__
      1236
      3232
      1236
      4323
      4323

      That will actually run a little faster because the hash-slice assignment (@temp{@_} = ()) builds the lookup hash in one Perl-level operation instead of one hash store per loop iteration. Note that, like the version above, it does not preserve the input order. Hope that helps.
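
      If you want to check that on your own data, a quick comparison with the core Benchmark module might look like this (the sample numbers are invented; results will vary with perl version and data):

      use Benchmark qw(cmpthese);

      my @nums = map { int rand 1000 } 1 .. 10_000;   # invented sample data

      cmpthese(-1, {
          # one hash store per element, in a Perl-level loop
          loop_store => sub {
              my %seen;
              $seen{$_} = 1 for @nums;
              my @uniq = keys %seen;
          },
          # all keys stored at once via a hash slice
          hash_slice => sub {
              my %seen;
              @seen{@nums} = ();
              my @uniq = keys %seen;
          },
      });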

      May the Force be with you
Re: Getting rid of duplicates
by tmoertel (Chaplain) on Sep 29, 2004 at 14:49 UTC
    The following one-liner does what you want and has the advantages of handling multiple textual representations of the same number and preserving the order of the input lines. I expect that both are important to you, because otherwise you would just have used sort -nu to solve your problem:
    perl -lne 'print unless $counts{0+$_}++' input.txt > output.txt
    We use the -lne command-line switches to cause Perl to read each line of input, strip off the line break, and then execute the following code on the result:
    print unless $counts{0+$_}++
    The code prints the current line if the count of times we have seen it so far is zero. We use the hash %counts to keep track of the counts. Note the 0+ inside of the hash index. It ensures that the input lines are interpreted as numbers so that, for example, "1" and "1.0" are considered to be the same for the sake of duplicate removal.
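
    As a quick illustration of that normalization, here is a file-free version of the same idea (the values in __DATA__ are made up):

    my %counts;
    while (my $line = <DATA>) {
        chomp $line;
        # 0+$line turns "1" and "1.0" into the same hash key, 1
        print "$line\n" unless $counts{ 0 + $line }++;
    }

    __DATA__
    1
    1.0
    2
    2
    3

    This prints 1, 2, and 3; "1.0" is dropped as a duplicate of "1".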

    Cheers,
    Tom

Re: Getting rid of duplicates
by periapt (Hermit) on Sep 29, 2004 at 14:19 UTC
    If you are not married to a perl solution, use the unix sort utility:
    sort -n -u < infile.txt

    If you need a perl solution, try:
    perl -e'my %list; chomp, $list{$_} = 1 while <>; print "$_\n" for sort { $a <=> $b } keys %list' < infile.txt


    PJ
    use strict; use warnings; use diagnostics;
Re: Getting rid of duplicates
by Arunbear (Prior) on Sep 29, 2004 at 14:19 UTC
    You can use a hash to 'remember' which numbers have already been seen:
    use strict;
    use warnings;

    my %numbers;
    open my $in, "infile" or die $!;
    open my $out, ">outfile" or die $!;
    while (<$in>) {
        chomp;
        if (not exists $numbers{$_}) {
            print $out "$_\n";
            $numbers{$_}++;
        }
    }
    This method preserves the original order of the numbers. In general, you can test a hash for containment via the exists function; for arrays you would need grep or the first function from List::Util.
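    For example, either of these tests whether a value occurs in an array (the sample values are made up):

    use List::Util qw(first);

    my @processed = (1236, 3232, 4323);   # numbers seen so far
    my $new = 4323;

    # grep in scalar context returns the number of matches
    print "seen\n" if grep { $_ == $new } @processed;

    # first returns the first matching element (undef if none)
    # and stops scanning as soon as it finds a match
    print "seen\n" if defined first { $_ == $new } @processed;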
Re: Getting rid of duplicates
by jZed (Prior) on Sep 29, 2004 at 14:24 UTC
    my(@array, %hash);
    for (1, 2, 3, 2) {
        push @array, $_ unless $hash{$_}++;
    }
    print @array;   # prints 123 (no duplicates)
Re: Getting rid of duplicates
by terra incognita (Pilgrim) on Sep 29, 2004 at 18:36 UTC
    Another one using a hash, this is a modified character frequency example from perlretut. This will sort and also handle negative numbers. Comments on where I can improve this code and what practices I should stay away from are appreciated.
    use strict;

    local $/;                          # slurp mode: read all of DATA at once
    my $f = <DATA>;
    my %chars;
    $f =~ s/(.+)/$chars{$1}++;$1/eg;   # final $1 puts each match back unchanged
    print "'$_'\n" foreach (sort { $a <=> $b } keys %chars);

    __DATA__
    1
    1
    2
    2
    3
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    -12
    -3
Re: Getting rid of duplicates
by johndageek (Hermit) on Sep 29, 2004 at 16:40 UTC
    In my simple mind, if the input file is sorted, I would do the following:

    open my $in, "file" or die "cannot open input file: $!\n";
    my $prev_record = '';   # initialize so the first comparison is defined
    while (<$in>) {
        print if $_ ne $prev_record;
        $prev_record = $_;
    }

    Please read disclaimers by all monks that contain the word not.

    Enjoy!
    Dageek