Re: Getting rid of duplicates
by davorg (Chancellor) on Sep 29, 2004 at 14:34 UTC
The author's problem was to remove duplicate elements from an input file,
not an array. While reading the elements from the input file
into an array and then applying the FAQ-given solution is one way of
solving the author's problem, there are other solutions available that
are simpler and more efficient (see my follow-up post for one of
them).
I agree that the FAQ is the first place
to look for answers and that we ought to point to it in responses to
questions like this. But I also think that we ought to read questions
carefully and answer the whole of the question actually asked, even if the FAQ
answers a similar question for us. While it might be easy for some to
convert the FAQ's answer into an answer to the original question, that
task might be beyond some readers.
Updated: I removed the last half of this post because it was overly preachy. All I really wanted to say was: when we post pointers to the FAQ, let's also spend a few moments relating them back to the context of the original problem. I'm leaving this big Updated notice here as a humbling reminder to myself to stay off the soapbox.
Cheers, Tom
Re: Getting rid of duplicates
by rlb3 (Deacon) on Sep 29, 2004 at 14:12 UTC
Hello,
You may want to use hashes.
This is untested.
my %store;
foreach (<DATA>) {
    chomp;
    $store{$_} = 1;    # each distinct line becomes a hash key
}
print join ",", keys %store;
__DATA__
1236
3232
1236
4323
4323
Something like that may work for you.
rlb3
my(@numbers) = <DATA>;
chomp @numbers;
print join("\n", uniqStrings(@numbers));

sub uniqStrings
{
    my %temp = ();
    @temp{@_} = ();    # hash-slice assignment: every element becomes a key
    return keys %temp;
}
__DATA__
1236
3232
1236
4323
4323
That will actually run a little faster because the hash-slice assignment used to remove the duplicate numbers is pretty well optimized. Hope that helps.
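To back up that speed claim, here is a rough, hypothetical benchmark sketch (not from the original post; the data set and labels are made up) comparing the per-element loop with the hash-slice assignment, using the core Benchmark module:
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @numbers = map { int rand 1000 } 1 .. 10_000;

cmpthese(-2, {
    loop  => sub { my %seen; $seen{$_} = 1 for @numbers; my @u = keys %seen },  # one store per element
    slice => sub { my %seen; @seen{@numbers} = (); my @u = keys %seen },        # single hash-slice assignment
});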
May the Force be with you
Re: Getting rid of duplicates
by tmoertel (Chaplain) on Sep 29, 2004 at 14:49 UTC
The following one-liner does what you want and has the advantages of
handling multiple textual representations of the same number and of preserving the order of the input lines. I expect that both are important to you because you didn't just use sort -nu to solve
your problem:
perl -lne 'print unless $counts{0+$_}++' input.txt > output.txt
We use the -lne command-line switches to cause Perl to
read each line of input, strip off the line break, and then execute
the following code on the result:
print unless $counts{0+$_}++
The code prints the current line if the count of times we have seen
it so far is zero. We use the hash %counts to keep track
of the counts.
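For anyone unfamiliar with those switches, the one-liner is roughly equivalent to the following script (a sketch based on the perlrun documentation, not code from the original post):
$\ = $/;                               # -l: print appends the line ending again
while (<>) {                           # -n: loop over every input line
    chomp;                             # -l: strip the line ending on input
    print unless $counts{ 0 + $_ }++;  # the -e code
}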
Note the 0+ inside of the hash index. It ensures that
the input lines are interpreted as numbers so that, for example, "1"
and "1.0" are considered to be the same for the sake of duplicate
removal.
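As a quick, hypothetical illustration of that normalization (the input strings here are made up):
my %counts;
for my $line ("1", "1.0", "01") {
    print "$line\n" unless $counts{ 0 + $line }++;   # prints only "1"
}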
Cheers, Tom
Re: Getting rid of duplicates
by periapt (Hermit) on Sep 29, 2004 at 14:19 UTC
If you are not married to a Perl solution, use the Unix utility sort -n -u < infile.txt
If you need a Perl solution, try perl -e'my %list = (); $list{$_} = 1 while <>; print sort keys %list; ' < infile.txt
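For readability, here is a hypothetical expanded version of that one-liner; it also chomps the input and sorts numerically, which matches sort -n -u more closely:
use strict;
use warnings;

my %list;
while (my $line = <>) {
    chomp $line;
    $list{$line} = 1;    # remember each distinct line
}
print "$_\n" for sort { $a <=> $b } keys %list;
Run it as perl dedup.pl < infile.txt > outfile.txt (the script and file names are placeholders).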
PJ
use strict; use warnings; use diagnostics;
Re: Getting rid of duplicates
by Arunbear (Prior) on Sep 29, 2004 at 14:19 UTC
You can use a hash to 'remember' which numbers have already been seen:
use strict;
use warnings;
my %numbers;
open my $in, "infile" or die $!;
open my $out, ">outfile" or die $!;
while (<$in>) {
    chomp;
    if (not exists $numbers{$_}) {
        print $out "$_\n";
        $numbers{$_}++;
    }
}
This method preserves the original order of the numbers. In general, you can test a hash for containment with the exists function; for an array you would need grep or the first function from List::Util.
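As a small, hypothetical illustration of that difference (the data here is made up):
use List::Util qw(first);

my @numbers = (1236, 3232, 4323);
my %seen    = map { $_ => 1 } @numbers;

print "hash: found\n"  if exists $seen{3232};                        # constant-time hash lookup
print "array: found\n" if defined( first { $_ == 3232 } @numbers );  # linear scan of the array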
Re: Getting rid of duplicates
by jZed (Prior) on Sep 29, 2004 at 14:24 UTC
my(@array,%hash);
for (1,2,3,2){push @array,$_ unless $hash{$_}++};
print @array; # prints 123 (no duplicates)
Re: Getting rid of duplicates
by terra incognita (Pilgrim) on Sep 29, 2004 at 18:36 UTC
Another one using a hash; this is a modified character-frequency example from perlretut. It will sort the output and also handles negative numbers.
Comments on where I can improve this code and what practices I should stay away from are appreciated.
use strict;
local $/;                  # slurp mode: read all of DATA at once
my $f = <DATA>;
my %chars;
$f =~ s/(.+)/$chars{$1}++;$1/eg; # final $1 replaces the matched line with itself
print "'$_'\n" foreach (sort {$a <=> $b} keys %chars);
__DATA__
1
1
2
2
3
3
4
5
6
7
8
9
10
11
12
13
-12
-3
Re: Getting rid of duplicates
by johndageek (Hermit) on Sep 29, 2004 at 16:40 UTC
In my simple mind, if the input file is sorted, I would do the following:
open in, "file" or die "can not open input file\n";
my $prev_record = "";
while (<in>) {
    print if $_ ne $prev_record;   # only print when the line differs from the previous one
    $prev_record = $_;
}
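Assuming the input really is pre-sorted (the file names below are placeholders), the same idea also works as a pipeline one-liner:
sort -n infile.txt | perl -ne 'print if !defined($prev) || $_ ne $prev; $prev = $_' > outfile.txt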
Please read disclaimers by all monks that contain the word not.