Instead of remembering the elements in @uniq, directly print them out:
my @uniq;
my %seen = ();
for my $i (0..$#name1) {
    if (!$seen{$name1[$i]}) {
        push (@uniq, $name1[$i]);
        print "$name1[$i]\t$name2[$i]\t$percent[$i]\n";
    }
}
As an aside, I've removed your C-style loop, which is error-prone, and changed it into a more Perlish loop.
Your data structures also scream for a hash, since parallel arrays are error-prone as well; that said, it's not always convenient to rearrange your data into a consolidated data structure, especially if the arrays get filled from separate data streams.
#!/usr/bin/perl
use strict;
use warnings;
use Devel::Size qw 'total_size';
my $size = 100_000;
my (@name1, @name2, @percent);
my @big;
sub r_name {    # Make a random name.
    my $r = "";
    $r .= ('a' .. 'z')[rand 26] for 1 .. 3 + rand(5);
    return $r;
}
for (1 .. $size) {
    my $name1 = r_name;
    my $name2 = r_name;
    my $perc  = rand(100);
    push @name1,   $name1;
    push @name2,   $name2;
    push @percent, $perc;
    push @big, {name1 => $name1, name2 => $name2, percent => $perc};
}
my $size_3 = total_size(\@name1) + total_size(\@name2) + total_size(\@percent);
my $size_1 = total_size(\@big);
printf "Three arrays: %10d (%6.2f)\n", $size_3, $size_3 / $size;
printf "One structure: %10d (%6.2f)\n", $size_1, $size_1 / $size;
__END__
Three arrays: 9573279 ( 95.73)
One structure: 23724662 (237.25)
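For comparison, here is a rough sketch of the same duplicate filter running over a single array of records shaped like @big above. The sample rows are made up for illustration, and @first is a name I've picked; only the field names come from the script. It trades the extra memory measured above (roughly 237 vs. 96 bytes per row in that run) for not having to keep three indices in sync.

```perl
use strict;
use warnings;

# Hypothetical sample records; field names follow @big above.
my @records = (
    { name1 => 'foo', name2 => 'alpha', percent => 12.5 },
    { name1 => 'bar', name2 => 'beta',  percent => 40   },
    { name1 => 'foo', name2 => 'gamma', percent => 99   },
);

my %seen;
my @first;    # first record seen for each distinct name1
for my $rec (@records) {
    next if $seen{ $rec->{name1} }++;    # skip repeats of name1
    push @first, $rec;
    print join("\t", @{$rec}{qw(name1 name2 percent)}), "\n";
}
```

Each record carries its own name2 and percent, so there is no index that can drift out of step.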
Corion,
There is at least 1 problem with this code (and possibly 2).
if (!$seen{$name1[$i]}) {
should be
if ( ! $seen{$name1[$i]}++ ) {
Without the post-increment, %seen is never populated, so everything would be pushed to the unique array. The second possible problem is one of interpretation. I read the original post to mean that there would be some unique entries in a list containing duplicates and the object was to find the first one. If that interpretation is correct, then you have to go through the first array in its entirety before you can know whether an item is unique or not.
You're incrementing an element of %seen twice with each pass. That will prevent printing, but @uniq should contain what you expect.
Try this instead:
my @uniq;
my %seen;
for (0 .. $#name1) {
    next if $seen{$name1[$_]}++;
    push @uniq, $name1[$_];
    print "$name1[$_]\t$name2[$_]\t$percent[$_]\n";
}
This is a FAQ. See perldoc -q duplicate. The usual solution is
my %seen;
my @uniq=grep !$seen{$_}++, @name1;
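Since the other two arrays have to stay lined up with @name1, the same grep idiom can be applied to the indices instead of the values. This is only a sketch: the sample data below is invented, and @idx is a name I've made up.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the original arrays.
my @name1   = qw(foo bar foo baz bar);
my @name2   = qw(a b c d e);
my @percent = (10, 20, 30, 40, 50);

my %seen;
# Indices at which each value of @name1 appears for the first time.
my @idx = grep { !$seen{ $name1[$_] }++ } 0 .. $#name1;

print "$name1[$_]\t$name2[$_]\t$percent[$_]\n" for @idx;
```

With this data @idx ends up as (0, 1, 3), so the rows for foo, bar and baz are printed once each.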
my (@name1, @name2, @percent); # initialized elsewhere
my %seen;
++$seen{$_} for @name1;
for ( 0 .. $#name1 ) {
    next if $seen{ $name1[$_] } > 1;    # skip names that occur more than once
    print $name2[$_];
    last;
}
It is completely possible that I am the one who has interpreted the problem wrong though.
Update: Oversight corrected thanks to blazar below. That's what you get for not testing your code ;-)
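To make the two readings of the question concrete, here is a small sketch run on made-up data (@distinct and @singles are names chosen for illustration). The first reading keeps the first occurrence of every value and works in a single pass; the second keeps only values that occur exactly once, which needs the complete counts before anything can be decided.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @name1 = qw(foo bar foo baz bar qux);

# Reading 1: first occurrence of every distinct value (one pass).
my %seen;
my @distinct = grep { !$seen{$_}++ } @name1;

# Reading 2: values that occur exactly once (count first, then filter).
my %count;
$count{$_}++ for @name1;
my @singles = grep { $count{$_} == 1 } @name1;

print "distinct: @distinct\n";
print "singles:  @singles\n";
```

For this input the first list is foo bar baz qux and the second is baz qux.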
I think we're both (partly) wrong. Actually the subject seems to support your interpretation. OTOH I'm convinced that part of the code supports mine. Actually I didn't notice the idx thing and on a better reading I think that what he wants may be along the lines of
my %seen;
for (0..$#name1) {
    next if $seen{ $name1[$_] }++;
    print $name2[$_];
}
But unless I'm missing something obvious, your code won't work, as you're populating %seen with
( 0 => 1, ..., $#name1 => 1 )
You push on @uniq, but you don't do anything with it. Why not just:
my %seen;
for (my $i = 0; $i < @name1; $i++) {
    next if $seen{$name1[$i]}++;
    print "$name1[$i]\t$name2[$i]\t$percent[$i]\n";
}