finding the first unique element in an array

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: finding the first unique element in an array by Corion (Patriarch) on Jul 19, 2005 at 08:43 UTC
Instead of remembering the elements in `@uniq`, directly print them out: `my @uniq; my %seen = (); for my $i (0..$#name1) { if (!$seen{$name1[$i]}) { push (@uniq, $name1[$i]); print "$name1[$i]\t$name2[$i]\t$percent[$i]\n"; }; }` As an aside, I've removed your C-style loop, which is error-prone, and changed it into a more perlish loop. Your data structures also scream for a hash, as using parallel arrays also is error prone, though it's not always convenient to rearrange your data into concentrated data structures, especially if the arrays get filled from separate data streams.
Re^2: finding the first unique element in an array by Anonymous Monk on Jul 19, 2005 at 10:02 UTC
"As an aside, I've removed your C-style loop, which is error-prone"

Error-prone is in the eye of the beholder, and the C-style loop avoids the ugly `$#array` syntax.

"Your data structures also scream for a hash, as using parallel arrays also is error prone"

That would be an array of hashes then. But that has a severe drawback: it uses a lot of memory. Aggregates have a lot of overhead in Perl. The three parallel arrays only use three aggregates - regardless of the number of entities stored. Putting each entity in its own hash gives you an aggregate for each entity (that is, for each element of the original `@name1`). And that adds up. Fast. Below is a program that shows the difference in memory usage. The hashes solution uses almost two and a half times the amount of memory.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Devel::Size qw 'total_size';

    my $size = 100_000;
    my (@name1, @name2, @percent);
    my @big;

    sub r_name {    # Make a random name.
        my $r = "";
        $r .= ('a' .. 'z')[rand 26] for 1 .. 3 + rand(5);
        return $r;
    }

    for (1 .. $size) {
        my $name1 = r_name;
        my $name2 = r_name;
        my $perc  = rand(100);
        push @name1,   $name1;
        push @name2,   $name2;
        push @percent, $perc;
        push @big, {name1 => $name1, name2 => $name2, percent => $perc};
    }

    my $size_3 = total_size(\@name1) + total_size(\@name2) + total_size(\@percent);
    my $size_1 = total_size(\@big);
    printf "Three arrays: %10d (%6.2f)\n", $size_3, $size_3 / $size;
    printf "One structure: %10d (%6.2f)\n", $size_1, $size_1 / $size;
    __END__
    Three arrays:    9573279 ( 95.73)
    One structure:   23724662 (237.25)
Re^2: finding the first unique element in an array by Limbic~Region (Chancellor) on Jul 19, 2005 at 14:45 UTC
Corion, There is at least 1 problem with this code (and possibly 2). `if (!$seen{$name1[$i]}) {` should be `if ( ! $seen{$name1[$i]}++ ) {`. Without the post-increment, everything would be pushed to the unique array. The second possible problem is one of interpretation. I read the original post to mean that there would be some unique entries in a list containing duplicates and the object was to find the first one. If that interpretation is correct, then you have to wait until after going through the first array in its entirety before you can know if an item is unique or not. Cheers - L~R
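To see why the post-increment matters, here is a minimal sketch on made-up sample data. The sub names `without_increment` and `with_increment` are hypothetical and not from the thread. Merely reading `$seen{$n}` never stores anything in the hash, so that test stays true for every element, whereas `$seen{$n}++` returns the old count and then bumps it.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my @names = qw(ann bob ann cid bob);

# Only tests %seen and never writes to it, so every element passes.
sub without_increment {
    my %seen;
    my @out;
    for my $n (@_) {
        push @out, $n if !$seen{$n};
    }
    return @out;
}

# Post-increment yields the old count (false the first time), then adds one.
sub with_increment {
    my %seen;
    my @out;
    for my $n (@_) {
        push @out, $n if !$seen{$n}++;
    }
    return @out;
}

print join(' ', without_increment(@names)), "\n";   # ann bob ann cid bob
print join(' ', with_increment(@names)),    "\n";   # ann bob cid
```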
Re: finding the first unique element in an array by Zaxo (Archbishop) on Jul 19, 2005 at 08:49 UTC
You're incrementing an element of `%seen` twice with each pass. That will prevent printing, but `@uniq` should contain what you expect. Try this instead: `my @uniq; my %seen; for (0 .. $#name1) { next if $seen{$name1[$_]}++; push @uniq, $name1[$_]; print "$name1[$_]\t$name2[$_]\t$percent[$_]\n"; }` After Compline, Zaxo
Re: finding the first unique element in an array by gopalr (Priest) on Jul 19, 2005 at 09:48 UTC
Have a look at List::Util and List::MoreUtils.
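As one possible way to apply that suggestion, the sketch below uses `first` from the core List::Util module to find the first index whose value occurs exactly once. It assumes the stricter reading of the question (an element with no duplicate anywhere in the list). The sample data and the sub name `first_unique_index` are invented for illustration.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical sample data standing in for the poster's parallel arrays.
my @name1   = qw(ann bob ann cid bob dee);
my @name2   = qw(A1 B1 A2 C1 B2 D1);
my @percent = (10, 20, 30, 40, 50, 60);

# Return the index of the first element that occurs exactly once,
# or undef if every element has a duplicate.
sub first_unique_index {
    my @list = @_;
    my %count;
    $count{$_}++ for @list;    # pass 1: count occurrences
    # pass 2: first index whose value was counted exactly once
    return first { $count{ $list[$_] } == 1 } 0 .. $#list;
}

my $i = first_unique_index(@name1);

# Prints the row for "cid" (index 3), tab-separated.
print "$name1[$i]\t$name2[$i]\t$percent[$i]\n" if defined $i;
```

The `defined` check matters because index 0 is a valid answer but is false in boolean context.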
Re: finding the first unique element in an array by blazar (Canon) on Jul 19, 2005 at 09:01 UTC
This is a FAQ. See `perldoc -q duplicate`. The usual solution is `my %seen; my @uniq=grep !$seen{$_}++, @name1;`
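The FAQ idiom works on a single list. With parallel arrays, one hedged variation is to run the same `grep` over the indices instead of the values, and then use array slices to keep the other arrays aligned. The sample data and the sub name `first_occurrence_indices` below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the poster's parallel arrays.
my @name1   = qw(ann bob ann cid bob);
my @name2   = qw(A1 B1 A2 C1 B2);
my @percent = (10, 20, 30, 40, 50);

# Indices of the first occurrence of each distinct value in the list.
sub first_occurrence_indices {
    my @list = @_;
    my %seen;
    return grep { !$seen{ $list[$_] }++ } 0 .. $#list;
}

my @keep = first_occurrence_indices(@name1);    # indices 0, 1, 3

# Array slices pull the matching entries out of each parallel array.
my @uniq_name1   = @name1[@keep];
my @uniq_name2   = @name2[@keep];
my @uniq_percent = @percent[@keep];

print "$name1[$_]\t$name2[$_]\t$percent[$_]\n" for @keep;
```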
Re^2: finding the first unique element in an array by Limbic~Region (Chancellor) on Jul 19, 2005 at 14:27 UTC
blazar, If I am understanding the question correctly, this is not a FAQ and your solution has nothing to do with the question being asked. The object isn't to remove duplicates but to find the first entry that doesn't have a duplicate and then print the corresponding element (via the index) in another array.

    my (@name1, @name2, @percent); # initialized elsewhere
    my %seen;
    ++$seen{$_} for @name1;
    for ( 0 .. $#name1 ) {
        next if $seen{ $name1[$_] } > 1;
        print $name2[$_];
        last;
    }

It is completely possible that I am the one who has interpreted the problem wrong though. Cheers - L~R

Update: Oversight corrected thanks to blazar below. That's what you get for not testing your code ;-)
Re^3: finding the first unique element in an array by blazar (Canon) on Jul 19, 2005 at 14:43 UTC
I think we're both (partly) wrong. Actually the subject seems to support your interpretation. OTOH I'm convinced that part of the code supports mine. Actually I didn't notice the idx thing and on a better reading I think that what he wants may be along the lines of `my %seen; for (0..$#name1) { next if $seen{ $name1[$_] }++; print $name2[$_]; }` But unless I'm missing something obvious your code won't work, as you're populating `%seen` with `( 0 => 1, ..., $#name1 => 1 )`.
Re: finding the first unique element in an array by Anonymous Monk on Jul 19, 2005 at 09:40 UTC
You push on `@uniq`, but you don't do anything with it. Why not just: `my %seen; for (my $i = 0; $i < @name1; $i ++) { next if $seen{$name1[$i]}++; print "$name1[$i]\t$name2[$i]\t$percent[$i]\n"; }`