Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I basically have an array with duplicate values and want to find the first unique element of each type and then print out the corresponding values from two other arrays (where e.g. $name1[0] corresponds to $name2[0] and $percent[0], etc.).

My code doesn't work and I'm not sure why not. Please can someone point out my mistake?

my @uniq;
my %seen = ();
for (my $i=0; $i<@name1; $i++) {
    push (@uniq, $name1[$i]) unless $seen{$name1[$i]}++;
    # THIS BIT BELOW DOESN'T WORK, I'M TRYING TO SAY IF
    # NAME1[$] IS SEEN FOR THE FIRST TIME PRINT THE CORRESPONDING VALUES IN $NAME2 AND $PERCENT
    print "$name1[$i]\t$name2[$i]\t$percent[$i]\n" unless $seen{$name1[$i]}++;
}

Replies are listed 'Best First'.
Re: finding the first unique element in an array
by Corion (Patriarch) on Jul 19, 2005 at 08:43 UTC

    Instead of remembering the elements in @uniq, directly print them out:

    my @uniq;
    my %seen = ();
    for my $i (0..$#name1) {
        if (!$seen{$name1[$i]}) {
            push (@uniq, $name1[$i]);
            print "$name1[$i]\t$name2[$i]\t$percent[$i]\n";
        };
    }

    As an aside, I've removed your C-style loop, which is error-prone, and changed it into a more perlish loop.

    Your data structures also scream for a hash, as using parallel arrays also is error prone. That said, it's not always convenient to rearrange your data into concentrated data structures, especially if the arrays get filled from separate data streams.
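    A minimal sketch of what one combined structure might look like, assuming an array of hash references. The field names, the sample data, and the helper first_of_each are all made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data; the field names mirror the three original arrays.
my @records = (
    { name1 => 'alpha', name2 => 'x1', percent => 90 },
    { name1 => 'beta',  name2 => 'y1', percent => 80 },
    { name1 => 'alpha', name2 => 'x2', percent => 70 },
);

# Return the records whose name1 value has not appeared earlier in the list.
sub first_of_each {
    my @rows = @_;
    my %seen;
    return grep { !$seen{ $_->{name1} }++ } @rows;
}

# Each record carries its own fields, so no index bookkeeping is needed.
for my $rec ( first_of_each(@records) ) {
    print join( "\t", @{$rec}{qw(name1 name2 percent)} ), "\n";
}
```

    Keeping the fields of one entity together means they cannot drift out of step, at the cost of one hash per entity.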

      As an aside, I've removed your C-style loop, which is error-prone
      Error-prone is in the eye of the beholder, and the C-style loop avoids the ugly $#array syntax.
      Your data structures also scream for a hash, as using parallel arrays also is error prone
      That would be an array of hashes then. But that has a severe drawback: it uses a lot of memory. Aggregates have a lot of overhead in Perl. The three parallel arrays only use three aggregates - regardless of the number of entities stored. Putting each entity in its own hash gives you an aggregate for each entity (that is, for each element of the original @name1). And that adds up. Fast.

      Below is a program that shows the difference in memory usage. The hashes solution uses almost two and a half times the amount of memory.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Devel::Size qw 'total_size';

      my $size = 100_000;
      my (@name1, @name2, @percent);
      my @big;

      sub r_name {
          # Make a random name.
          my $r = "";
          $r .= ('a' .. 'z')[rand 26] for 1 .. 3 + rand(5);
          return $r;
      }

      for (1 .. $size) {
          my $name1 = r_name;
          my $name2 = r_name;
          my $perc  = rand(100);
          push @name1, $name1;
          push @name2, $name2;
          push @percent, $perc;
          push @big, {name1 => $name1, name2 => $name2, percent => $perc};
      }

      my $size_3 = total_size(\@name1) + total_size(\@name2) + total_size(\@percent);
      my $size_1 = total_size(\@big);

      printf "Three arrays: %10d (%6.2f)\n", $size_3, $size_3 / $size;
      printf "One structure: %10d (%6.2f)\n", $size_1, $size_1 / $size;
      __END__
      Three arrays:    9573279 ( 95.73)
      One structure:   23724662 (237.25)

      Corion,
      There is at least 1 problem with this code (and possibly 2).
      if (!$seen{$name1[$i]}) {
      should be
      if ( ! $seen{$name1[$i]}++ ) {
      Without the post-increment, everything would be pushed to the unique array. The second possible problem is one of interpretation. I read the original post to mean that there would be some unique entries in a list containing duplicates and the object was to find the first one. If that interpretation is correct - then you have to wait until after going through the first array in its entirety before you can know if an item is unique or not.

      Cheers - L~R

Re: finding the first unique element in an array
by Zaxo (Archbishop) on Jul 19, 2005 at 08:49 UTC

    You're incrementing an element of %seen twice with each pass. That will prevent printing, but @uniq should contain what you expect.
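    A small sketch of that effect, using made-up sample data and a hypothetical helper double_test that collects into @printed instead of printing:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for @name1.
my @name1 = qw(alpha beta alpha);

# Mimic the original loop body: two separate "unless $seen{...}++" tests per pass.
sub double_test {
    my @names = @_;
    my ( %seen, @uniq, @printed );
    for my $i ( 0 .. $#names ) {
        # First test: the count is 0 on first sight, so the push happens; count becomes 1.
        push @uniq, $names[$i] unless $seen{ $names[$i] }++;
        # Second test: the count is already at least 1, so this never fires.
        push @printed, $names[$i] unless $seen{ $names[$i] }++;
    }
    return ( \@uniq, \@printed );
}

my ( $uniq, $printed ) = double_test(@name1);
print "uniq: @$uniq\n";                           # uniq: alpha beta
print "printed: ", scalar(@$printed), "\n";       # printed: 0
```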

    Try this instead:

    my @uniq;
    my %seen;
    for (0 .. $#name1) {
        next if $seen{$name1[$_]}++;
        push @uniq, $name1[$_];
        print "$name1[$_]\t$name2[$_]\t$percent[$_]\n";
    }

    After Compline,
    Zaxo

Re: finding the first unique element in an array
by gopalr (Priest) on Jul 19, 2005 at 09:48 UTC
Re: finding the first unique element in an array
by blazar (Canon) on Jul 19, 2005 at 09:01 UTC
    This is a FAQ. See perldoc -q duplicate. The usual solution is
    my %seen; my @uniq=grep !$seen{$_}++, @name1;
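    The same idiom can also be run over indices rather than values, which keeps the link to the parallel arrays. A sketch, with hypothetical sample data and a made-up variable name @first_idx:

```perl
use strict;
use warnings;

# Hypothetical sample data for the three parallel arrays.
my @name1   = qw(alpha beta alpha gamma beta);
my @name2   = qw(x1 y1 x2 z1 y2);
my @percent = ( 90, 80, 70, 60, 50 );

# Keep the index of each value's first occurrence instead of the value itself.
my %seen;
my @first_idx = grep { !$seen{ $name1[$_] }++ } 0 .. $#name1;

# The surviving indices address all three arrays.
print "$name1[$_]\t$name2[$_]\t$percent[$_]\n" for @first_idx;
```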
      blazar,
      If I am understanding the question correctly, this is not a FAQ and your solution has nothing to do with the question being asked.

      The object isn't to remove duplicates but to find the first entry that doesn't have a duplicate and then print the corresponding element (via the index) in another array.

      my (@name1, @name2, @percent); # initialized elsewhere
      my %seen;
      ++$seen{$_} for @name1;    # first pass: count every value
      for ( 0 .. $#name1 ) {
          next if $seen{ $name1[$_] } > 1;    # skip values that have a duplicate
          print $name2[$_];
          last;
      }
      It is completely possible that I am the one who has interpreted the problem wrong though.

      Cheers - L~R

      Update: Oversight corrected thanks to blazar below. That's what you get for not testing your code ;-)

        I think we're both (partly) wrong. Actually the subject seems to support your interpretation. OTOH I'm convinced that part of the code supports mine. Actually I didn't notice the idx thing and on a better reading I think that what he wants may be along the lines of
        my %seen;
        for (0..$#name1) {
            next if $seen{ $name1[$_] }++;
            print $name2[$_];
        }
        But unless I'm missing something obvious your code won't work as you're populating %seen with ( 0 => 1, ..., $#name1 => 1 ).
Re: finding the first unique element in an array
by Anonymous Monk on Jul 19, 2005 at 09:40 UTC
    You push on @uniq, but you don't do anything with it. Why not just:
    my %seen;
    for (my $i = 0; $i < @name1; $i ++) {
        next if $seen{$name1[$i]}++;
        print "$name1[$i]\t$name2[$i]\t$percent[$i]\n";
    }