
Extracting unique elements from array

by koolgirl (Hermit)
on Sep 28, 2011 at 19:22 UTC (#928398)

koolgirl has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks....again...

OK, so I've got an @ with several hundred elements: some of them "batch", some initials, some numbers, all of different lengths. I need to extract only the unique elements and discard the duplicates.

I discussed this a bit ago in the CB and was told to use cmp, but I don't understand how to get the elements of the array, all several hundred of them, into $a and $b in a function.

So I went to the Cookbook to find a way to count them with a hash. This is what I found:

    #!/usr/bin/perl
    use strict;
    #use warnings;

    my %seen = ( );
    my @uniq = ( );
    my @list = "bob, bob, sue, sue";
    my $item;
    my $element;

    foreach $item (@list) {
        if ($seen{$item}) {
            # same, don't grab
        }
        else {
            # if we get here, we have not seen it before
            $seen{$item} = 1;
            push(@uniq, $item);
        }
    }

    foreach $element (@uniq) {
        print $element . "\n";
    }

But it doesn't work. And I haven't even tried adding numbers and such to the test array yet. Can someone please break this down for me like I'm a two-year-old... I have never understood the cmp routines in examples, because I don't get how to pass all the hundreds of elements in my @ into $a and $b to be compared. I understand how it compares after that, just not that first part. Please help me, I'm starting to see green Matrix code running down my eyes....

Thanks koolgirl

"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.." -- George Bernard Shaw

Re: Extracting unique elements from array
by toolic (Bishop) on Sep 28, 2011 at 19:26 UTC
    my @list = "bob, bob, sue, sue";
    The @list array variable only has one element; perhaps you think it has 4 elements.
    use warnings;
    use strict;
    #use warnings;
    my %seen = ( );
    my @uniq = ( );
    my @list = "bob, bob, sue, sue";
    my $item;
    my $element;
    print scalar(@list), "\n";
    __END__
    1
    my @list = qw(bob bob sue sue);
    Data::Dumper is handy too.
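To see the difference toolic describes, here is a small sketch using the core Data::Dumper module to dump both versions of the array side by side. The names @broken and @fixed are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# A double-quoted string is ONE scalar, so this array has one element.
my @broken = "bob, bob, sue, sue";

# qw() splits on whitespace, giving four separate elements.
my @fixed = qw(bob bob sue sue);

print Dumper(\@broken);
print Dumper(\@fixed);

print "broken has ", scalar(@broken), " element(s)\n";   # 1
print "fixed has ",  scalar(@fixed),  " element(s)\n";   # 4
```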

      Ah, damn it, I have forgotten everything, I swear. That's what I get for adding to the Cookbook's code. Thank you. I would still like to maybe understand what's happening with $a and $b in cmp routines, but this should work for now anyway. Thanks again.


        I'm not quite sure I understand where your confusion lies with cmp... cmp is just an operator that returns -1 if its left operand sorts before its right operand as a string (alphabetically, for plain lowercase words), 0 if the operands are equal, and 1 if the left operand sorts after the right operand.
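A tiny sketch that prints what cmp and <=> actually return for a few made-up pairs. Nothing here is sort-specific; both operators work anywhere, and sort simply uses their -1/0/1 answers.

```perl
use strict;
use warnings;

# cmp compares as strings, <=> compares as numbers; both return -1, 0 or 1.
my @pairs = (['apple', 'banana'], ['pear', 'pear'], ['zebra', 'ant']);
my @results;
for my $p (@pairs) {
    my ($l, $r) = @$p;
    my $res = $l cmp $r;
    push @results, $res;
    print "$l cmp $r => $res\n";
}

my $num = 2 <=> 10;       # -1: 2 is numerically smaller than 10
my $str = '2' cmp '10';   #  1: as strings, "2" comes after "10"
print "2 <=> 10 => $num\n";
print "'2' cmp '10' => $str\n";
```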

        When you use the sort function of Perl, you can tell it how you want it to behave. You don't need to pass the elements of your list to $a and $b; sort will do it for you.

        use strict;
        use warnings;

        my @abclist = qw(a b c d e);
        my @numlist = qw(1 3 5 7 9);

        print join(' ', sort { $a cmp $b } @abclist), "\n";   # default sort: a b c d e
        print join(' ', sort { $b cmp $a } @abclist), "\n";   # reversed: e d c b a
        print join(' ', sort { $b <=> $a } @numlist), "\n";   # numeric, descending: 9 7 5 3 1
        print join(' ', sort { $a == 7 ? -1 : ($b == 7 ? 1 : $a <=> $b) } @numlist), "\n";   # 7 first: 7 1 3 5 9

        That last one is the most interesting. It ensures that 7 will always be the first element in the list: if $a is 7, no matter what $b is, we say that $a is less than $b; if $b is 7, no matter what $a is, we say that $a is greater than $b; otherwise, we do a normal <=> comparison.

        sort lets you provide a block of code that evaluates to a negative number, 0, or a positive number to create custom sort behavior.
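To make the "sort fills in $a and $b for you" point concrete, here is a sketch using a named comparison subroutine instead of an inline block; by_length_then_alpha is a hypothetical name chosen for the example. sort calls it repeatedly, each time with a different pair already placed in $a and $b, and the sub never receives any arguments.

```perl
use strict;
use warnings;

# Hypothetical comparator: shorter strings first, ties broken alphabetically.
# sort sets the package variables $a and $b before each call; @_ is not used.
sub by_length_then_alpha {
    return length($a) <=> length($b) || $a cmp $b;
}

my @words  = qw(pear fig banana kiwi apple);
my @sorted = sort by_length_then_alpha @words;
print "@sorted\n";   # fig kiwi pear apple banana
```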

        Update: I am replying to this part: "I would still like to maybe understand what's happening w/$a and $b in cmp routines." I interpreted this to mean: how do comparison and $a and $b work? This is not directly on point for the original question, but it does address the follow-up statement.

        I'll try to help you with $a, $b and cmp. I've tried to explain this a number of times, sometimes with more success than others! I have never read (or written myself) the "perfect" FAQ on this, so this is just yet another attempt! On a subject like this, I think that hearing it from different people in different words helps. For that reason, I'm not sure it is even possible to write the "perfect" FAQ on this!

        The reason for sorting is of course to order something. Order what? In order to "sort" something, we have to start with "something" where order has meaning. That sounds basic, but sometimes this is a stumbling block!

        That "something" is a list of stuff or an array of stuff. It cannot be a hash of stuff, because a hash has no sequential order. We can sort the keys of a hash, because we can make an ordered list of them, but "sorting the hash" itself has no meaning.
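A short sketch of that distinction, using a made-up hash %age: the hash itself has no order, but its keys form a list that sort can order, either by the keys themselves or by the values they map to.

```perl
use strict;
use warnings;

my %age = (sue => 34, bob => 27, ann => 41);   # illustrative data

# keys %age comes back in no particular order; sort gives the list one.
my @by_name = sort keys %age;

# Same keys, ordered by the numeric value each key maps to.
my @by_age = sort { $age{$a} <=> $age{$b} } keys %age;

print "by name: @by_name\n";   # by name: ann bob sue
print "by age:  @by_age\n";    # by age:  bob sue ann
```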

        The basic syntax of Perl sort is:
        @output = sort{...code block...}@input;

        I'm going to back up and generalize a bit now... The following applies not only to Perl, but also to C, C++, Java, etc. Current computers (not quantum computers) compare only 2 things at a time. Fancy sorting algorithms are about minimizing the total number of pairwise comparisons needed to get the input list into the desired order. The language or sorting library provides that "fancy sorting algorithm". What you have to provide is a way for that library or language function to know, for any two things a and b, whether a<b, a==b (exactly the same), or a>b.
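The only thing sort needs from you is that three-way answer. As a sketch, here is a comparison block written out by hand with lt and gt instead of cmp; because it returns -1, 0 or 1 under the same rules, it produces the same order that cmp would.

```perl
use strict;
use warnings;

my @names = qw(jerry abe hope bob);

# Hand-written three-way comparison: -1 if $a should come first,
# 0 if they are equal, 1 if $b should come first.
my @by_hand = sort {
      $a lt $b ? -1
    : $a gt $b ?  1
    :             0
} @names;

my @by_cmp = sort { $a cmp $b } @names;

print "@by_hand\n";   # abe bob hope jerry
print "@by_cmp\n";    # abe bob hope jerry
```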

        Every language has its own way of implementing this requirement that the user program provide a way of comparing two things. Perl has two magic global variables, $a and $b. sort { ...code block... } @input repeatedly places a pair of elements from @input into $a and $b and then runs the code within the sort block. In Perl, cmp performs a string (alphabetic) comparison, and <=>, the "spaceship" operator, performs a numeric comparison.
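The difference between cmp and <=> shows up as soon as numbers have different numbers of digits. A minimal sketch with made-up data:

```perl
use strict;
use warnings;

my @nums = (10, 9, 2, 33, 1);

my @as_strings = sort { $a cmp $b } @nums;   # compares character by character
my @as_numbers = sort { $a <=> $b } @nums;   # compares numeric values

print "cmp: @as_strings\n";   # cmp: 1 10 2 33 9
print "<=>: @as_numbers\n";   # <=>: 1 2 9 10 33
```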

        Re: Sorting help illustrates a lot of these points.
        Re: Custom sort with string of numbers gives more practical "how-to"

        I hope the code below will lead you to an "Oh, my gosh!" realization! The code within the sort{...} block is essentially a subroutine, and you can put other stuff in there, like print! Below, I show which 2 items are compared at each step; the exact sequence of comparisons depends on the sort implementation in your version of Perl.

        #!/usr/bin/perl -w
        use strict;

        my @array1 = ("jerry", "abe", "hope", "crazy_horse", "lewis", "bob");
        print "initial array order: @array1\n";

        my @array = sort @array1;
        print "The default sort: @array\n";

        @array = sort { $a cmp $b } @array1;
        print "The same sort order: @array\n\n";

        @array = sort {
            print "comparing $a and $b \n";
            $a cmp $b;
        } @array1;
        print "\n";
        print "The same thing: @array\n\n";

        __END__
        initial array order: jerry abe hope crazy_horse lewis bob
        The default sort: abe bob crazy_horse hope jerry lewis
        The same sort order: abe bob crazy_horse hope jerry lewis

        comparing jerry and abe
        comparing hope and crazy_horse
        comparing lewis and bob
        comparing abe and crazy_horse
        comparing crazy_horse and jerry
        comparing jerry and hope
        comparing abe and bob
        comparing bob and crazy_horse
        comparing crazy_horse and lewis
        comparing lewis and hope
        comparing lewis and jerry

        The same thing: abe bob crazy_horse hope jerry lewis
        Update: One of the main points I was trying to get across is that when you see a tricky map{}, grep{} or sort{} and wonder what it does and how it works, it is completely fine to put some print statements in there! I do this when I am debugging something complex and my brain can't comprehend why "what is happening" is happening! The code blocks that map, grep and sort use are like subroutines.

        Please note that the last statement in such a block cannot be 'print "$result\n";', because the value of the block is the value of its last statement, and print, like other I/O functions, returns a status (here '1' for success). So you need something like: print $result; $result; or some other arrangement so that the last statement produces the value you want returned.
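A sketch of that pitfall with map (the same rule applies to grep and sort blocks). In the first version print is the last statement, so every element maps to print's return value instead of the doubled number; the variable names are just for illustration.

```perl
use strict;
use warnings;

my @nums = (1, 2, 3);

# Wrong: print is the last statement, so each element becomes
# print's return value (1 on success), not the doubled number.
my @wrong = map { my $r = $_ * 2; print "got $r\n"; } @nums;

# Right: print first for debugging, then make the value the last expression.
my @right = map { my $r = $_ * 2; print "got $r\n"; $r } @nums;

print "wrong: @wrong\n";   # wrong: 1 1 1
print "right: @right\n";   # right: 2 4 6
```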

Re: Extracting unique elements from array
by aaron_baugher (Curate) on Sep 28, 2011 at 21:13 UTC

    Since you didn't say you needed to sort anything, I'm not sure what use you'd have for cmp and the special $a and $b variables. Finding unique values is straightforward enough: assign them to a hash as its keys, and then extract the keys. Since hash keys are guaranteed to be unique, any value assigned multiple times will still exist only once.
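If the original order matters (the OP's Cookbook loop was keeping first occurrences), the same hash idea can be written as a grep filter. This is a sketch of the well-known %seen idiom from perlfaq4, not aaron_baugher's code: $seen{$_}++ yields a false value the first time a value turns up and a true count afterwards.

```perl
use strict;
use warnings;

my @list = qw(bob bob sue ann sue bob);

# %seen counts occurrences; !$seen{$_}++ is true only the first time
# a value is seen, so grep keeps first occurrences in their original order.
my %seen;
my @uniq = grep { !$seen{$_}++ } @list;

print "@uniq\n";   # bob sue ann
```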

    my @list = qw(a b c d e a a b c d f g);   # list with some duplicates
    my %h;
    $h{$_} = 1 for @list;
    print for sort keys %h;                   # sorting for clarity's sake
    # output: abcdefg
Re: Extracting unique elements from array
by umasuresh (Hermit) on Sep 29, 2011 at 17:36 UTC
    I got this from PerlMonks, but can't find the reference id.
    I am not the author of this snippet, but I always use it:
    my %saw;
    my @array = (1,1,1,2,2,3,3,3,3,4,5,5,5,6,6,7,6,5);
    @saw{@array} = ();   # initialize hash slice: one key per distinct value
    my @uniq_elements = sort keys %saw;
      Here's a one-liner (if the original order isn't important) - no temp hash var required:
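      my @foo = (1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7, 8, 8, 9, 10);
      # keys needs an actual hash, so dereference the anonymous hash with %{ +{ ... } }
      # (calling keys on a hash reference was only an experimental feature in older Perls)
      my @unique = sort { $a <=> $b } keys %{ +{ map { $_ => 1 } @foo } };
      print join(",", @unique) . "\n";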
      Two or more arrays? Not a problem.
      my @foo = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
      my @bar = (0, 1, 2, 5, 7, 9, 11, 12, 13, 14);
      # concatenate both arrays, then dedupe through the same anonymous hash
      my @unique = sort { $a <=> $b } keys %{ +{ map { $_ => 1 } (@foo, @bar) } };
      print join(",", @unique) . "\n";
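One thing worth knowing for the OP's mixed data ("batch", initials, numbers): every hash-based approach in this thread compares elements as strings, because hash keys are strings. A sketch with illustrative values: the number 7 and the string '7' collapse into one key, but '07' and '7.0' stay distinct.

```perl
use strict;
use warnings;

my @mixed = ('batch', 'KG', 7, '7', '07', 'batch', '7.0');

# Keys are stringified, so the number 7 and the string '7' are the same key,
# while '07' and '7.0' are different strings and therefore different keys.
my %seen;
my @uniq = grep { !$seen{$_}++ } @mixed;

print join(', ', @uniq), "\n";   # batch, KG, 7, 07, 7.0
```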

Node Type: perlquestion [id://928398]
Approved by toolic