koolgirl has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks....again...
OK, so I've got an @ with several hundred elements, some of them "batch", some initials, some numbers, all different lengths. I need to extract only the unique elements and discard the duplicates.
I discussed this a bit ago in the CB (Chatterbox) and was told to use cmp, but I don't understand how to get the elements of the array into $a and $b in a function, all several hundred of them.
So I went to the cookbook, to find a way to count them with a hash. This is what I found:
#!/usr/bin/perl
use strict;
#use warnings;
my %seen = ( );
my @uniq = ( );
my @list = "bob, bob, sue, sue";
my $item;
my $element;
foreach $item (@list) {
if ($seen{$item}) {
# same, don't grab
} else{
# if we get here, we have not seen it before
$seen{$item} = 1;
push(@uniq, $item);
}
}
foreach $element(@uniq) {
print $element . "\n";
}
But it doesn't work. And I haven't even tried adding numbers and such to the test array yet. Can someone please break this down for me like I'm a two-year-old... I have never understood the cmp routines in examples, because I don't get how to pass all the hundreds of elements in my @ into $a and $b to be compared. I understand how it compares after that, just not that first part. Please help me, I'm starting to see green Matrix code running down my eyes....
Thanks
koolgirl
"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.." -- George Bernard Shaw
Re: Extracting unique elements from array
by toolic (Bishop) on Sep 28, 2011 at 19:26 UTC
my @list = "bob, bob, sue, sue";
The @list array variable only has one element; perhaps you think it has 4 elements.
use warnings;
use strict;
#use warnings;
my %seen = ( );
my @uniq = ( );
my @list = "bob, bob, sue, sue";
my $item;
my $element;
print scalar(@list), "\n";
__END__
1
Try:
my @list = qw(bob bob sue sue);
Data::Dumper is handy too.
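A minimal sketch of that tip, using the core Data::Dumper module to compare the broken assignment with the qw() version (the variable names are just for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# A double-quoted string is ONE scalar, so this array gets a single element.
my @one_string = "bob, bob, sue, sue";

# qw() splits on whitespace, giving four separate elements.
my @four_words = qw(bob bob sue sue);

# Dumper shows the structure: one long string vs. four short ones.
print Dumper(\@one_string);
print Dumper(\@four_words);

print scalar(@one_string), " element vs ", scalar(@four_words), " elements\n";
```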
use List::MoreUtils qw/uniq/;
my @uniq = uniq @list;
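If you would rather not install List::MoreUtils from CPAN, newer versions of the core List::Util module also export a uniq. This sketch assumes your List::Util is version 1.45 or later (older ones don't have uniq at all); the sample data is made up to look like the original mix of words, initials and numbers:

```perl
#!/usr/bin/perl
use strict;
use warnings;
# uniq is only exported by List::Util 1.45 or later (an assumption about your install).
use List::Util 1.45 qw(uniq);

my @list = qw(bob bob sue batch 42 sue 42 AB);

# uniq keeps the first occurrence of each value, in the original order.
my @uniq = uniq @list;

print "@uniq\n";    # prints "bob sue batch 42 AB"
```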
AH, damn it I have forgotten everything, I swear. That's what I get for adding to the cookbook's code. Thank you. I would still like to maybe understand what's happening w/$a and $b in cmp routines, but this should work for now anyway. Thanks again.
I'm not quite sure I understand where your confusion lies with cmp... cmp is just an operator that returns -1 if its left operand sorts before its right operand as a string, 0 if the operands are equal, and 1 if the left operand sorts after the right operand. (It compares character by character, so it is not quite "alphabetical": "Z" comes before "a", and "10" comes before "9".)
When you use Perl's sort function, you can tell it how you want it to compare things. You don't need to pass the elements of your list to $a and $b yourself; sort does that for you.
use strict;
use warnings;
my @abclist = qw(a b c d e);
my @numlist = qw(1 3 5 7 9);
print join(' ', sort { $a cmp $b } @abclist), "\n";   # default sort, prints 'a b c d e'
print join(' ', sort { $b cmp $a } @abclist), "\n";   # prints 'e d c b a'
print join(' ', sort { $b <=> $a } @numlist), "\n";   # prints '9 7 5 3 1'
print join(' ', sort { $a == 7 ? -1 : ($b == 7 ? 1 : $a <=> $b) } @numlist), "\n";   # prints '7 1 3 5 9'
That last one is the most interesting. It ensures that 7 will always be the first element in the list: if $a is 7, no matter what $b is, we say that $a is less than $b; if $b is 7, no matter what $a is, we say that $a is greater than $b; otherwise, we do a normal <=> comparison.
Sort allows you to provide a block of code that will evaluate to a negative number, 0, or a positive number to create custom sort behavior.
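As a small sketch of that, sort will also take the name of a subroutine in place of the block. The sub name by_length_then_alpha below is hypothetical, made up for this example; it sorts by length first and falls back to cmp for strings of equal length:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical comparison sub: sort sets $a and $b before each call,
# and the sub returns negative, 0, or positive just like an inline block.
sub by_length_then_alpha {
    length($a) <=> length($b)   # shorter strings first...
        or
    $a cmp $b;                  # ...alphabetical (stringwise) order for equal lengths
}

my @words  = qw(sue batch bob AB koolgirl);
my @sorted = sort by_length_then_alpha @words;
print "@sorted\n";    # prints "AB bob sue batch koolgirl"
```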
Update: I am replying to this part: "I would still like to maybe understand what's happening w/$a and $b in cmp routines". I interpreted this to mean: how do comparison and $a and $b work? This is not directly on point for the original question, but appears to be on point for the follow-up statement.
I'll try to help you with $a, $b and cmp. I've tried to explain this a number of times, sometimes with more success than others! I have never read, or written myself, the "perfect" FAQ on this, so this is just yet another attempt! On a subject like this, I think that hearing it from different people in different words helps. For that reason, I'm not sure that it is even possible to write the "perfect" FAQ on this!
The reason for sorting is of course to order something. Order what? In order to "sort" something, we have to start with "something" where order has meaning. That sounds basic, but sometimes this is a stumbling block!
That "something" is a list of stuff or an array of stuff. It cannot be a hash of stuff, because a hash has no sequential order. We can sort the keys of a hash, because we can make an ordered list of them, but "sorting the hash" itself has no meaning.
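To make the hash point concrete, here is a sketch with invented data: we can't sort %count itself, but we can sort its keys, either by the keys themselves or by looking up their values inside the sort block:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample data: name => number of times seen.
my %count = (bob => 2, sue => 5, abe => 1);

# keys() returns the keys in no particular order; sort gives them one.
my @by_name = sort keys %count;

# The sort block can look values up in the hash while comparing keys.
# $b before $a means highest count first.
my @by_count = sort { $count{$b} <=> $count{$a} } keys %count;

print "@by_name\n";     # prints "abe bob sue"
print "@by_count\n";    # prints "sue bob abe"
```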
The basic syntax of Perl sort is:
@output = sort{...code block...}@input;
I'm going to back up and generalize a bit now... The following applies not only to Perl, but to C, C++, Java, etc. The sorting routines these languages provide only ever compare 2 things at once. Fancy sorting algorithms are about minimizing the total number of those pairwise comparisons needed to get the input list into the desired order.
The language or sorting library provides the "fancy sorting algorithm" that minimizes the number of comparisons. What you have to provide is a way for that library or language function to know, for any two things a and b, whether a < b, a == b (exactly the same), or a > b.
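A rough sketch of that division of labor: the block is the only part we write, and the $comparisons counter (added purely for illustration) records how many times Perl's sort asked it about a pair. The exact count depends on the sort algorithm your perl uses, so I won't promise a number:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @input = qw(jerry abe hope crazy_horse lewis bob);
my $comparisons = 0;

# Perl supplies the algorithm; we only answer "how do these two compare?"
my @output = sort {
    $comparisons++;     # count how often sort needed our help
    $a cmp $b;          # the actual answer: negative, 0, or positive
} @input;

print "@output\n";
printf "%d items, %d comparisons (comparing every possible pair would take %d)\n",
    scalar(@input), $comparisons, @input * (@input - 1) / 2;
```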
Every language has its own way of implementing this requirement that the user program provide a way of comparing two things. Perl uses two special package variables, $a and $b. sort{...code block...}@input causes $a and $b to be set (aliased, in fact) to pairs of things from @input, and then the code within the sort block is run to compare them. In Perl, cmp causes a string comparison to be used, and <=>, the "spaceship" operator, causes a numeric comparison to be used.
Re: Sorting help illustrates a lot of these points.
Re: Custom sort with string of numbers gives more practical "how-to" examples.
I hope the code below will lead you to an "Oh, my gosh!" realization! The code within the sort{...} block is essentially a subroutine, and you can put other stuff in there, like "print"! Below, I show which 2 items my version of Perl's sort compares at each step (other versions may compare in a different order).
#!/usr/bin/perl -w
use strict;
my @array1 = ("jerry", "abe", "hope", "crazy_horse", "lewis","bob");
print "initial array order: @array1\n";
my @array = sort @array1;
print "The default sort: @array\n";
@array = sort{$a cmp $b}@array1;
print "The same sort order: @array\n\n";
@array = sort{
print "comparing $a and $b \n";
$a cmp $b;
} @array1;
print "\n";
print "The same thing: @array\n\n";
__END__
initial array order: jerry abe hope crazy_horse lewis bob
The default sort: abe bob crazy_horse hope jerry lewis
The same sort order: abe bob crazy_horse hope jerry lewis
comparing jerry and abe
comparing hope and crazy_horse
comparing lewis and bob
comparing abe and crazy_horse
comparing crazy_horse and jerry
comparing jerry and hope
comparing abe and bob
comparing bob and crazy_horse
comparing crazy_horse and lewis
comparing lewis and hope
comparing lewis and jerry
The same thing: abe bob crazy_horse hope jerry lewis
Update: One of the main points that I was trying to get across is that when you see a tricky map{}, grep{} or sort{} and wonder what it does and how it works, it is completely fine to put some print statements in there! I do this when I am debugging something complex and my brain can't comprehend why "what is happening" is happening! The code blocks that map, grep and sort use are like subroutines. Please note that the last statement in such a block cannot be 'print "$result\n";', because print, like all I/O routines, returns a status (in this case '1'), and the value of the last statement is what the block returns. So you need something like: print $result; $result; or some other way to make the last statement the return value.
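Here is a small sketch of that last point using map (the data is arbitrary): the debug print is harmless in the first block because it isn't the last statement, while the second block ends with print and so collects print's return value instead of the data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @nums = (1, 2, 3);

# Debug print is fine as long as it is NOT the last statement:
# the value of the last statement is what map collects.
my @doubled = map {
    print STDERR "map saw $_\n";    # debugging output only
    $_ * 2;                         # last statement: the real result
} @nums;

# Wrong: print is last, so map collects print's return value (1 on success).
my @oops = map { print STDERR "map saw $_\n" } @nums;

print "@doubled\n";    # prints "2 4 6"
print "@oops\n";       # prints "1 1 1"
```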
AH, damn it I have forgotten everything, I swear.
Don't you have a ~/learnperl/arrays/unique.pl file? You should make several files of that kind. Here is what I have
Yes, you could refer to Tutorials/perlintro/perlfaq... http://pleac.sourceforge.net/pleac_perl/arrays.html, but I find I remember these things better when I type the code myself and refer to my own files :)
Re: Extracting unique elements from array
by aaron_baugher (Curate) on Sep 28, 2011 at 21:13 UTC
Since you didn't say you needed to sort anything, I'm not sure what use you'd have for cmp and the special $a and $b variables. Finding unique values is straightforward enough: assign them to a hash as its keys, and then extract the keys. Since hash keys are guaranteed to be unique, a value assigned multiple times still only exists once.
my @list = qw(a b c d e a a b c d f g); # list with some duplicates
my %h;
$h{$_} = 1 for @list;   # each value becomes a key; repeats just overwrite it
print for sort keys %h; # sorting for clarity's sake
#output
abcdefg
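The same hash can also count things, which is what the cookbook recipe was about. If "unique" is meant as "appears exactly once" rather than "one copy of each", a sketch along these lines (sample data invented) counts every value and keeps only the singletons:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = qw(batch KG 42 batch 7 KG 1001);   # made-up sample data

# Count how many times each value appears.
my %count;
$count{$_}++ for @list;

# Keep only values that appear exactly once, in their original order.
my @singletons = grep { $count{$_} == 1 } @list;

print "$_: $count{$_}\n" for sort keys %count;
print "@singletons\n";    # prints "42 7 1001"
```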
Re: Extracting unique elements from array
by umasuresh (Hermit) on Sep 29, 2011 at 17:36 UTC
I did get this from PerlMonks, but can't find the node id:
I am not the author of this snippet, but I use it always:
my %saw;
my @array = (1,1,1,2,2,3,3,3,3,4,5,5,5,6,6,7,6,5);
@saw{@array} =(); # initialize hash slice
my @uniq_elements = sort keys %saw;
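Note that this version sorts the result, so the original order is lost (and sort keys compares as strings, so 10 would land before 2). If order matters, a common idiom, which perlfaq4 also shows, is to grep the list with a %seen hash. A sketch on the same data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = (1,1,1,2,2,3,3,3,3,4,5,5,5,6,6,7,6,5);

# $seen{$_}++ returns the OLD value: undef (false) the first time a value
# shows up, so the ! makes it true and grep keeps it; later copies are dropped.
my %seen;
my @uniq_in_order = grep { !$seen{$_}++ } @array;

print "@uniq_in_order\n";    # prints "1 2 3 4 5 6 7"
```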
Here's a one-liner (if the original order isn't important) - no temp hash var required:
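my @foo = (1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7, 8, 8, 9, 10);
# keys needs a hash, not a hash reference: build an anonymous hash (+{ forces a hash, not a block) and dereference it with %{ }
my @unique = sort { $a <=> $b } keys %{ +{ map { $_ => 1 } @foo } };
print join(",", @unique) . "\n";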
Two or more arrays? Not a problem.
my @foo = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
my @bar = (0, 1, 2, 5, 7, 9, 11, 12, 13, 14);
my @unique = sort { $a <=> $b } keys %{ +{ map { $_ => 1 } (@foo, @bar) } };
print join(",", @unique) . "\n";
-Simon