tom2112 has asked for the wisdom of the Perl Monks concerning the following question:

I am reading in two sets of data (name and social security number for two lists of people) and comparing them. One data set is new and one is old. I need to create two lists from comparing the two data sets. 1) Who is on the new list, but not on the old list and 2) Who is on the old list, but not on the new list.

Currently, I am loading them both into hashes and want to compare them. Unfortunately, I can't find any help on computing the difference of two hashes, only of two arrays. Am I going about this the wrong way?

Any suggestions are greatly appreciated.

BTW, I'm a total Perl noob, so be gentle. ;)

Replies are listed 'Best First'.
Re: Need advice on hashes and methods
by Fletch (Bishop) on Aug 15, 2006 at 13:38 UTC

    Try perldoc -q 'difference of two arrays', which will lead you to perlfaq4.
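
    For readers without perldoc at hand, here is a minimal sketch along the lines of the counting approach that perlfaq4 describes. The names and sample data are made up for illustration, and it assumes neither list contains duplicates of its own:

```perl
use strict;
use warnings;

# Hypothetical sample data; each list is assumed to be duplicate-free.
my @old = qw(alice bob carol dave);
my @new = qw(bob dave erin frank);

# Count how many of the two lists each element appears in.
my %count;
$count{$_}++ for @old, @new;

my (@union, @intersection, @difference);
for my $element (sort keys %count) {
    push @union, $element;
    # Seen twice: present in both lists. Seen once: present in exactly one.
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

print "Union:        @union\n";
print "Intersection: @intersection\n";
print "Difference:   @difference\n";
```

    Note that @difference here is the symmetric difference: it does not record which list each element came from, so on its own it cannot separate cancelled members from new ones.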

Re: Need advice on hashes and methods
by ptum (Priest) on Aug 15, 2006 at 13:36 UTC

    One way (perhaps a little brute-force) is to step through one of the hashes, deleting the entries that match from both hashes. When the dust settles, each hash holds only the people who are not in the other hash. Try something like this (untested) snippet:

    foreach (keys %firsthash) {
        if (exists($secondhash{$_})) {
            delete($firsthash{$_});
            delete($secondhash{$_});
        }
    }

    No good deed goes unpunished. -- (attributed to) Oscar Wilde
      Thanks guys! These are membership lists, and I need to know who cancelled their membership (would be on the first list, but not on the second) and who the new members are (would be on the second list, but not on the first). So, I would probably want something like this:

      foreach (keys %firsthash) {
          if (exists($secondhash{$_})) {
              delete($firsthash{$_});
          }
      }
      foreach (keys %secondhash) {
          if (exists($firsthash{$_})) {
              delete($secondhash{$_});
          }
      }

      The result would be that I would have cancelled members in the first hash and new members in the second hash. Right?

        Yeah, I suppose that would work, too. The code sample I gave you would work (and is a little more efficient for large lists, since you're only iterating through the list of keys once, instead of twice.) Essentially you want to eliminate the intersection of the two hashes, I think -- and either way will work.

        Update: As you pointed out below, the first foreach removes elements from the first hash, so the second foreach fails to detect matches. Better stick with the single-pass solution. Thanks for pointing that out! :)
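
        A minimal sketch of the problem described in the update, using made-up membership data keyed by SSN. The copies %first and %second and the arrays @cancelled and @joined are hypothetical names. It runs the two-loop version on copies to show what goes wrong, then computes both differences before anything is deleted:

```perl
use strict;
use warnings;

# Hypothetical membership data keyed by SSN (all values are made up).
my %firsthash  = ('111-11-1111' => 'Ann', '222-22-2222' => 'Bob', '333-33-3333' => 'Cy');
my %secondhash = ('222-22-2222' => 'Bob', '333-33-3333' => 'Cy',  '444-44-4444' => 'Dee');

# The two-loop version, run on copies so the originals stay intact.
my %first  = %firsthash;
my %second = %secondhash;
for (keys %first)  { delete $first{$_}  if exists $second{$_} }
# By now the shared keys are gone from %first, so this loop matches nothing.
for (keys %second) { delete $second{$_} if exists $first{$_} }
print "Two-loop version left ", scalar(keys %second), " keys in the second hash\n";

# Non-destructive alternative: compute both differences while both
# hashes are still complete.
my @cancelled = sort grep { !exists $secondhash{$_} } keys %firsthash;
my @joined    = sort grep { !exists $firsthash{$_}  } keys %secondhash;

print "Cancelled: $firsthash{$_} ($_)\n"  for @cancelled;
print "New:       $secondhash{$_} ($_)\n" for @joined;
```

        Because nothing is deleted, the original hashes (and the names stored as values) remain available for reporting afterwards.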

Re: Need advice on hashes and methods
by Velaki (Chaplain) on Aug 15, 2006 at 14:22 UTC

    Another simple way, without deleting all those keys, is simply to check for existence as in the code snippet, below. It's mostly comments. The code is small.

    #!/bin/perl
    use strict;
    use warnings;

    # We create two lists
    my @list1 = ( 1, 2, 3, 4, 5, 6, 7, 8, 9 );
    my @list2 = ( 1, 3, 5, 7, 9, 11, 13, 15 );

    # Let's convert them into hashes so
    # we can check for existence. We
    # set the value equal to 1, since
    # we don't need to count them.
    my %hash1 = map { $_ => 1 } @list1;
    my %hash2 = map { $_ => 1 } @list2;

    # Here's the fun part. Let's create an array/list
    # of all the keys in hash1 that aren't in hash2.
    # This is the same as items in list 1 that aren't
    # in list 2.
    my @list1_minus_list2 = grep { !exists $hash2{$_} } sort keys %hash1;

    # Let's do it again. Let's create an array/list
    # of all the keys in hash2 that aren't in hash1.
    # This is the same as items in list 2 that aren't
    # in list 1.
    my @list2_minus_list1 = grep { !exists $hash1{$_} } sort keys %hash2;

    # Let's print them out nicely.
    print join( ',', @list1_minus_list2 ), "\n";
    print join( ',', @list2_minus_list1 ), "\n";

    __END__
    # Results
    2,4,6,8
    11,13,15

    Update: To find the intersection, change the !exists to exists. To create a union, simply create one hash from both lists with my %union_hash = map { $_ => 1 } (@list1,@list2); and then extract the keys with something like my @union_list = sort keys %union_hash;
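
    The update above can be sketched as follows, reusing the same two sample lists. One assumption differs from the text: the union is sorted numerically with { $a <=> $b }, because a plain sort compares strings and would place 11 before 2.

```perl
use strict;
use warnings;

my @list1 = ( 1, 2, 3, 4, 5, 6, 7, 8, 9 );
my @list2 = ( 1, 3, 5, 7, 9, 11, 13, 15 );

my %hash2 = map { $_ => 1 } @list2;

# Intersection: items of list 1 that also exist in list 2.
my @intersection = grep { exists $hash2{$_} } @list1;

# Union: building one hash from both lists collapses the duplicates.
my %union_hash = map { $_ => 1 } ( @list1, @list2 );
my @union_list = sort { $a <=> $b } keys %union_hash;

print join( ',', @intersection ), "\n";
print join( ',', @union_list ), "\n";
```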

    Hope this helped,
    -v.

    "Perl. There is no substitute."
Re: Need advice on hashes and methods
by liverpole (Monsignor) on Aug 15, 2006 at 14:09 UTC
    Hi tom2112,

    Here's another way you could do it, which yields four arrays at the end: the union of the hash keys, the keys found only in hash1, the keys found only in hash2, and the intersection (keys found in both hashes).

    I commented it fairly thoroughly in the hopes that it would help you during the initial stages of the Perl-learning process.

    use strict;
    use warnings;

    # Create two test hashes, with some matching keys, and some keys only in
    # the first or second hash.
    #
    my %hash1 = ( 'a' => 1, 'b' => 2, 'd' => 3, 'f' => 4, 'h' => 5 );
    my %hash2 = ( 'a' => 1, 'c' => 2, 'd' => 3, 'e' => 5, 'g' => 6 );

    # Get the union of the two hashes, and save it in an array (@union)
    my %union = map { $_ => 1 } (keys %hash1, keys %hash2);
    my @union = keys %union;

    # Declare arrays for keys only in %hash1, keys only in %hash2, and
    # keys found in both (the intersection).
    #
    my (@hash1, @hash2, @inter);

    # For each item in the union, construct a flag which identifies
    # which array to save it in.
    #
    foreach (@union) {
        my $flag = (exists $hash1{$_}? 1: 0) + (exists $hash2{$_}? 2: 0);
        (1 == $flag) and push @hash1, $_;
        (2 == $flag) and push @hash2, $_;
        (3 == $flag) and push @inter, $_;
    }

    # Display the results
    print "[Results]\n";
    printf "Union ........... %s\n", join(',', sort @union);
    printf "Intersection .... %s\n", join(',', sort @inter);
    printf "Only in hash1 ... %s\n", join(',', sort @hash1);
    printf "Only in hash2 ... %s\n", join(',', sort @hash2);

    __DATA__
    [Results]
    Union ........... a,b,c,d,e,f,g,h
    Intersection .... a,d
    Only in hash1 ... b,f,h
    Only in hash2 ... c,e,g

    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
      Thanks!
Re: Need advice on hashes and methods
by injunjoel (Priest) on Aug 15, 2006 at 18:02 UTC
    Late to the party as usual but here is what I would do.
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    #build some test data here.
    #create the lists of overlapping values
    #and unique values
    my @list_1 = (1 .. 20);
    my @list_2 = (10 .. 40);

    #define our sets
    my (%hash1, %hash2);

    #we are not interested in the values just
    #the keys for this exercise so we use a
    #hash slice, the @hash{@list_of_keys}
    #construct to build our sample hashes.
    @hash1{@list_1} = ();
    @hash2{@list_2} = ();

    #filter out our stuff.
    #I like do blocks since they return stuff.
    my @in1only = do{
        #I localize %_ here and set it to %hash1
        #you could also use my %temp_hash_for_this_scope
        #for example but the effect is the same.
        local %_ = %hash1;
        #again we see the hash slice thingie this time
        #we use it with delete which it just so happens
        #can work with a hash slice. Here we use it to
        #get rid of keys from %hash2 that might be in
        #our %hash1.
        delete @_{keys %hash2};
        #finally we return a sorted list of the keys left
        #over from the delete call above. These are keys
        #unique to %hash1 only
        sort keys %_;
    }; #<---- don't forget the semi-colon for the do block!!!!

    #same as above... well kinda, just in reverse :)
    my @in2only = do{
        local %_ = %hash2;
        delete @_{keys %hash1};
        sort keys %_;
    };

    print "in one only\n";
    print Dumper(\@in1only);
    print "\n";
    print "in two only\n";
    print Dumper(\@in2only);
    print "\n";
    and the output...
    in one only
    $VAR1 = [
              '1',
              '2',
              '3',
              '4',
              '5',
              '6',
              '7',
              '8',
              '9'
            ];

    in two only
    $VAR1 = [
              '21',
              '22',
              '23',
              '24',
              '25',
              '26',
              '27',
              '28',
              '29',
              '30',
              '31',
              '32',
              '33',
              '34',
              '35',
              '36',
              '37',
              '38',
              '39',
              '40'
            ];
    -InjunJoel

    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
Re: Need advice on hashes and methods
by BrianC (Acolyte) on Aug 16, 2006 at 04:19 UTC
    Another way:
    # Tally per key: 1 = first hash only, 10 = second hash only, 11 = both.
    my %where;
    map {$where{$_} = 1} keys %firsthash;
    map {$where{$_} += 10} keys %secondhash;
    foreach (keys %where) {
        print "$_: first list only\n" if $where{$_} == 1;
        print "$_: second list only\n" if $where{$_} == 10;
    }
    Edited d/t typing too fast and not rereading. Thx injunjoel.