viveks_19 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I read one column from two different flat files and stored the result into two associative arrays. Now I have to compare these two arrays. If I use a foreach loop, it will be very inefficient, as it will loop through n*n times. If they don't have common values then I have to store into another array. I am not sure how to do that. Can anyone suggest something?
  • Comment on How to compare two associative arrays efficiently

Replies are listed 'Best First'.
Re: How to compare two associative arrays efficiently
by ikegami (Patriarch) on Oct 12, 2006 at 20:57 UTC

    One column from each file, so only the keys need to be compared? Here's a solution that uses no extra memory and takes only N iterations. I've been told lookups in Perl hashes are usually constant-time, so their cost can be ignored.

    my $match = 0;

    # Need to run each to the end
    # so it's safe to use again later.
    while (defined(my $key = each(%hash1))) {
        $match ||= exists $hash2{$key};
    }

    if (!$match) {
        # No common ...
    }
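As a variation on the loop above, here is a minimal sketch that stops at the first common key instead of running each to the end. It relies on the documented side effect that calling keys on a hash resets its each iterator. The sample data and the helper name have_common_key are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; only the keys matter here.
my %hash1 = map { $_ => 1 } qw(1111 2222 3333);
my %hash2 = map { $_ => 1 } qw(3333 4444);

# Hypothetical helper: returns true if the two hashes
# (passed by reference) share at least one key.
sub have_common_key {
    my ($h1, $h2) = @_;

    keys %$h1;    # void context: resets the each() iterator before we start

    my $found = 0;
    while (defined(my $key = each %$h1)) {
        if (exists $h2->{$key}) {
            $found = 1;
            last;    # leaving early would normally strand the iterator...
        }
    }

    keys %$h1;    # ...so reset it again for any later use of each()
    return $found;
}

my $message = have_common_key(\%hash1, \%hash2)
    ? "common key found\n"
    : "no common key\n";
print $message;    # prints "common key found"
```

Whether the early exit is worth the extra lines depends on the data: it only helps when a common key tends to turn up early.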
Re: How to compare two associative arrays efficiently
by Corion (Patriarch) on Oct 12, 2006 at 20:49 UTC

    This is a faq.

    You can find this by running perldoc -q difference (or perldoc -q intersection) locally.

Re: How to compare two associative arrays efficiently
by jdporter (Paladin) on Oct 12, 2006 at 20:50 UTC
    my %count;
    $count{$_}++ for keys(%a1), keys(%a2);
    my @not_common = grep { $count{$_} < 2 } keys %count;
    We're building the house of the future together.
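For readers who want to try the counting approach above, here is a small self-contained sketch. The contents of %a1 and %a2 are invented sample data; the only assumption is that the column entries are stored as hash keys.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two hashes.
my %a1 = map { $_ => undef } qw(1111 2222 3333);
my %a2 = map { $_ => undef } qw(2222 3333 4444);

# Count in how many of the two hashes each key occurs.
my %count;
$count{$_}++ for keys(%a1), keys(%a2);

# A count below 2 means the key occurs in only one of the hashes.
my @not_common = grep { $count{$_} < 2 } keys %count;

# Hash order is not predictable, so sort before printing.
print join(" ", sort @not_common), "\n";    # prints "1111 4444"
```

This works because a key can occur at most once per hash, so a count of 2 can only mean "present in both".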
      thanks for your prompt reply,
      I was using the following code for the comparison. I don't have any particular key/value pairing, so I compared all the entries (keys and values). If they are not equal, then I am pushing values from %pssn onto an array. Please provide a solution in this context.
      foreach $key (keys %pssn) {
          foreach $key1 (keys %assn) {
              if ($key ne $key1) {
                  push(@array, $key);
              }
          }
      }
      foreach $val (values %pssn) {
          foreach $val1 (values %assn) {
              if ($val ne $val1) {
                  push(@array, $val);
              }
          }
      }

        Please use <c>...</c> around your code.

        Forget efficiency, your code doesn't even work. If both hashes are initialized to (a=>1, b=>2, c=>3), you end up with qw( b c a c a b 2 3 1 3 1 2 ) in @array instead of nothing.
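To see the problem concretely, the nested loops can be run on two identical hashes. This is only a sketch: the my declarations and the statement-modifier form of push are adjustments made here so that it runs under strict, but the comparison logic is the same as in the code being criticized.

```perl
use strict;
use warnings;

# Two identical hashes, so the expected list of differences is empty.
my %pssn = (a => 1, b => 2, c => 3);
my %assn = (a => 1, b => 2, c => 3);

my @array;

# Each key is compared with every key of the other hash, so it is
# pushed once for each key it differs from (two times here).
foreach my $key (keys %pssn) {
    foreach my $key1 (keys %assn) {
        push(@array, $key) if $key ne $key1;
    }
}

# Same flaw for the values.
foreach my $val (values %pssn) {
    foreach my $val1 (values %assn) {
        push(@array, $val) if $val ne $val1;
    }
}

# 3 keys x 2 mismatches + 3 values x 2 mismatches = 12 entries.
# The order of the entries depends on the hash ordering.
print scalar(@array), " entries: @array\n";
```

The test "differs from some element" is true for almost every pair, whereas what is wanted is "differs from every element", which is exactly what a single exists lookup answers.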

        Update: Ignore the remainder of this post, based on new information provided in another post.

        The fix (assuming the values are strings) is below, although it's very weird that you want to merge the keys and the values.

        foreach my $key (keys %pssn) {
            if (not exists $assn{$key}) {
                push(@array, $key);
            }
        }

        foreach my $key (keys %assn) {
            if (not exists $pssn{$key}) {
                push(@array, $key);
            }
        }

        my %assn_vals = map { $_ => 1 } values %assn;
        my %pssn_vals = map { $_ => 1 } values %pssn;

        foreach my $val (keys %assn_vals) {
            if (not exists $pssn_vals{$val}) {
                push(@array, $val);
            }
        }

        foreach my $val (keys %pssn_vals) {
            if (not exists $assn_vals{$val}) {
                push(@array, $val);
            }
        }
Re: How to compare two associative arrays efficiently
by blazar (Canon) on Oct 12, 2006 at 21:12 UTC
    Hello, I read one column from two different flat files and stored the result into two associative arrays.

    Associative arrays are for associating stuff, specifically keys with values. Now, independently of how you built them, we know that you have two such associative arrays. Presumably the keys are entries from the first column of each file and, lacking other information, one has to guess that the values are of no particular interest, and thus probably undef.

    Now I have to compare these two arrays.

    They're not arrays. They're associative arrays aka hashes. How do you want to compare them? Just find common keys?!?

    If I use a foreach loop, it will be very inefficient, as it will loop through n*n times.

    You're saying more now, implying that both hashes have the same size. Indeed a hash must be thought of as a mapping, which also recalls the naive concept of a set, which in turn recalls that of membership. Thus it should come as no surprise that membership can be tested by means of built-in features that hide the details from the programmer. Specifically, chances are you just want to know about exists.
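A minimal sketch of that idea, treating each hash as a set of keys and testing membership with exists. The hash names and their contents are invented sample data.

```perl
use strict;
use warnings;

# Hypothetical sample data: the keys are the column entries,
# the values are unused and left undef.
my %hash1 = map { $_ => undef } qw(1111 2222 3333);
my %hash2 = map { $_ => undef } qw(2222 3333 4444);

# One pass over the keys of %hash1 with one membership test per key,
# instead of comparing every key against every other key.
my @only_in_hash1 = grep { !exists $hash2{$_} } keys %hash1;

print "@only_in_hash1\n";    # prints "1111"
```

Note that exists tests for the presence of the key, so it works even though the values are undef.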

    If they don't have common values then I have to store into another array. I am not sure how to do that. Can anyone suggest something?

    If they don't have common values (which values?!?) you have to store what into what? (really "another array" or another hash?)

      Thanks, but I again have a doubt. I understand that my arrays (hashes) have only keys and that the values are undef. But my flat file is like 1111 2222 3333 4444, which I have loaded into a hash, say %hash1, and I was assuming that 1111 and 3333 would become the keys for the values 2222 and 4444 respectively, but I think I was wrong. So I have two flat files just like the above sample, which I have loaded into two separate hashes, say %hash1 and %hash2, and if they have any non-common values/keys then store into another array or hash, it does not matter. Please help me out...

        It can be loaded either way. It's totally up to you how you store the info in the hash.

        Anyway, here's code that should accomplish your goal.

        use strict;
        use warnings;

        my $fn_assn = ...;
        my $fn_pssn = ...;

        my @assns;
        {
            open(my $fh_assn, '<', $fn_assn)
                or die("Unable to open assn file \"$fn_assn\": $!\n");
            chomp(@assns = <$fh_assn>);
        }

        my @pssns;
        {
            open(my $fh_pssn, '<', $fn_pssn)
                or die("Unable to open pssn file \"$fn_pssn\": $!\n");
            chomp(@pssns = <$fh_pssn>);
        }

        {
            my %pssns = map { $_ => 1 } @pssns;
            my @unique_assns = grep { not exists $pssns{$_} } @assns;
            print("The following ASSNs have no corresponding PSSNs:\n");
            print("$_\n") foreach @unique_assns;
        }

        print("\n");

        {
            my %assns = map { $_ => 1 } @assns;
            my @unique_pssns = grep { not exists $assns{$_} } @pssns;
            print("The following PSSNs have no corresponding ASSNs:\n");
            print("$_\n") foreach @unique_pssns;
        }
        Thanks, but I again have a doubt. I understand that my arrays (hashes) have only keys and that the values are undef. But my flat file is like 1111 2222 3333 4444, which I have loaded into a hash, say %hash1, and I was assuming that 1111 and 3333 would become the keys for the values 2222 and 4444 respectively, but I think I was wrong.

        Indeed if you do something like

        my %hash=qw/1111 2222 3333 4444/;

        then %hash will have the structure you describe. However, keys and their associated values generally have a "different nature". Nothing prohibits them from being homogeneous, but I see no reason for them to be so in this case, so is it really what you want?
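A short sketch that makes the resulting structure visible. The list assignment pairs the elements up as key, value, key, value, so only every other entry ends up as a key.

```perl
use strict;
use warnings;

my %hash = qw/1111 2222 3333 4444/;

# Print the pairs in sorted key order.
for my $key (sort keys %hash) {
    print "$key => $hash{$key}\n";
}
# prints:
# 1111 => 2222
# 3333 => 4444

# 2222 and 4444 are values, so a key lookup cannot see them.
my $message = exists $hash{2222} ? "2222 is a key\n" : "2222 is not a key\n";
print $message;    # prints "2222 is not a key"
```

If all four entries are meant to be compared against the other file, half of them are out of reach of exists in this layout.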

        So I have two flat files just like the above sample, which I have loaded into two separate hashes, say %hash1 and %hash2, and if they have any non-common values/keys then store into another array or hash, it does not matter. Please help me out...

        I wish I could help you but it's hard to make sense of "common values/keys". Also despite your efforts the expression "then store into another array or hash" is still missing the object to be stored into whatever.

Re: How to compare two associative arrays efficiently
by doowah2004 (Monk) on Oct 12, 2006 at 21:44 UTC
    I had something similar, but it had to do with combinations of arrays... check this post; maybe something there could help.

    Good luck.

    p.s. Add html tags to your posts to make them easier for others to read.
      Thank you so much, guys, for your kind help. I used the solution given by ikegami and it worked very well (and fast) for me. I found my answer and I am closing this thread. Thanks a lot once again to all of you.
      my @unique_pssns = grep { not exists $assns{$_} } @pssns;
      print("The following PSSNs have no corresponding ASSNs:\n");
      print("$_\n") foreach @unique_pssns;
        Thank you so much, guys, for your kind help. I used the solution given by ikegami and it worked very well (and fast) for me. I found my answer and I am closing this thread. Thanks a lot once again to all of you.

        You can't (technically) "close" a thread. Indeed you solved your problem and that's fine. Now, one last remark is in order, in the hope that you will benefit from it: comparing your newly found solution

        my @unique_pssns = grep { not exists $assns{$_} } @pssns;

        with your initial attempt, the lesson to be learned is that the choice of data structure does matter, in contrast with your assumption that "array or associative array doesn't matter". In particular, you had homogeneous data "dispersed" across the keys and values of your hashes, with no sensible association within each key-value pair. Loading data from your flat files into hashes like that also puts you at risk of losing some of it (specifically, different values associated with identical keys).

        For that naive two-loop approach an array would have been better. But when you have to check for existence, as a mnemonic remember exists and think of a hash. This will get rid of one of the loops. Of course nothing prevents you from keeping your main data in an array and creating a hash on the fly exactly for this purpose, as in ikegami's solution. As for the other loop, it's still there behind the curtain in grep, but of course the latter, as a higher-level tool with a specific application, highlights the logic and hides the details, not to mention the keystrokes it saves you.
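Both points can be seen in a small sketch. The data below is invented for illustration; the second half follows the pattern of ikegami's solution, with the main data kept in arrays and a lookup hash built on the fly.

```perl
use strict;
use warnings;

# Hypothetical column data with a repeated entry.
my @lines = qw(1111 2222 1111 3333);

# Pairing the entries up as key/value loses data: the second 1111
# overwrites the first, so the value 2222 disappears.
my %paired = @lines;
print scalar(keys %paired), " key(s) survive\n";    # prints "1 key(s) survive"

# Keeping the data in arrays loses nothing.
my @pssns = qw(1111 2222 3333);
my @assns = qw(2222 4444);

# Build a lookup hash only for the existence check.
my %assns = map { $_ => 1 } @assns;

# grep walks @pssns once; exists replaces the inner loop.
my @unique_pssns = grep { not exists $assns{$_} } @pssns;
print "@unique_pssns\n";    # prints "1111 3333"
```

Because grep preserves the order of its input list, the result comes out in file order, which a plain hash of keys would not guarantee.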