jdelmedi has asked for the wisdom of the Perl Monks concerning the following question:

How do I compare two arrays to make sure the contents of one exist in the other?

Re: comparing two arrays
by rob_au (Abbot) on Nov 24, 2001 at 13:42 UTC
    In addition to the direction given by the mighty crazy one, I would also point you towards serialisation methods, which let you compare complex data structures. The most notable serialisation modules for Perl are Data::Dumper and Storable. The latter can compare data faster than Data::Dumper, because its methods are more closely tied to the C representation of these data objects.

    An example piece of comparison code using Storable might look like this:

    #!/usr/bin/perl
    use Storable qw/freeze/;
    use strict;

    $Storable::canonical = 1;

    my @x = ( '1', '2', '3', '4', '5' );
    my @y = ( '1', '2', '3', '4', '5' );
    my @z = ( '6', '7', '8', '9', '0' );

    print "x = y\n"  if (freeze(\@x) eq freeze(\@y));  # True
    print "x = z\n"  if (freeze(\@x) eq freeze(\@z));  # False
    print "x != z\n" if (freeze(\@x) ne freeze(\@z));  # True

    Those familiar with this module will recognise that the $Storable::canonical = 1 assignment is unnecessary in this example, where only arrays are being serialised. The assignment makes Storable store hashes with their elements sorted by key, which is what allows frozen hash structures to be compared reliably later.
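    To show where the canonical setting does matter, here is a small sketch that freezes two hashes built in different insertion orders. The hash names and keys are made up for illustration; the only assumption about Storable is that freeze() and $Storable::canonical behave as documented.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze);

# With canonical on, Storable sorts hash keys before serialising,
# so two hashes with the same contents freeze to the same string.
$Storable::canonical = 1;

# Illustrative data only
my %first = ( sqltable => 1, mike_test => 1, sarak_mike => 1 );
my %second;
$second{$_} = 1 for qw(sarak_mike sqltable mike_test);  # different insertion order

my $same = freeze(\%first) eq freeze(\%second);
print $same ? "hashes match\n" : "hashes differ\n";
```

    Without the canonical flag, the two frozen strings may differ even though the hashes hold the same data, because Perl's internal hash ordering is not guaranteed.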

     

    Ooohhh, Rob no beer function well without!

      Thank you very much for your input!! It has helped me to understand a little better.
(crazyinsomniac) Re: comparing two arrays
by crazyinsomniac (Prior) on Nov 24, 2001 at 11:36 UTC
Re: comparing two arrays
by davorg (Chancellor) on Nov 24, 2001 at 14:25 UTC
      Thanks for the suggestion, greatly appreciated.
Re: comparing two arrays
by George_Sherston (Vicar) on Nov 24, 2001 at 16:54 UTC
    I must say, in the nicest possible way, your question was a little terse. I can think of a lot of different situations you might have been interested in, many of which the above monks have dealt with. If, however, your situation is that you have an array like my @essentials = qw /1 3 5 7/; and you want to check an array @large to see whether @large contains at least all the contents of @essentials, then this would be One Way To Do It:
    my @essentials = qw /1 3 5 7/;
    my @large = qw /0 2 3 4 5 6 7 8/;
    my $result = "OK";
    for my $i (@essentials) {
        unless (grep {$i == $_} @large) {
            $result = "NO";
            last;
        }
    }
    print $result; # prints NO

    my @essentials = qw /1 3 5 7/;
    my @large = qw /0 1 2 3 4 5 6 7 8/;
    my $result = "OK";
    for my $i (@essentials) {
        unless (grep {$i == $_} @large) {
            $result = "NO";
            last;
        }
    }
    print $result; # prints OK
    BUT... be careful with that ==. You'll have to choose the right equality operator for what's in your arrays: == for numbers, eq for strings.
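    For string data such as table names, the same loop can be written with eq. Here is a minimal sketch wrapped in a helper; contains_all() is a hypothetical name, not something from a module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every element of @$want appears in @$have.
# Uses eq, so it suits string data such as table names.
sub contains_all {
    my ($want, $have) = @_;
    for my $item (@$want) {
        # grep in scalar context returns the number of matches
        return 0 unless grep { $_ eq $item } @$have;
    }
    return 1;
}

my @essentials = qw(sqltable mike_test);
my @large      = qw(sarak_mike mike_test sqltable);
print contains_all(\@essentials, \@large) ? "OK\n" : "NO\n";   # prints OK
```

    With ==, '1' and '1.0' would count as equal; with eq they do not. Note also that grep scans all of @large for every element, so for big arrays a hash lookup, as in the replies further down, is faster.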

    § George Sherston
      Thank you very much for your input; it has helped me to understand the methodology. I am quite new at this and can't seem to leave it alone until I understand it. To be a little more specific about my question: I need to take action when a file with contents like the following

      sqltable table loaded successfully with 282873 records.
      mike_test table loaded successfully with 282873 records.
      sarak_mike table loaded successfully with 282873 records.

      matches a hard-coded array

      @report=("sqltable", "mike_test", "sarak_mike");

      I need to check an array @file containing the file input to see whether @file contains at least all the contents of @report. Your help is greatly appreciated.
        No one has mentioned the Cookbook yet, so I will.

        If you need @file to contain at least all the elements of @report, then you need to find the simple difference of the two. The simple difference is the (paraphrasing from the Cookbook) "set of members of @report, but not of @file". If this set is empty, then @file contains at least all the elements of @report. So, working with Recipe 4.7:

        use strict;

        my @report = qw(sqltable mike_test sarak_mike);
        my @file   = (@report, qw(plus a little more));

        # off by 0
        print '@file was off by ' . simple_compare(\@report,\@file) . " elements\n";

        # still 0, that elem wasn't in @report
        pop @file;
        print '@file was off by ' . simple_compare(\@report,\@file) . " elements\n";

        # off by 1, that one was
        shift @file;
        print '@file was off by ' . simple_compare(\@report,\@file) . " elements\n";

        # returns number of elements in A that aren't in B
        sub simple_compare {
            my ($A,$B) = @_;
            # build lookup table
            my %seen = map { $_ => 1 } @$B;
            # find those in A that aren't in B
            my @aonly;
            foreach my $item (@$A) {
                push @aonly,$item unless ($seen{$item});
            }
            return scalar @aonly;
        }
        (Array::Compare has a simple_compare() method, but its definition of a simple compare is not the same as the Cookbook's.)

        Also note that this can only be a partial solution. What if you needed to compare these two arrays:

        my @report = qw(one one two); my @file = qw(one two two);
        Now what? @file definitely contains at least one of each element from @report. But, @file doesn't contain 2 'one's like @report does. If you need something like this, but you don't care about order, then use rob_au's suggestion and sort the arrays first. If you do care about order, use that code verbatim. Hope this helps!
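        If duplicates matter but order does not, another option is to count occurrences instead of sorting and freezing. A minimal sketch, using a hypothetical helper name, of that "bag" (multiset) containment check:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if @$have contains every element of @$want
# at least as many times as @$want does (a multiset, or "bag", check).
sub contains_all_with_counts {
    my ($want, $have) = @_;
    my %count;
    $count{$_}++ for @$have;    # how often each value occurs in @$have
    for my $item (@$want) {
        # consume one occurrence; fail if none are left
        return 0 unless $count{$item};
        $count{$item}--;
    }
    return 1;
}

my @report = qw(one one two);
my @file   = qw(one two two);
print contains_all_with_counts(\@report, \@file) ? "OK\n" : "NO\n";  # prints NO
```

        This is still a "contains at least" test, so extra elements in @file are allowed; it only fails when @file has fewer copies of some value than @report.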

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        F--F--F--F--F--F--F--F--
        (the triplet paradiddle)
        
Re: comparing two arrays
by jlongino (Parson) on Nov 24, 2001 at 22:06 UTC
    In the spirit of TIMTOWTDI:
    use strict;

    my @report = qw(sqltable mike_test sarak_mike);
    my @file   = qw(some other stuff mike_test sarak_mike);

    my %in_file = map { $_ => 1 } @file;
    my $all_match = 1;
    foreach (@report) {
        if (! exists $in_file{$_}) {
            $all_match = 0;
            print "\@report value: '$_' is not in \@file.\n";
        }
    }
    print "All values in \@report found in \@file\n" if $all_match;

    --Jim

      I think I got overly excited; it's still not working. The $_ in the hash seems to be blank, and every run comes back saying the @report value does not exist in @file. More help is surely appreciated.

      use strict;
      my $file='d:\mms_tableload.txt';
      my @file=();
      open(FILE,$file);
      @file = <FILE>;
      close(FILE);
      my @report = qw(sarak_mike mike);
      my %in_file = map{$_ =>1} @file;
      my $all_match = 1;
      foreach (@report) {
          if (! exists $in_file{$_}) {
              $all_match = 0;
              print "\@report value: '$_' is not in \@file.\n";
          }
      }
      print "All values in \@report found in \@file\n" if $all_match;

        I've never seen the syntax  @file = ; (and I'm not doubting you, just curious) but are you checking to see what the contents of @file are before you enter the while loop? Also you should be sure to put  chomp @file; before the loop since you might have newlines in the file at the end of each record ("\n").

        Keep us posted.

        --Jim

        Afterthought: It is usually helpful when I'm debugging to use snippets similar to the following for testing arrays:

        my $i = 0;
        foreach (@file) {
            print "\@file[", $i++, "] = '$_'\n";
        }
        It helps spot newlines, incorrect subscripting and such.
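        Data::Dumper can do the same job. With $Data::Dumper::Useqq set, strings are printed in double quotes with escapes, so a trailing newline shows up as a visible \n. A small sketch with illustrative data standing in for the file contents:

```perl
use strict;
use warnings;
use Data::Dumper;

# Useqq makes Data::Dumper print strings in double quotes with escapes,
# so a trailing newline appears as \n instead of an invisible line break.
$Data::Dumper::Useqq  = 1;
$Data::Dumper::Indent = 1;

# Illustrative data: lines as they come back from <FILE> without chomp
my @file = ("sarak_mike\n", "mike\n");
print Dumper(\@file);

chomp @file;             # strip the trailing newlines
print Dumper(\@file);    # now the strings have no \n
```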

        Update: In case it is not clear from the previous posts, we're assuming that the file  'd:\mms_tableload.txt' looks like this:

        sarak_mike\n
        mike\n
        some\n
        other\n
        data\n
        The hash lookup must be an exact match (including newlines and case). If @file contains other extraneous information, such as embedded HTML tags, then you may need a different method that searches strings (like grep, or m/$_/), and this can be tricky, since you'd have to ensure that your search for "mike" doesn't also match "sarak_mike".
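        If the file really has the "<name> table loaded successfully with <n> records." lines described earlier in the thread, one way to keep the exact hash lookup is to extract the table name from each line first. This is a sketch under that assumption about the format; an in-memory string stands in for 'd:\mms_tableload.txt' so the example runs on its own.

```perl
use strict;
use warnings;

# Sample input in the format described earlier in the thread; in real use
# you would open 'd:\mms_tableload.txt' instead of this in-memory string.
my $sample = "sqltable table loaded successfully with 282873 records.\n"
           . "mike_test table loaded successfully with 282873 records.\n"
           . "sarak_mike table loaded successfully with 282873 records.\n";

my @report = qw(sqltable mike_test sarak_mike);

open(my $fh, '<', \$sample) or die "Cannot open sample: $!";
my %loaded;
while (my $line = <$fh>) {
    chomp $line;
    # Assumed format: "<name> table loaded successfully with <n> records."
    # Capturing the whole first word means "mike" can't match "sarak_mike".
    $loaded{$1} = 1 if $line =~ /^(\S+)\s+table loaded successfully/;
}
close($fh);

# Anything in @report with no matching line in the file
my @missing = grep { !$loaded{$_} } @report;
if (@missing) {
    print "Missing from file: @missing\n";
} else {
    print "All values in \@report found in file\n";
}
```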
      Now it makes sense! It worked in the real world. Hats off to you, Sir!!!
        Understand that duplicates are collapsed when the hash is built. Though you did not say that duplicate values matter, this test does not tell you whether both record sets contain the same number of copies of a given record. Move SIG!