rsiedl has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I'm trying to compare two arrays and find their "intersection". List::Compare has so far proved very useful, however I now need a non-unique intersection.
Example code:
#!/usr/bin/perl use strict; use List::Compare; my @temp = ( 'test', 'test1', 'test', 'test2', 'test2', 'test' ); print "----------------------\nContents of Temp Array\n"; print "\t", $_, "\n" foreach (@temp); my @temp2 = ( 'test1', 'test2', 'test2' ); print "----------------------\nContents of Temp2 Array\n"; print "\t", $_, "\n" foreach (@temp2); my $lc = List::Compare->new('--unsorted', \@temp, \@temp2); my @intersection = $lc->get_intersection; print "----------------------\nThe (Unique) Intersection of Temp and T +emp2 Arrays\n"; print "\t", $_, "\n" foreach (@intersection); exit;
This returns:
----------------------
Contents of Temp Array
        test
        test1
        test
        test2
        test2
        test
----------------------
Contents of Temp2 Array
        test1
        test2
        test2
----------------------
The (Unique) Intersection of Temp and Temp2 Arrays
        test1
        test2
However I need the Intersection to look like:
The Intersection of Temp and Temp2 Arrays
        test1
        test2
        test2
as test2 appeared twice in both lists.

I cant find a way to do this with List::Compare, so can anyone suggest anything?

Cheers,
Reagen

Update: I've figured out a way to do it As pointed out by Sidhekin, this doesnt work.
my %temp=map{$_ =>1} @temp; my %temp2=map{$_=>1} @temp2; my @non_unique_intersection = grep( $temp{$_}, @temp2 ); print "----------------------\nThe Non-Unique Intersection of Temp and + Temp2 Arrays\n"; print "\t", $_, "\n" foreach (@non_unique_intersection);
but would still be interested in how you would go about it...

For bonus points can anyone suggest a way to get the following:
- The non-unique leftover from @temp; and,
- The non-unique leftover from @temp2.

I tried the following which doesnt work...
my @non_unique_Lonly = @temp; for (my $i=0;$i<scalar(@non_unique_Lonly);$i++){ delete $non_unique_Lonly[$i] if ( grep $_ eq $non_unique_Lonly[$i], +@non_unique_intersection ); } # end-for print "----------------------\nThe Non-Unique Leftover from Temp\n"; print "\t", $_, "\n" foreach (@non_unique_Lonly);
Update 2: This works but seems long winded...
my @non_unique_Lonly = @temp; my @temp_intersection = @non_unique_intersection; for (my $i=0;$i<scalar(@non_unique_Lonly);$i++){ for (my $j=0;$j<scalar(@temp_intersection);$j++){ if ( grep $_ eq $non_unique_Lonly[$i], @temp_intersection ) { delete $non_unique_Lonly[$i]; delete $temp_intersection[$j]; last; } # end-if } # end-for } # end-for print "----------------------\nThe Non-Unique Leftover from Temp\n"; print "\t", $_, "\n" foreach (@non_unique_Lonly);

Replies are listed 'Best First'.
Re: Compare Lists (Non Unique Intersection)
by davido (Cardinal) on Jul 28, 2006 at 06:23 UTC

    Here's a rough draft (working properly, however) of one way to do it. I'm using a couple of hashes to reduce each array to simply a count of each element. Then the lesser of the two counts becomes the number of times set three should contain a given element.

    use strict; use warnings; use List::Util qw/min/; my @list_a = qw/test test1 test2 test2 test/; my @list_b = qw/test1 test2 test2/; my( %element_count_a, %element_count_b ); %element_count_a = %{ count( \@list_a ) }; %element_count_b = %{ count( \@list_b ) }; foreach my $key ( keys %element_count_a ) { foreach ( 1 .. min( $element_count_a{$key}, exists( $element_count_b{$key} ) ? $element_count_b{$key} : '0' ) ) { print $key, "\n"; } } sub count { my( $list_aref ) = shift; my %element_count; foreach my $item ( @{ $list_aref } ) { $element_count{$item}++; } return \%element_count; }

    Updated to choose better variable names.


    Dave

Re: Compare Lists (Non Unique Intersection)
by Sidhekin (Priest) on Jul 28, 2006 at 06:24 UTC

    I don't think the "way to do it" of your update really does it, if I understand correctly what "it" is ... that way would result in two copies of any element that was twice in @temp2, whether it was once, twice, or three or more times in @temp.

    Provided I understand correctly what "it" is, I'd count up and down:

    my %count; $count{$_}++ for @temp; my @non_unique_intersection = grep { $count{$_}-->0 } @temp2;

    This will preserve the order of elements of @temp2, until the count (initially the frequency in @temp of that element's string equivalence class) has been exhausted.

    Update: Might as well make a function, while I'm at it:

    # my @non_unique_intersection = nonunique_intersect( \@temp, \@temp2 ) +; sub nonunique_intersect { my ($x, $y) = @_; my %count; $count{$_}++ for @$x; my @return = grep { $count{$_}-->0 } @$y; return wantarray ? @return : \@return; }

    Update 2: For your next challenge, the non-unique intersection, leftovers from @temp, and leftovers from @temp2; assuming that the bag of ( @temp, @temp2 ) is supposed to be the same as the bag of ( (@non_unique_intersection) x 2, @leftovers, @leftovers2 ):

    my %count; $count{$_}++ for @temp; my @non_unique_intersection = grep { $count{$_}-->0 } @temp2; my (@leftovers, @leftovers2); push @{$count{$_}>0 ? \@leftovers : \@leftovers2}, ($_) x abs($count{$ +_}) for keys %count;

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!

      how right you are! thanks for pointing that out. ++
Re: Compare Lists (Non Unique Intersection)
by crashtest (Curate) on Jul 28, 2006 at 06:23 UTC

    I am not an expert on List::Compare, but my instinct would be that it doesn't handle your requirement. My spotty mathematics background associates the word "intersection" with "set theory". And sets (unlike lists) consist by definition of unique elements.

    That being said, it would be possible to roll your own "non-unique" intersection using loops, like so*:

    my @results = nonunique_intersect(\@temp, \@temp2); # ... sub nonunique_intersect{ my ($a1, $a2) = @_; my @unused_list = @$a2; my @results = (); for (@$a1){ # For each element in the first array... for (my $i = 0; $i < @unused_list; $i++){ # Search for it in # the 2nd array... if ($unused_list[$i] eq $_){ # if found, add to results, # remove from unused list. push @results, splice @unused_list, $i, 1, (@unused_list[$i+1 .. $#unused_list]); last; } } } return @results; }
    YMMV.

    *Tested only with your example sets!

Re: Compare Lists (Non Unique Intersection)
by GrandFather (Saint) on Jul 28, 2006 at 06:42 UTC

    Another way to do it since davido implemented my first idea:

    use warnings; use strict; my @array1 = qw( test test1 test test2 test2 test ); my @array2 = qw( test1 test2 test2 ); my @result; @array1 = sort @array1; @array2 = sort @array2; while (@array1 && @array2) { if ($array1[0] lt $array2[0]) { shift @array1; } elsif ($array2[0] lt $array1[0]) { shift @array2; } else { push @result, shift @array1; shift @array2; } } print join "\n", @result;

    Prints:

    test1 test2 test2

    DWIM is Perl's answer to Gödel
Re: Compare Lists (Non Unique Intersection)
by Mandrake (Chaplain) on Jul 28, 2006 at 08:42 UTC
    For the request in the Update - non unique leftovers. Make changes to the function calls (refer to my previous post)
    my (@non_unique_temp,@non_unique_temp2) ; (!in_array($_,\@temp)) && (push @non_unique_temp2, $_) for (@temp2); (!in_array($_,\@temp2)) && (push @non_unique_temp, $_) for (@temp); sub in_array { ($_ eq $_[0]) && (return $_) for (@{$_[1]}) ; }

    And here is the unique leftovers.
    my (@unique_temp,@unique_temp2) ; (!in_array($_,[@temp, @unique_temp2])) && (push @unique_temp2, $_) for + (@temp2); (!in_array($_,[@temp2,@unique_temp])) && (push @unique_temp, $_) for ( +@temp);
Re: Compare Lists (Non Unique Intersection)
by Mandrake (Chaplain) on Jul 28, 2006 at 06:43 UTC
    Makes me think in this way. Not sure if I miss anything.
    my @non_unique; (in_array($_,\@temp)) && (push @non_unique, $_) for (@temp2); sub in_array { ($_ eq $_[0]) && (return $_) for (@{$_[1]}) ; }
    Thanks.
Re: Compare Lists (Non Unique Intersection)
by jwkrahn (Abbot) on Jul 28, 2006 at 10:53 UTC
    This is basicly the same as davido's but without the module:
    #!/usr/bin/perl use warnings; use strict; my @temp1 = qw( test test1 test test2 test2 test ); print "----------------------\nContents of Temp1 Array\n"; print map "\t$_\n", @temp1; my @temp2 = qw( test1 test2 test2 ); print "----------------------\nContents of Temp2 Array\n"; print map "\t$_\n", @temp2; my %temp1; $temp1{ $_ }++ for @temp1; my %temp2; $temp2{ $_ }++ for @temp2; print "----------------------\nThe (Unique) Intersection of Temp1 and +Temp2 Arrays\n"; for my $key ( grep exists( $temp2{ $_ } ), keys %temp1 ) { my $count = $temp2{ $key } < $temp1{ $key } ? $temp2{ $key } : $te +mp1{ $key }; print map "\t$_\n", ( $key ) x $count; }