kvale has asked for the wisdom of the Perl Monks concerning the following question:

My goal is to compare two AoAs for equality, where the order of the elements of the main array and sub arrays is unimportant. Sort of an equivalence class over permutations. For example, these AoAs are equal:
$ha = [[1,2], [5,5,5], [1,2], [3,4]]; $hb = [[4,3], [1,2], [2,1], [5,5,5]];
Order is unimportant, so the data are set-like, but because there are repeated elements, we need a hierarchical multiset/bag comparison.

Pulling out Set::Bag, I create

use Set::Bag; my $bag_a = Set::Bag->new(); my $bag_b = Set::Bag->new();
and then get stuck, because elements must be strings, not anon arrays. So I create a hack: sort and serialize the arrays at each level:
my @ser_a; for my $aref (@$ha) { push @ser_a, join ',', sort {$a <=> $b} @$aref; } my @ser_b; for my $bref (@$hb) { push @ser_b, join ',', sort {$a <=> $b} @$bref; } print "They are "; unless ( (join ',', sort @ser_a) eq (join ',', sort @ser_b)) { print "not "; } print "equal\n";

This works, but costs sorts, stringifications, and joins. It is slow. Is there a faster and/or more elegant way to do this?

-Mark

Replies are listed 'Best First'.
Re: Comparing unordered hierarchical arrays
by tachyon (Chancellor) on Apr 06, 2004 at 06:52 UTC

    There are some efficincies to be had in the general approach. It is unclear if you want [ [1], [2] ] to be equal to [ [1,2] ]. Regardless the earlier you can fail the better for speed:

    my $ha = [[1,2], [5,5,5], [1,2,], [3,4]]; my $hb = [[4,3], [1,2], [2,1,1], [5,5,5]]; # flatten my @ha = map{ @$_ } @$ha; my @hb = map{ @$_ } @$hb; # fail immediately unless equal num elements. if ( @ha != @hb ) { print "Not equal num items!\n"; } else { # now no choice but to sort or do detail compare if ( (join ' ', sort @ha) eq (join ' ', sort @hb) ) { print "Equ +al\n" } else { print "Not equal\n" } }

    It is possible to use a hash to avoid the sort, even with duplicate values simply by counting occurences as implemented by BrowserUk

    cheers

    tachyon

Re: Comparing unordered hierarchical arrays
by Zaxo (Archbishop) on Apr 06, 2004 at 06:30 UTC

    Your idea of sorting first and then comparing is exactly right. There may be a cleverer way, but I don't know what it might be.

    Your "serialization" hack is probably not satisfactory. You will need iterative sort and comparison for this to work for arbitrary nested arrays. Recursion is an attractive way of writing that for not-too-deep nesting.

    After Compline,
    Zaxo

      Like this:
      sub serialize { my $r = shift; ref $r ? '(' . join( ',', sort map serialize($_), @$r ) . ')' +: $r } sub deep_eq { my( $x, $y ) = @_; serialize($x) eq serialize($y) }

      jdporter
      The 6th Rule of Perl Club is -- There is no Rule #6.

Re: Comparing unordered hierarchical arrays
by BrowserUk (Patriarch) on Apr 06, 2004 at 06:35 UTC

    Update: I misread your post. Sorry.

    This might be a little quicker as it avoids the sorting

    #! perl -slw use strict; sub cmpAoAs{ my( $a, $b ) = @_; my %h; $h{ $_ }++ for map{ @$_ } @$a; $h{ $_ }-- for map{ @$_ } @$b; return 0 == grep{ $_ } values %h; } my $ha = [ [1,2], [5,5,5], [1,2], [3,4] ]; my $hb = [ [4,3], [1,2], [2,1], [5,5,5] ]; print 'Same' if cmpAoAs $ha, $hb; __END__ P:\test>342841 Same

    However, on the surface of what you've told us, it loooks a little like your using the wrong structure if only the presence and count of a value (rather than it's position) determine equivalence. But I realise that I am not party to the full sp on your application:)


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

      I don't think

      my $ha = [ [1,2], [5,5,5], [1,2], [3,4] ]; my $hb = [ [1,3], [5,5,5], [1,2], [2,4] ];
      are supposed to compare equal, as your function will return.

      After Compline,
      Zaxo

      Your solution seems flawed to me: consider
      my $ha = [ [1,2], [5,5,5], [1,2], [3,4] ]; my $hb = [ [4,3], [1,2], [2,1,5], [5,5] ];
      These are clearly different, but your code still outputs "same" since it is discarding the entire original structure (and not only the order of elements).
Re: Comparing unordered hierarchical arrays
by BUU (Prior) on Apr 06, 2004 at 08:42 UTC
    Heres my slightly different take on it. When I first saw this problem, I had an immediate desire to (ab)Data::Dumper's dumper sub to avoid having to do my own serialization/string conversion. In short I wanted to do Dumper($ha) eq Dumper($hb). Unfortunately to do so I had to write a brief routine that sorts the AoAs.

    First I iterate through the child arrays and sort each one, then I sort the parent array. After that it's a simple matter of doing Dumper($ha) eq Dumper($hb) which is what I wanted in the first place =].
    use strict; my $ha = [[1,2], [5,5,5], [1,2], [3,4]]; my $hb = [[4,3], [2,1], [1,2], [5,5,5]]; sub test { my ($ha, $hb) = @_; my ($tha,$thb) = ([],[]); for( @{ $ha } ) { push @{$tha},[ sort @{$_} ]; } $tha = [ sort { $a->[0] <=> $b->[0] } @{$tha} ]; for( @{ $hb } ) { push @{$thb},[ sort @{$_} ]; } $thb = [ sort { $a->[0] <=> $b->[0] } @{$thb} ]; use Data::Dumper; return Dumper($tha) eq Dumper($thb); } print test($ha,$hb);
    Note that it only handles AoA, and theres some ugly repeated code (the two for loops that sort the temporary array refs ), and basically no error checking. But it does seem to work =].
Re: Comparing unordered hierarchical arrays
by Enlil (Parson) on Apr 06, 2004 at 07:26 UTC
    Here is my crack at it:
    use strict; use warnings; #my $ha = [[1,2], [5,5,5], [1,2], [3,4]]; #my $hb = [[4,3], [1,2], [2,1], [5,5,5]]; my $ha = [ [1,2], [5,5,5], [1,2], [3,4] ]; my $hb = [ [1,3], [5,5,5], [1,2], [2,4] ]; if (cmpAoA($ha,$hb)) { print "THEY ARE THE SAME\n"; } else { print "THEY ARE THE DIFFERENT\n"; } sub cmpAoA { my @AoA = @_; return 0 if @{$AoA[0]} != @{$AoA[1]}; my %hash; foreach ( @{$AoA[0]} ) { $hash{ stringify($_) }++; } foreach ( @{$AoA[1]} ) { $hash{ stringify($_) }--; } return 0 == grep { $_ } values %hash; } sub stringify { return "[" . join(",", map{qq("$_")} sort { $a <=> $b } @$_ ). "]"; };

    -enlil

Re: Comparing unordered hierarchical arrays
by jweed (Chaplain) on Apr 06, 2004 at 08:46 UTC
    After much testing, I was unsuccessful, but you might want to check out Test::Deep as a framework for your comparison, with repeated commands to bag() in some hierarchical manner. The fact that it already includes tools for doing deep array comparisons could make it a strong tool in your favor.

    Update: Clarified last line.



    Code is (almost) always untested.
    http://www.justicepoetic.net/
      Better late than never! Test::Deep might be somewhat overkill but it certainly does the job
      use Test::Deep; my $ha = bag( bag(1,2), bag(5,5,5), bag(1,2), bag(3,4) ); my $hb = [ [4,3], [1,2], [2,1], [5,5,5] ]; print eq_deeply($hb, $ha) ? "same\n" : "different\n";
Re: Comparing unordered hierarchical arrays
by dragonchild (Archbishop) on Apr 06, 2004 at 12:08 UTC
    When I read this, my first thought was delegation. Have an object that will know whether its elements are the same as another object's elements. If an element is a scalar, compare it in a set-like fashion. If it's an object, delegate to that object. And, frankly, your "object" can simply be blessing the arrayref into a class with one method - the comparator. Something like:
    package Comparator; sub new { bless $_[1], $_[0] } sub comparator { # Some stuff here }

    Of course, the interesting bit is the # Some stuff here, and I'll look at implementing it later. Right now, I have interesting stuff to do and it's actually work-related! :-)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose