Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Efficient Unique Nested Combinations

by FFRANK (Beadle)
on Jun 25, 2007 at 19:16 UTC ( [id://623229]=perlquestion: print w/replies, xml ) Need Help??

FFRANK has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
For getting only unique combinations of elements out of a AoA:
#!/usr/bin/perl -w use strict; use Algorithm::Loops qw(NestedLoops); my @symbol = ( [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ], [ 'c', 'd', 'e' ], ); my @all; my $iter = NestedLoops(\@symbol); my @list; while (@list = $iter->()) { my @sortedList = sort @list; my $listString = join ('',@sortedList); push @all, $listString; } my @uniq = keys %{{ map { $_ => 1 } @all }}; my @sortedUniq = sort (@uniq); foreach my $sortedUniq (@sortedUniq) { print $sortedUniq,"\n"; }
This is not very realistic for large AoA's, it is not efficient, and there is certainly a linear solution. A solution could look like (?) :
for my $symbolRow (0..$#symbol) { for my $symbolCol (0..$#{$matrix[$symbolRow]}) { # If, for elements 1..$#{$matrix[$symbolRow]} #the element (symbol) already exists in a preceeding line #remove from current line. # Then proceed with nested loops.
Could it be a good option to add to nestedLoops ?
Thanks for comments & hints, very best regards.

Replies are listed 'Best First'.
Re: Efficient Unique Nested Combinations
by Roy Johnson (Monsignor) on Jun 25, 2007 at 20:30 UTC
    You might be looking for a technique I discussed in NestedLoops and the Odometer Model:
    Now imagine that each time one of the dials changes, all the dials to the right of it can be remade. For example, each dial could exclude any numbers that exist on any dial to its left.
    That is, instead of looping over the static AoA, you can replace the arrayrefs with codrefs that filter out any elements that have already been used by outer loops.

    New update: In Re: Efficient Unique Nested Combinations, FFRANK points out that my "I think you want" is not the correct algorithm. A corrected solution is posted in reply thereto.
    Update: I think you just want to ensure that each element is greater-than-or-equal-to the previous one, so:

    #!perl use warnings; use strict; use Algorithm::Loops 'NestedLoops'; my @symbol = ( [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ], [ 'c', 'd', 'e' ], ); my $combos = NestedLoops([ $symbol[0], map { my $_hold = $_; sub { [grep {$_ ge $_[$#_]} @{$symbol[$_hold]}] } } 1..$#symbol ]); my @result; print "@result\n" while @result = $combos->();

    Caution: Contents may have been coded under pressure.
Re: Efficient Unique Nested Combinations
by clinton (Priest) on Jun 25, 2007 at 19:49 UTC
    I don't know anything about Algorithm::Loops, but given your code above, there is no need for the intermediate array @list. Instead, you can work out uniqueness as you go by using the keys of a hash:

    #!/usr/bin/perl -w use strict; use Algorithm::Loops qw(NestedLoops); my @symbol = ( [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ], [ 'c', 'd', 'e' ], ); my $iter = NestedLoops(\@symbol); my %seen; while (my @list = $iter->()) { my $listString = join ('',sort @list); $seen{$listString}++ } my @sortedUniq = sort (keys %seen); foreach my $sortedUniq (@sortedUniq) { print $sortedUniq,"\n"; }

    Clint

    UPDATE - Benchmarks added

    I compared the array method to the hash method, both for data that was all the same, and for data that was completely different, the results were:

    Rate array_diff array_same hash_diff hash_same array_diff 916/s -- -10% -16% -25% array_same 1012/s 11% -- -7% -18% hash_diff 1092/s 19% 8% -- -11% hash_same 1229/s 34% 21% 13% --

    Code for benchmarks:

Re: Efficient Unique Nested Combinations
by FFRANK (Beadle) on Jun 25, 2007 at 22:52 UTC
    Thank you all very much for helpful comments;
    I hope I was explicit enough. I meant to say that 'aab' is the same as 'baa', so to output 'aab' or 'baa' or 'aba', but not all, and to do this the most efficient way. Hereafter, "symbols" are replaced by short strings, and the problem remains the same. I believe so far the suggestion from Roy Johnson is the best:
Re: Efficient Unique Nested Combinations
by GrandFather (Saint) on Jun 25, 2007 at 21:50 UTC
    use strict; use warnings; use List::Compare; my @symbol = ( [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ], [ 'c', 'd', 'e' ], ); my $lCmp = List::Compare->new(@symbol); my @unique = $lCmp->get_symmetric_difference (); my @union =$lCmp->get_union (); print "Unique: @unique\n"; print "Union: @union\n";

    Prints:

    Unique: d e Union: a b c d e

    Update: completely missed the point!


    DWIM is Perl's answer to Gödel
Re: Efficient Unique Nested Combinations
by Moron (Curate) on Jun 26, 2007 at 11:15 UTC
    Algorithm::NestedLoops is a more general effort, whereas Math::Combinatorics is focused specifically on generating combinations and permutations - I would expect it to derive some performance from that specialisation, especially in the case of combinations which are a bit more difficult to iterate with their need to treat e.g. aab and aba as duplicate.
    __________________________________________________________________________________

    ^M Free your mind!

      True, except that M::C doesn't support the operation being described: combinations generated by taking one member from each of N (possibly) distinct sets.

      Caution: Contents may have been coded under pressure.
        Au contraire, M::C supports nCk and the described operation can be expressed as a two-deep explicit nested loop of nCk next-combination calls, the lower layer being nC1.

        Update: because selecting a single member from a set is the semi-degenerate nC1 case of an nCr.

        __________________________________________________________________________________

        ^M Free your mind!

Re: Efficient Unique Nested Combinations
by FFRANK (Beadle) on Jun 26, 2007 at 14:43 UTC
    When the "symbols" are not ordered in the initial AoA, it skips some results.
    my @symbols=( ['c','b','e'], ['a','f','d'], ); hash(\@symbols); ab ac ae bd bf cd cf de ef combo(\@symbols); c f c d b f b d e f
    The "greater than" solution thus assumes an order that is not neccessarily there in the initial AoA.
    Cheers
      I resorted to using a hash to catch what had been seen. The model still offers some short-circuiting, and is moderately faster than "hash" in my benchmarking (the more duplicates there are to avoid, the better "combo" does). Note also that you didn't change my code to use the passed-in array ref. I have done so here.
      sub combo { my $symbols = shift; my %seen; my $combos = NestedLoops([ $symbols->[0], map { my $_hold = $_; sub { [grep {!$seen{join(':', sort @_, $_)}++} @{$symbols->[$_hold +]}] } } 1..$#$symbols ]); while (my @result = $combos->()) { print "@result\n"; } print "\n"; }

      Caution: Contents may have been coded under pressure.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://623229]
Approved by shigetsu
Front-paged by shigetsu
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (7)
As of 2024-04-18 17:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found