in reply to Re: find combinations
in thread find combinations

Unfortunately the benchmark is misleading. You are timing how long it takes tye's solution to return an iterator that can give you the full set, not the amount of time it takes that iterator to return the full set like you are for your solution and my solution. Anyway, heres a modified benchmark that compares apples to apples. (Although I admit I tweaked my code slightly to not pass so much on the stack.)

Also, I normalized and/or simplified some of the other code involved, and changed the way the %mechs has gets passed into your code. Bottom subbing has the disadvantage that you get accidental globals, which can cause trouble when you convert things to a module. Its good practice to top-sub for this reason.

use strict; use warnings; use Math::Combinatorics; use Algorithm::Loops qw( NestedLoops ); use Data::Dumper; use List::Util qw(max); use Benchmark qw(:all) ; sub find_lances_holli { my $mechs = shift; my $players = shift; my $weight = shift; my $treshold = shift; my @all_lances = Math::Combinatorics::combine ($players, @_); my @valid_lances; for my $lance ( @all_lances ) { my $sum; $sum += $mechs->{$_}{weight} for @{$lance}; push @valid_lances, { names=>$lance, weight=>$sum } if ($weight - $sum <= $treshold) && ($weight - $sum) >= 0 +; } return @valid_lances; } sub find_lances_tye { my( $hvMechs, $members, $maxWeight, $treshold )= @_; my @mechs= keys %$hvMechs; my $iter= NestedLoops ( [ [0..$#mechs], (sub{[$_+1..$#mechs]})x($members-1) ], { OnlyWhen => sub { return if @_ != $members; my $weight= 0; $weight += $_->{weight} for @$hvMechs{@mechs[@_]}; #return $weight == $maxWeight; return 1 if ($maxWeight - $weight <= $treshold) && ($maxWei +ght - $weight) >= 0 ; }, } ); my $getit= sub { my @idx= $iter->(); my $w = 0; return unless @idx; $w+=$hvMechs->{$_}->{weight} for @mechs[ @idx ]; return { weight=>$w, names => [@mechs[ @idx ]] } if @idx; }; my (@ret,$item); push @ret,$item while $item=$getit->(); return @ret; } # make_choose_iter( $choose, @list ) # ------------------------------------------------------ # Returns an anonymous subroutine to act as an iterator # over the set of combinations, returning each in turn # every time it is called. If the iter is called with an # argument or the list has been exhausted on a previous run # it resets itself and returns undef. # sub make_choose_iter { my ( $choose, $list )= @_; die "Can't do ".(0+@$list)." choose ".($choose||0)."\n" if $choose>@$list or !@$list or !$choose; $choose--; # 1 based number passed in, but need it 0 based # done is used as a flag, and @idx is the first n positions my ( $done, @idx )= ( 0, 0 .. $choose ); return sub { if ( $done || @_ ) { # if we finished the list last call, or they are # forcing a reset via an argument # then we reset everthing and return undef/empty list ($done , @idx)= ( 0, 0 .. $choose ); return; } # we get the apropriate elements of the list for the # current state of @idx. this is what we will return my @ret= @{$list}[@idx]; # assume that this is the last element. $done= 1; # Loop over the elements of @idx from the right # looking for the correct element: # We are simulating the behaviour of N # nested loops. This means we increment the # rightmost element whose increment does not # overflow and whose final value is sufficiently # low that the elements following it can be filled # with its successors without also overflowing for ( my $i= $choose ; $i >= 0 ; $i-- ) { if ( #$idx[$i] < $#$list #&& (my $end= $idx[$i] + 1 + $choose - $i) < @$list ) { $idx[$i]++; @idx[ $i + 1 .. $choose ]= ( $idx[$i] + 1 .. $end ); $done= 0; last; } } return wantarray ? @ret : \@ret; }; } sub find_lances_demerphq { my ($mechs,$count,$weight,$treshold)=@_; #print Dumper ($mechs);<STDIN>; my $iter= make_choose_iter( $count, [values %$mechs] ); my @ret; while (my $items=$iter->()) { my $sum=0; #print Dumper ($items);<STDIN>; $sum+=$_->{weight} for @$items; if (($weight - $sum <= $treshold) && ($weight - $sum) >= 0) { my $combo = { weight => $sum, names => [ map { $_->{name} +} @{$items} ] }; push @ret, $combo } } return @ret; } my %mechs; my @holli; my @tye; my @demerphq; my $iter; while (<DATA>) { chomp; my ($mech, $weight, $faction) = split /\s+/; $mechs{$mech} = { name => $mech, weight => $weight, faction => $fa +ction }; } my $candidates={ 'tye' => sub { @tye = find_lances_tye ( \%mechs, 5, 240, +10 ) }, 'holli' => sub { @holli = find_lances_holli ( \%mechs, 5, 240, + 10, keys %mechs ) }, 'demq' => sub { @demerphq = find_lances_demerphq ( \%mechs, 5 +, 240, 10 ) }, }; timethese(-10, $candidates); cmpthese(-10, $candidates); for my $array (\@holli,\@tye,\@demerphq) { @$array= sort map { join (",", sort @{$_->{names}}) . "(" . $_->{weight} . + " tons)\n" } @$array; } my $ok = join ("", @holli) eq join ("", @tye) && join ("", @demerphq) eq join ("", @tye) ? "" : " not"; print "output$ok identical\n"; __DATA__ Flea 20 clan Commando 25 clan UrbanMech 30 clan Hollander 35 clan Jenner 35 clan Raven 35 clan Wolfhound 35 clan BlackHawk 50 clan Hunchback 50 clan Rifleman 60 clan Catapult 65 clan Loki 65 clan Thor 70 clan MadCat 75 clan Gargoyle 80 clan Victor 80 clan Zeus 80 clan Longbow 85 clan Warhawk 85 clan Mauler 90 clan Atlas 100 clan

And it shows my code beating tyes by just a tiny bit. Repeated attempts show the difference is pretty consistant so I dont think this is noise.

Benchmark: running demq, holli, tye for at least 10 CPU seconds... demq: 17 wallclock secs (10.31 usr + 0.02 sys = 10.33 CPU) @ 2 +.71/s (n=28) holli: 19 wallclock secs (11.73 usr + 0.00 sys = 11.73 CPU) @ 0 +.34/s (n=4) tye: 15 wallclock secs (10.14 usr + 0.03 sys = 10.17 CPU) @ 2 +.06/s (n=21) Rate holli tye demq holli 0.335/s -- -85% -87% tye 2.18/s 550% -- -18% demq 2.66/s 695% 22% -- output identical

Anyway, let this be a lesson to you. When the disparity of the benchamrks is so high look for a mistake in the code being benchmarked, especially when you get warnings about insufficient iterations. (And its generally better to use time for the benchmark and not iterations.)

---
$world=~s/war/peace/g

Replies are listed 'Best First'.
Re^3: find combinations
by holli (Abbot) on Sep 07, 2005 at 12:25 UTC
    I stand corrected. I changed my benchmark code to
    timethese (-10, { 'tye' => sub { $iter = find_lances_tye ( \%mechs, 5, 240, 10 ); while ( my $lance = $iter->() ) { push @tye, join (",", sort @{$lance->{names}}) . "(" . + $lance->{weight} . " tons)\n" ; } @tye = sort @tye; }, 'holli' => sub { @holli = find_lances_holli (5, 240, 10, keys %mechs ); @holli = map { join (",", sort @{$_->{names}}) . "(" . $_- +>{weight} . " tons)\n" } @holli; @holli = sort @holli; }, 'demerphq' => sub { @demerphq = find_lances_demerphq ( \%mechs, 5, 240, 10 ); @demerphq = map { join (",", sort @{$_->{names}}) . "(" . +$_->{weight} . " tons)\n" } @demerphq; @demerphq = sort @demerphq; }, } );
    so it constructs the same data-structure in all cases and then I get
    C:\>benchmark.pl Benchmark: running demerphq, holli, tye for at least 10 CPU seconds... demerphq: 10 wallclock secs (10.41 usr + 0.02 sys = 10.42 CPU) @ 1 +.44/s (n=15) holli: 12 wallclock secs (12.23 usr + 0.00 sys = 12.23 CPU) @ 0 +.33/s (n=4) tye: 10 wallclock secs (10.33 usr + 0.02 sys = 10.34 CPU) @ 1 +.84/s (n=19)
    Pretty similar to your outcome. Thanks for the clarification.


    holli, /regexed monk/

      Unfortunately your benchmark is still not what I would consider to be "fair". tyes solution _still_ returns an iterator which you then unpack in the benchmark sub. Wheras your solution and my solution have to return the list on the stack twice. This has an observable cost, especially for large lists like the ones we are dealing with. IMO the three variants of find_lances_XXX() should have identical API's, i.e. they should take the same arguments, and return the same type of list. Anythign else means that you are benchmarking different behaviour and unconsciously weighting some of the solutions different from the others.

      Another problem with your reformulation is that you have put the list normalization routine inside the benchmark. IMO this is undesirable as it adds the cost of normalizing the list to the runtime of the solutions, when the normalization code is not actually needed for the routine to meet the specification given. Also, sorting is very sensitive to intitial conditions, so this may also subtly favour one of the solutions.

      Also if you are going to benchmark the code, please use my updated code as the original posting contains some unnecessary stuff, and uses the stack for simplicity when for speed you would normally use pass by reference.

      ---
      $world=~s/war/peace/g