http://qs1969.pair.com?node_id=434758

Inspired by Looking for help with AI::Genetic and classroom scheduling and Efficient Assignment of Many People To Many Locations?, I had a go at writing a GP solution to the latter. To see how things work, I made it run as a simple GUI.

With command line of:

434412 -REPN=10 -LOCN=30 -S   ## program; number of reps; number of locations; -S => srand(1) for repeatable test runs

You get a display like this:

[ 12 85] 127.723 [ [ 42: 80][ 74: 10] ]
[ 31 82] 156.39 [ [ 27: 27][ 57: 36][ 45: 94][ 58: 69] ]
[ 34 60] 158.432 [ [ 48: 20][ 86: 20][ 83: 72] ]
[ 47 47] 162.078 [ [ 78: 80][  5: 60][ 53: 14][ 37:  9] ]
[ 36 30] 173.694 [ [ 67:  5][ 87: 72][ 77: 84] ]
[  2 98] 173.723 [ [ 73: 60][ 95: 92] ]
[  0 53] 178.987 [ [ 15: 22][ 99: 99][ 26: 29] ]
[  5  0] 183.089 [ [ 45: 35][ 74: 46][ 46: 23] ]
[  0 36] 183.654 [ [ 16: 66][ 59: 38][ 84:  2] ]
[  0 19] 188.368 [ [ 61: 39][  0: 91][ 51: 30] ]
000170 (000113) 1493.7 1686.14
#Cur'nt(Best ) Best   Current
#Iters (iters) score  score

It is fun to watch the evolution process. You can stop the iteration using keystroke 's', then single-step using the enter key. Pressing 's'<enter> will continue with a 1-second delay between iterations. Entering 'c' will return to full speed. 'q' exits, displaying the best solution found and optionally (if you don't ^C), the frequency table and "best solutions" history.

It records the frequencies of the scores it finds, and the iteration counts at which the best score was improved upon, to allow further analysis.

In an attempt to avoid the local-minima problem, random mutation is allowed to continue for a settable number of iterations ($EVO, default 1000) without finding any improvement before the best solution so far is restored, and evolution then continues from there.
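The restore-on-stagnation loop can be sketched like this (in Python rather than Perl, with illustrative names; the real program's loop also handles keystrokes and display):

```python
import copy

def evolve(sets, score, mutate, max_iters=100_000, evo_limit=1000):
    """Hill climbing with bounded backtracking: mutations may make the
    current solution worse for up to `evo_limit` iterations without an
    improvement; the best-known solution is then restored and the walk
    continues from there (an illustrative sketch, not the original Perl)."""
    best = (score(sets), copy.deepcopy(sets))
    stagnant = 0
    for _ in range(max_iters):
        mutate(sets)                          # mutate the current solution in place
        s = score(sets)
        if s < best[0]:
            best = (s, copy.deepcopy(sets))   # record the improvement
            stagnant = 0
        else:
            stagnant += 1
            if stagnant >= evo_limit:         # give up on this line of descent
                sets[:] = copy.deepcopy(best[1])  # restore best-so-far
                stagnant = 0
    return best
```

The deep copies matter: restoring the best must not alias the structure that subsequent mutations will keep modifying (the Perl achieves the same thing with Clone's clone).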

It uses two types of mutation.

  1. If the worst-scoring set of the current solution has more than one location in it, it rotates a location off the front of it and appends it to the end of the best-scoring set of the current solution.

    This is an attempt at intermingling the "genes" from the "best" and "worst".

  2. It picks two random set/location pairs and swaps them at each iteration.
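The two mutations above can be sketched as follows (in Python for brevity; the [rep, locations] pair layout mirrors the Perl's sets, kept sorted best-first by score, and every set is assumed to hold at least one location, as in the original):

```python
import random

def mutate(sets):
    """Apply the two mutations to `sets`, a best-first-sorted list of
    [rep, locations] pairs (an illustrative sketch, not the original Perl)."""
    # 1. Rotate one location off the front of the worst set onto the
    #    end of the best set, provided the worst set is not left empty.
    worst_locs = sets[-1][1]
    if len(worst_locs) > 1:
        sets[0][1].append(worst_locs.pop(0))
    # 2. Pick two random set/location pairs and swap them.
    a = random.randrange(len(sets))
    b = random.randrange(len(sets))
    ia = random.randrange(len(sets[a][1]))
    ib = random.randrange(len(sets[b][1]))
    sets[a][1][ia], sets[b][1][ib] = sets[b][1][ib], sets[a][1][ia]
```

Note that both operators conserve the total number of locations, so every mutated arrangement remains a valid assignment.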

I haven't decided upon a termination strategy, but hitting the $EVO limit, say, 3 times in a row might form one possibility. Or maybe a time limit.

But possibly the best would be to calculate (roughly) the total number of possible solutions, then iterate for some small percentage of that and allow statistics to take over?
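A rough count is easy if each location simply gets assigned to one rep (a counting sketch under that assumption; the function names are mine):

```python
from math import comb

def assignments(locn, repn):
    """Total ways to assign `locn` distinct locations among `repn`
    distinct representatives, each location going to exactly one rep."""
    return repn ** locn

def surjections(locn, repn):
    """Assignments in which every representative receives at least one
    location, counted by inclusion-exclusion."""
    return sum((-1) ** k * comb(repn, k) * (repn - k) ** locn
               for k in range(repn + 1))
```

For the REPN=10, LOCN=30 run above, that is 10^30 possible assignments, so even a few hundred thousand iterations samples only a vanishingly small fraction of the space.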

I saw ff's assertion that the best of 30 random solutions has a high likelihood of being a pretty good choice. My tests seem to bear that out, but with smallish sets (and the GUI disabled) it iterates so fast that doing a few hundred thousand tries is no problem, and it always results in an improvement.
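A toy simulation shows both halves of that observation. Assuming (as a stand-in) that random solutions score uniformly on [0, 1), the best of 30 draws sits near the 1/31 quantile on average, i.e. roughly the top 3% of solutions, which is indeed "pretty good"; but a few hundred thousand draws still improves on it markedly:

```python
import random

def best_of(n, draw):
    """Lowest score among n independent random solutions."""
    return min(draw() for _ in range(n))

random.seed(1)
# Average quality of 'best of 30', estimated over 1000 repetitions:
avg_best_of_30 = sum(best_of(30, random.random) for _ in range(1000)) / 1000
# One run of a few hundred thousand tries:
best_of_300k = best_of(300_000, random.random)
```

This is only an order-statistics argument about sampling, of course; it says nothing about how hard the underlying assignment problem is.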

#! perl -slw
use strict;
use Term::ReadKey;
use Clone qw[ clone ];
use List::Util qw[ min reduce sum ];
$a=$b;
$| = 1;

use constant { X => 0, Y => 1, };
use constant { REP => 0, LOCNS => 1, SCORE => 2 };

our $GRID ||= '100:100';    ## X:Y of grid
our @GRID = split ':', $GRID;
our $REPN ||= 10;           ## Number of representatives
our $LOCN ||= $REPN * 3;    ## Number of locations
our $EVO  ||= 1000;         ## Evolution backtrack count
our $S and srand( 1 );      ## allows comparison between runs.

print "Reps: $REPN Locations:$LOCN";
die "LOCN must be >= $REPN" unless $LOCN >= $REPN;

sub show {   ## Format sets for display
    system 'cls' if @_ > 1;
    for( @{ $_[ 0 ] } ) {
        printf "[%7s] %7g [ %s ]\n",
            "@{ $_->[REP] }", $_->[SCORE]||0.0,
            join '', map{ sprintf "[%3d:%3d]", @$_ } @{ $_->[LOCNS] };
    }
}

sub pythagoras {   ## calc distance between two points.
    my( $v1, $v2 ) = @_;
    my $dx = abs( $v1->[X] ) - abs( $v2->[X] );
    my $dy = abs( $v1->[Y] ) - abs( $v2->[Y] );
    return sqrt( $dx**2 + $dy**2 );
}

## Simple scoring of individual sets:
## sum of distances of locations from the rep's location.
sub score {
    my( $set ) = @_;
    return sum map {
        pythagoras( $set->[ 0 ], $_ );
    } @{ $set->[ 1 ] }
}

my @reps      = map { [ int rand rand $GRID[X], int rand $GRID[Y] ] } 1 .. $REPN;
my @locations = map { [ int rand $GRID[X],      int rand $GRID[Y] ] } 1 .. $LOCN;

## Give each rep one location, then distribute the rest at random.
my @sets = map { [ $_, [ pop @locations ] ] } @reps;
push @{ $sets[ rand @sets ][1] }, pop @locations while @locations;

show( \@sets, 1 ); <STDIN>;

my( $tries, $c ) = ( 0, 's' );
my $best = [ 9e99, [], 9 ];
my $evolution = 0;
my( %scores, %best );
my( $delay, $display, $stop ) = ( -1, 1, 0 );

ReadMode 2;
while( 1 ) {
    ## Calculate the total score for the current sets:
    ## sum of the individual set totals.
    my $totalScore = sum map { $_->[SCORE] = score $_ } @sets;

    ## Record frequencies of (integerised) solutions found
    $scores{ int $totalScore }++;

    ## Keep track of iterations
    $tries++;

    ## Commands to monitor progress and quit.
    $c = ReadKey( $delay )||'';
    $stop = 1 if $c eq 'q';             ## Quit
    $delay += 1 if $c eq 's';           ## speed (0=pause) (n>0 sleep n)
    $delay = -1 if $c eq 'c';           ## Continue fullspeed
    $display = !$display if $c eq 'd';  ## Toggle display

    if( $best->[ 0 ] > $totalScore ) {  ## If we found a better solution
        $best = [ $totalScore, clone( \@sets ), $tries ];  ## save it
        $evolution = $EVO;  ## but allow bad solutions to evolve for a while

        ## Keep a record of when we found improvements
        ## to allow estimates of "good enough" iterations;
        $best{ $tries } = $totalScore;
    }
    elsif( $stop or not --$evolution ) {
        ## if no better evolution after $EVO attempts
        @sets = @{ $best->[1] };  ## Restore the best yet and try again
        print 'Best restored'; Win32::Sleep 1000;
    }

    last if $stop;  ## stop here after ensuring the best is restored.

    ## Sort them by individual scores
    @sets = sort{ $a->[SCORE] <=> $b->[SCORE] } @sets;

    ## Display them
    show \@sets, 1 if $display;
    printf "%06d (%06d) %g %g\n",
        $tries, $best->[ 2 ], $best->[ 0 ], $totalScore;

    ## If the worst set has more than 1 location, give one to the best
    push @{ $sets[ 0 ][LOCNS] }, shift @{ $sets[ -1 ][LOCNS] }
        if @{ $sets[ -1 ][LOCNS] } > 1;

    ## pick two set/location pairs at random
    my( $a, $b ) = map{ int rand @sets } 1 .. 2;
    my( $sa, $sb ) = (
        int rand $#{ $sets[ $a ][LOCNS] },
        int rand $#{ $sets[ $b ][LOCNS] },
    );

    ## and swap them
    my $temp = $sets[ $a ][LOCNS][ $sa ]||die "A:$a:$sa";
    $sets[ $a ][LOCNS][ $sa ] = $sets[ $b ][LOCNS][ $sb ]||die "B:$b:$sb";
    $sets[ $b ][LOCNS][ $sb ] = $temp;
}

show \@sets;  ## display best solution
printf "Total: %d after %d tries\n", $best->[ 0 ], $best->[ 2 ];

printf 'Enter to see a frequency plot of the solutions found'; <STDIN>;
print "$_ => $scores{ $_ }" for sort{ $a <=> $b } keys %scores;

printf 'Enter to see record of best discovery points.'; <STDIN>;
print "Best score: $best{ $_ } after $_ iterations." for sort{ $a <=> $b } keys %best;

Sample output

[ 0 19] 41.3038 [ [ 5: 60] ]
[ 0 36] 45.0111 [ [ 45: 35] ]
[ 47 47] 47.1699 [ [ 87: 72] ]
[ 34 60] 49.7695 [ [ 53: 14] ]
[ 0 53] 54.9181 [ [ 46: 23] ]
[ 5 0] 66.9104 [ [ 16: 66] ]
[ 12 85] 67.424 [ [ 51: 30] ]
[ 2 98] 78.1025 [ [ 78: 80] ]
[ 31 82] 82.8794 [ [ 86: 20] ]
[ 36 30] 85.5862 [ [ 95: 92] ]
037234 (021726) 435.401 619.075
Best restored
[ 0 53] 54.9181 [ [ 46: 23] ]
[ 36 30] 15 [ [ 51: 30] ]
[ 12 85] 83.2947 [ [ 95: 92] ]
[ 0 36] 45.0111 [ [ 45: 35] ]
[ 47 47] 47.1699 [ [ 87: 72] ]
[ 0 19] 41.3038 [ [ 5: 60] ]
[ 34 60] 49.7695 [ [ 53: 14] ]
[ 31 82] 82.8794 [ [ 86: 20] ]
[ 2 98] 78.1025 [ [ 78: 80] ]
[ 5 0] 66.9104 [ [ 16: 66] ]
Total: 435 after 21726 tries
Enter to see a frequency plot of the solutions found
435 => 3 441 => 3 442 => 2 443 => 1 447 => 1 448 => 1 450 => 3 453 => 4 454 => 3 455 => 1 456 => 1 457 => 3 459 => 3 461 => 2 462 => 1 463 => 3 464 => 4 465 => 3 466 => 4 467 => 9 468 => 6 469 => 8 470 => 6 471 => 6 472 => 7 473 => 12 474 => 4 475 => 8 476 => 12 477 => 10 478 => 10 479 => 15 480 => 13 481 => 12 482 => 13 483 => 9 484 => 20 485 => 8 486 => 17 487 => 22 488 => 21 489 => 22 490 => 21 491 => 29 492 => 30 493 => 23 494 => 27 495 => 40 496 => 34 497 => 39 498 => 34 499 => 42 500 => 31 501 => 43 502 => 64 503 => 49 504 => 53 505 => 44 506 => 55 507 => 72 508 => 63 509 => 72 510 => 58 511 => 59 512 => 60 513 => 84 514 => 63 515 => 59 516 => 78 517 => 83 518 => 77 519 => 105 520 => 94 521 => 83 522 => 91 523 => 110 524 => 107 525 => 101 526 => 134 527 => 125 528 => 119 529 => 110 530 => 134 531 => 122 532 => 113 533 => 125 534 => 152 535 => 157 536 => 170 537 => 153 538 => 173 539 => 153 540 => 194 541 => 181 542 => 149 543 => 179 544 => 188 545 => 172 546 => 192 547 => 210 548 => 207 549 => 213 550 => 204 551 => 212 552 => 235 553 => 254 554 => 248 555 => 185 556 => 268 557 => 242 558 => 256 559 => 247 560 => 258 561 => 258 562 => 241 563 => 290 564 => 281 565 => 288 566 => 318 567 => 306 568 => 268 569 => 306 570 => 275 571 => 281 572 => 268 
573 => 307 574 => 307 575 => 299 576 => 254 577 => 293 578 => 324 579 => 284 580 => 299 581 => 304 582 => 303 583 => 322 584 => 333 585 => 342 586 => 323 587 => 329 588 => 333 589 => 312 590 => 331 591 => 274 592 => 314 593 => 349 594 => 318 595 => 303 596 => 290 597 => 318 598 => 323 599 => 288 600 => 272 601 => 329 602 => 331 603 => 335 604 => 304 605 => 312 606 => 328 607 => 311 608 => 315 609 => 290 610 => 280 611 => 320 612 => 272 613 => 267 614 => 298 615 => 307 616 => 260 617 => 280 618 => 283 619 => 346 620 => 299 621 => 253 622 => 262 623 => 264 624 => 251 625 => 230 626 => 226 627 => 254 628 => 239 629 => 260 630 => 231 631 => 245 632 => 226 633 => 217 634 => 244 635 => 206 636 => 221 637 => 226 638 => 180 639 => 176 640 => 206 641 => 175 642 => 180 643 => 161 644 => 166 645 => 203 646 => 167 647 => 179 648 => 163 649 => 155 650 => 125 651 => 146 652 => 128 653 => 130 654 => 121 655 => 137 656 => 127 657 => 121 658 => 122 659 => 120 660 => 100 661 => 115 662 => 93 663 => 119 664 => 102 665 => 109 666 => 107 667 => 90 668 => 87 669 => 88 670 => 90 671 => 63 672 => 69 673 => 74 674 => 69 675 => 45 676 => 57 677 => 61 678 => 53 679 => 49 680 => 62 681 => 54 682 => 53 683 => 41 684 => 50 685 => 49 686 => 50 687 => 41 688 => 31 689 => 44 690 => 30 691 => 37 692 => 36 693 => 36 694 => 42 695 => 39 696 => 29 697 => 19 698 => 26 699 => 28 700 => 28 701 => 30 702 => 24 703 => 22 704 => 22 705 => 16 706 => 20 707 => 15 708 => 13 709 => 13 710 => 13 711 => 14 712 => 9 713 => 8 714 => 3 715 => 8 716 => 4 717 => 1 718 => 2 719 => 9 720 => 3 721 => 5 722 => 3 723 => 3 724 => 9 725 => 3 727 => 4 728 => 1 729 => 2 730 => 3 732 => 4 733 => 4 734 => 2 735 => 1 736 => 1 739 => 2 740 => 2 741 => 1 742 => 3 743 => 1 744 => 1 746 => 1
Enter to see record of best discovery points.
Best score: 587.754090215359 after 1 iterations.
Best score: 585.739406140003 after 2 iterations.
Best score: 574.349845629305 after 7 iterations.
Best score: 539.378581196848 after 9 iterations.
Best score: 515.579334272599 after 25 iterations.
Best score: 503.180579975418 after 26 iterations.
Best score: 482.804864664163 after 98 iterations.
Best score: 481.914466880709 after 99 iterations.
Best score: 469.204365706502 after 623 iterations.
Best score: 468.414887089937 after 1607 iterations.
Best score: 454.855761343826 after 1846 iterations.
Best score: 442.160363943557 after 2281 iterations.
Best score: 442.160363943557 after 2282 iterations.
Best score: 441.218607935927 after 2283 iterations.
Best score: 435.401484875951 after 21726 iterations.

Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Replies are listed 'Best First'.
Re: Evolution in Action.
by kvale (Monsignor) on Feb 26, 2005 at 16:27 UTC
    Cool program!

    The evolution of the score you see is pretty typical of genetic algorithms and genetic programs: much progress is made in the initial iterations, but it takes a lot more to gain further improvement.

    I just wanted to mention that the assertion by ff's professor that you only need 30 tries is patently false. If a problem has many degrees of freedom and a complex error surface, in general the amount of searching one must do is O(k^N), where N is the number of degrees of freedom and k is related to the complexity of the error surface.

    To see this, consider a simple example. Suppose we have one degree of freedom and we know that a good solution lies either at x = -1 or x = 1. Then we only have to search twice. Now add a similar degree of freedom y that interacts with x in a complex way. Then we would need to search (+-1, +-1), or 4 attempts. Adding another would need 8 attempts, and so on, yielding O(2^N) behavior. And this is just looking for a good quadrant! This blowup of search space with degrees of freedom is called the 'curse of dimensionality' and, along with a complex error surface, is the basic reason that some problems are NP-complete.
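    The quadrant argument above can be made concrete with a tiny exhaustive search (a Python sketch; the objective function is a placeholder):

```python
from itertools import product

def corner_search(objective, n):
    """Exhaustively evaluate every corner (+-1, ..., +-1) of an
    n-dimensional cube; the number of evaluations doubles with each
    added degree of freedom, i.e. grows as 2^n."""
    evals = 0
    best, best_val = None, float('inf')
    for corner in product((-1, 1), repeat=n):
        evals += 1
        v = objective(corner)
        if v < best_val:
            best, best_val = corner, v
    return best, evals
```

    Even this coarsest possible search already costs 2^N evaluations, which is the point: 30 random tries cannot cover a space that doubles with every degree of freedom.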

    Some problems have nice large-scale structure in the error surface that admits heuristic guesses that often do well, but others, like k-satisfiability for k > 2, don't seem to. So don't throw away your simulated annealing and genetic algorithms code yet!

    -Mark