http://qs1969.pair.com?node_id=434758

Inspired by Looking for help with AI::Genetic and classroom scheduling and Efficient Assignment of Many People To Many Locations?, I had a go at writing a GP solution to the latter. To see how things work, I made it run as a simple GUI.

With command line of:

434412 -REPN=10 -LOCN=30 -S   ## program; number of reps; number of locations; -S => srand(1) for repeatable test runs

You get a display like this:

[ 12 85] 127.723 [ [ 42: 80][ 74: 10] ]
[ 31 82] 156.39 [ [ 27: 27][ 57: 36][ 45: 94][ 58: 69] ]
[ 34 60] 158.432 [ [ 48: 20][ 86: 20][ 83: 72] ]
[ 47 47] 162.078 [ [ 78: 80][  5: 60][ 53: 14][ 37:  9] ]
[ 36 30] 173.694 [ [ 67:  5][ 87: 72][ 77: 84] ]
[  2 98] 173.723 [ [ 73: 60][ 95: 92] ]
[  0 53] 178.987 [ [ 15: 22][ 99: 99][ 26: 29] ]
[  5  0] 183.089 [ [ 45: 35][ 74: 46][ 46: 23] ]
[  0 36] 183.654 [ [ 16: 66][ 59: 38][ 84:  2] ]
[  0 19] 188.368 [ [ 61: 39][  0: 91][ 51: 30] ]
000170 (000113) 1493.7 1686.14
#Cur'nt(Best ) Best   Current
#Iters (iters) score  score

It is fun to watch the evolution process. You can stop the iteration using keystroke 's', then single-step using the enter key. Pressing 's'<enter> will continue with a 1-second delay between iterations. Entering 'c' will return to full speed. 'q' exits, displaying the best solution found and optionally (if you don't ^C), the frequency table and "best solutions" history.

It records the frequencies of the scores it finds, and the iteration counts at which the best score was improved upon, to allow further analysis.

In an attempt to avoid the local-minima problem, random mutation is allowed to continue for a settable number of iterations ($EVO, default 1000) without finding any improvement before the best solution so far is restored, and evolution then continues from there.
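The restore-on-stagnation loop can be sketched like this (in Python rather than Perl, with illustrative names; the real program's loop also handles keystrokes and display):

```python
import copy

def evolve(sets, score, mutate, max_iters=100_000, evo_limit=1000):
    """Hill climbing with bounded backtracking: mutations may make the
    current solution worse for up to `evo_limit` iterations without an
    improvement; the best-known solution is then restored and the walk
    continues from there (an illustrative sketch, not the original Perl)."""
    best = (score(sets), copy.deepcopy(sets))
    stagnant = 0
    for _ in range(max_iters):
        mutate(sets)                          # mutate the current solution in place
        s = score(sets)
        if s < best[0]:
            best = (s, copy.deepcopy(sets))   # record the improvement
            stagnant = 0
        else:
            stagnant += 1
            if stagnant >= evo_limit:         # give up on this line of descent
                sets[:] = copy.deepcopy(best[1])  # restore best-so-far
                stagnant = 0
    return best
```

The deep copies matter: restoring the best must not alias the structure that subsequent mutations will keep modifying (the Perl achieves the same thing with Clone's clone).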

It uses two types of mutation.

  1. If the worst-scoring set of the current solution has more than one location in it, it rotates a location off the front of it and appends it to the end of the best-scoring set of the current solution.

    This is an attempt at intermingling the "genes" from the "best" and "worst".

  2. It picks two random set/location pairs and swaps them at each iteration.
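The two mutations above can be sketched as follows (in Python for brevity; the [rep, locations] pair layout mirrors the Perl's sets, kept sorted best-first by score, and every set is assumed to hold at least one location, as in the original):

```python
import random

def mutate(sets):
    """Apply the two mutations to `sets`, a best-first-sorted list of
    [rep, locations] pairs (an illustrative sketch, not the original Perl)."""
    # 1. Rotate one location off the front of the worst set onto the
    #    end of the best set, provided the worst set is not left empty.
    worst_locs = sets[-1][1]
    if len(worst_locs) > 1:
        sets[0][1].append(worst_locs.pop(0))
    # 2. Pick two random set/location pairs and swap them.
    a = random.randrange(len(sets))
    b = random.randrange(len(sets))
    ia = random.randrange(len(sets[a][1]))
    ib = random.randrange(len(sets[b][1]))
    sets[a][1][ia], sets[b][1][ib] = sets[b][1][ib], sets[a][1][ia]
```

Note that both operators conserve the total number of locations, so every mutated arrangement remains a valid assignment.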

I haven't decided upon a termination strategy, but hitting the $EVO limit, say, 3 times in a row might form one possibility. Or maybe a time limit.

But possibly the best would be to calculate (roughly) the total number of possible solutions, then iterate for some small percentage of that and allow statistics to take over?
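A rough count is easy if each location simply gets assigned to one rep (a counting sketch under that assumption; the function names are mine):

```python
from math import comb

def assignments(locn, repn):
    """Total ways to assign `locn` distinct locations among `repn`
    distinct representatives, each location going to exactly one rep."""
    return repn ** locn

def surjections(locn, repn):
    """Assignments in which every representative receives at least one
    location, counted by inclusion-exclusion."""
    return sum((-1) ** k * comb(repn, k) * (repn - k) ** locn
               for k in range(repn + 1))
```

For the REPN=10, LOCN=30 run above, that is 10^30 possible assignments, so even a few hundred thousand iterations samples only a vanishingly small fraction of the space.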

I saw ff's assertion that the best of 30 random solutions has a high likelihood of being a pretty good choice. My tests seem to bear that out, but with smallish sets (and the GUI disabled) it iterates so fast that doing a few hundred thousand tries is no problem, and it always results in an improvement.
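A toy simulation shows both halves of that observation. Assuming (as a stand-in) that random solutions score uniformly on [0, 1), the best of 30 draws sits near the 1/31 quantile on average, i.e. roughly the top 3% of solutions, which is indeed "pretty good"; but a few hundred thousand draws still improves on it markedly:

```python
import random

def best_of(n, draw):
    """Lowest score among n independent random solutions."""
    return min(draw() for _ in range(n))

random.seed(1)
# Average quality of 'best of 30', estimated over 1000 repetitions:
avg_best_of_30 = sum(best_of(30, random.random) for _ in range(1000)) / 1000
# One run of a few hundred thousand tries:
best_of_300k = best_of(300_000, random.random)
```

This is only an order-statistics argument about sampling, of course; it says nothing about how hard the underlying assignment problem is.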

#! perl -slw
use strict;
use Term::ReadKey;
use Clone qw[ clone ];
use List::Util qw[ min reduce sum ];
$a=$b;
$| = 1;

use constant { X => 0, Y => 1, };
use constant { REP => 0, LOCNS => 1, SCORE => 2 };

our $GRID ||= '100:100';    ## X:Y of grid
our @GRID = split ':', $GRID;
our $REPN ||= 10;           ## Number of representatives
our $LOCN ||= $REPN * 3;    ## Number of locations
our $EVO  ||= 1000;         ## Evolution backtrack count
our $S and srand( 1 );      ## allows comparison between runs.

print "Reps: $REPN Locations:$LOCN";
die "LOCN must be >= $REPN" unless $LOCN >= $REPN;

sub show {   ## Format sets for display
    system 'cls' if @_ > 1;
    for( @{ $_[ 0 ] } ) {
        printf "[%7s] %7g [ %s ]\n",
            "@{ $_->[REP] }", $_->[SCORE]||0.0,
            join '', map{ sprintf "[%3d:%3d]", @$_ } @{ $_->[LOCNS] };
    }
}

sub pythagoras {   ## calc distance between two points.
    my( $v1, $v2 ) = @_;
    my $dx = abs( $v1->[X] ) - abs( $v2->[X] );
    my $dy = abs( $v1->[Y] ) - abs( $v2->[Y] );
    return sqrt( $dx**2 + $dy**2 );
}

## Simple scoring of individual sets:
## sum of distances of locations from the rep's location.
sub score {
    my( $set ) = @_;
    return sum map {
        pythagoras( $set->[ 0 ], $_ );
    } @{ $set->[ 1 ] }
}

my @reps      = map { [ int rand rand $GRID[X], int rand $GRID[Y] ] } 1 .. $REPN;
my @locations = map { [ int rand $GRID[X],      int rand $GRID[Y] ] } 1 .. $LOCN;

## Give each rep one location, then distribute the rest at random.
my @sets = map { [ $_, [ pop @locations ] ] } @reps;
push @{ $sets[ rand @sets ][1] }, pop @locations while @locations;

show( \@sets, 1 ); <STDIN>;

my( $tries, $c ) = ( 0, 's' );
my $best = [ 9e99, [], 9 ];
my $evolution = 0;
my( %scores, %best );
my( $delay, $display, $stop ) = ( -1, 1, 0 );

ReadMode 2;
while( 1 ) {
    ## Calculate the total score for the current sets:
    ## sum of the individual set totals.
    my $totalScore = sum map { $_->[SCORE] = score $_ } @sets;

    ## Record frequencies of (integerised) solutions found
    $scores{ int $totalScore }++;

    ## Keep track of iterations
    $tries++;

    ## Commands to monitor progress and quit.
    $c = ReadKey( $delay )||'';
    $stop = 1 if $c eq 'q';             ## Quit
    $delay += 1 if $c eq 's';           ## speed (0=pause) (n>0 sleep n)
    $delay = -1 if $c eq 'c';           ## Continue fullspeed
    $display = !$display if $c eq 'd';  ## Toggle display

    if( $best->[ 0 ] > $totalScore ) {  ## If we found a better solution
        $best = [ $totalScore, clone( \@sets ), $tries ];  ## save it
        $evolution = $EVO;  ## but allow bad solutions to evolve for a while

        ## Keep a record of when we found improvements
        ## to allow estimates of "good enough" iterations;
        $best{ $tries } = $totalScore;
    }
    elsif( $stop or not --$evolution ) {
        ## if no better evolution after $EVO attempts
        @sets = @{ $best->[1] };  ## Restore the best yet and try again
        print 'Best restored'; Win32::Sleep 1000;
    }

    last if $stop;  ## stop here after ensuring the best is restored.

    ## Sort them by individual scores
    @sets = sort{ $a->[SCORE] <=> $b->[SCORE] } @sets;

    ## Display them
    show \@sets, 1 if $display;
    printf "%06d (%06d) %g %g\n",
        $tries, $best->[ 2 ], $best->[ 0 ], $totalScore;

    ## If the worst set has more than 1 location, give one to the best
    push @{ $sets[ 0 ][LOCNS] }, shift @{ $sets[ -1 ][LOCNS] }
        if @{ $sets[ -1 ][LOCNS] } > 1;

    ## pick two set/location pairs at random
    my( $a, $b ) = map{ int rand @sets } 1 .. 2;
    my( $sa, $sb ) = (
        int rand $#{ $sets[ $a ][LOCNS] },
        int rand $#{ $sets[ $b ][LOCNS] },
    );

    ## and swap them
    my $temp = $sets[ $a ][LOCNS][ $sa ]||die "A:$a:$sa";
    $sets[ $a ][LOCNS][ $sa ] = $sets[ $b ][LOCNS][ $sb ]||die "B:$b:$sb";
    $sets[ $b ][LOCNS][ $sb ] = $temp;
}

show \@sets;  ## display best solution
printf "Total: %d after %d tries\n", $best->[ 0 ], $best->[ 2 ];

printf 'Enter to see a frequency plot of the solutions found'; <STDIN>;
print "$_ => $scores{ $_ }" for sort{ $a <=> $b } keys %scores;

printf 'Enter to see record of best discovery points.'; <STDIN>;
print "Best score: $best{ $_ } after $_ iterations." for sort{ $a <=> $b } keys %best;

Sample output

[ 0 19] 41.3038 [ [ 5: 60] ]
[ 0 36] 45.0111 [ [ 45: 35] ]
[ 47 47] 47.1699 [ [ 87: 72] ]
[ 34 60] 49.7695 [ [ 53: 14] ]
[ 0 53] 54.9181 [ [ 46: 23] ]
[ 5 0] 66.9104 [ [ 16: 66] ]
[ 12 85] 67.424 [ [ 51: 30] ]
[ 2 98] 78.1025 [ [ 78: 80] ]
[ 31 82] 82.8794 [ [ 86: 20] ]
[ 36 30] 85.5862 [ [ 95: 92] ]
037234 (021726) 435.401 619.075
Best restored
[ 0 53] 54.9181 [ [ 46: 23] ]
[ 36 30] 15 [ [ 51: 30] ]
[ 12 85] 83.2947 [ [ 95: 92] ]
[ 0 36] 45.0111 [ [ 45: 35] ]
[ 47 47] 47.1699 [ [ 87: 72] ]
[ 0 19] 41.3038 [ [ 5: 60] ]
[ 34 60] 49.7695 [ [ 53: 14] ]
[ 31 82] 82.8794 [ [ 86: 20] ]
[ 2 98] 78.1025 [ [ 78: 80] ]
[ 5 0] 66.9104 [ [ 16: 66] ]
Total: 435 after 21726 tries
Enter to see a frequency plot of the solutions found
435 => 3 441 => 3 442 => 2 443 => 1 447 => 1 448 => 1 450 => 3 453 => 4 454 => 3 455 => 1 456 => 1 457 => 3 459 => 3 461 => 2 462 => 1 463 => 3 464 => 4 465 => 3 466 => 4 467 => 9 468 => 6 469 => 8 470 => 6 471 => 6 472 => 7 473 => 12 474 => 4 475 => 8 476 => 12 477 => 10 478 => 10 479 => 15 480 => 13 481 => 12 482 => 13 483 => 9 484 => 20 485 => 8 486 => 17 487 => 22 488 => 21 489 => 22 490 => 21 491 => 29 492 => 30 493 => 23 494 => 27 495 => 40 496 => 34 497 => 39 498 => 34 499 => 42 500 => 31 501 => 43 502 => 64 503 => 49 504 => 53 505 => 44 506 => 55 507 => 72 508 => 63 509 => 72 510 => 58 511 => 59 512 => 60 513 => 84 514 => 63 515 => 59 516 => 78 517 => 83 518 => 77 519 => 105 520 => 94 521 => 83 522 => 91 523 => 110 524 => 107 525 => 101 526 => 134 527 => 125 528 => 119 529 => 110 530 => 134 531 => 122 532 => 113 533 => 125 534 => 152 535 => 157 536 => 170 537 => 153 538 => 173 539 => 153 540 => 194 541 => 181 542 => 149 543 => 179 544 => 188 545 => 172 546 => 192 547 => 210 548 => 207 549 => 213 550 => 204 551 => 212 552 => 235 553 => 254 554 => 248 555 => 185 556 => 268 557 => 242 558 => 256 559 => 247 560 => 258 561 => 258 562 => 241 563 => 290 564 => 281 565 => 288 566 => 318 567 => 306 568 => 268 569 => 306 570 => 275 571 => 281 572 => 268 
573 => 307 574 => 307 575 => 299 576 => 254 577 => 293 578 => 324 579 => 284 580 => 299 581 => 304 582 => 303 583 => 322 584 => 333 585 => 342 586 => 323 587 => 329 588 => 333 589 => 312 590 => 331 591 => 274 592 => 314 593 => 349 594 => 318 595 => 303 596 => 290 597 => 318 598 => 323 599 => 288 600 => 272 601 => 329 602 => 331 603 => 335 604 => 304 605 => 312 606 => 328 607 => 311 608 => 315 609 => 290 610 => 280 611 => 320 612 => 272 613 => 267 614 => 298 615 => 307 616 => 260 617 => 280 618 => 283 619 => 346 620 => 299 621 => 253 622 => 262 623 => 264 624 => 251 625 => 230 626 => 226 627 => 254 628 => 239 629 => 260 630 => 231 631 => 245 632 => 226 633 => 217 634 => 244 635 => 206 636 => 221 637 => 226 638 => 180 639 => 176 640 => 206 641 => 175 642 => 180 643 => 161 644 => 166 645 => 203 646 => 167 647 => 179 648 => 163 649 => 155 650 => 125 651 => 146 652 => 128 653 => 130 654 => 121 655 => 137 656 => 127 657 => 121 658 => 122 659 => 120 660 => 100 661 => 115 662 => 93 663 => 119 664 => 102 665 => 109 666 => 107 667 => 90 668 => 87 669 => 88 670 => 90 671 => 63 672 => 69 673 => 74 674 => 69 675 => 45 676 => 57 677 => 61 678 => 53 679 => 49 680 => 62 681 => 54 682 => 53 683 => 41 684 => 50 685 => 49 686 => 50 687 => 41 688 => 31 689 => 44 690 => 30 691 => 37 692 => 36 693 => 36 694 => 42 695 => 39 696 => 29 697 => 19 698 => 26 699 => 28 700 => 28 701 => 30 702 => 24 703 => 22 704 => 22 705 => 16 706 => 20 707 => 15 708 => 13 709 => 13 710 => 13 711 => 14 712 => 9 713 => 8 714 => 3 715 => 8 716 => 4 717 => 1 718 => 2 719 => 9 720 => 3 721 => 5 722 => 3 723 => 3 724 => 9 725 => 3 727 => 4 728 => 1 729 => 2 730 => 3 732 => 4 733 => 4 734 => 2 735 => 1 736 => 1 739 => 2 740 => 2 741 => 1 742 => 3 743 => 1 744 => 1 746 => 1
Enter to see record of best discovery points.
Best score: 587.754090215359 after 1 iterations.
Best score: 585.739406140003 after 2 iterations.
Best score: 574.349845629305 after 7 iterations.
Best score: 539.378581196848 after 9 iterations.
Best score: 515.579334272599 after 25 iterations.
Best score: 503.180579975418 after 26 iterations.
Best score: 482.804864664163 after 98 iterations.
Best score: 481.914466880709 after 99 iterations.
Best score: 469.204365706502 after 623 iterations.
Best score: 468.414887089937 after 1607 iterations.
Best score: 454.855761343826 after 1846 iterations.
Best score: 442.160363943557 after 2281 iterations.
Best score: 442.160363943557 after 2282 iterations.
Best score: 441.218607935927 after 2283 iterations.
Best score: 435.401484875951 after 21726 iterations.

Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Replies are listed 'Best First'.
Re: Evolution in Action.
by kvale (Monsignor) on Feb 26, 2005 at 16:27 UTC
    Cool program!

    The evolution of the score you see is pretty typical of genetic algorithms and genetic programs: much progress is made in the initial iterations, but it takes a lot more to gain further improvement.

    I just wanted to mention that the assertion by ff's professor that you only need 30 tries is patently false. If a problem has many degrees of freedom and a complex error surface, in general the amount of searching one must do is O(k^N), where N is the number of degrees of freedom and k is related to the complexity of the error surface.

    To see this, consider a simple example. Suppose we have one degree of freedom and we know that a good solution lies either at x = -1 or x = 1. Then we only have to search twice. Now add a similar degree of freedom y that interacts with x in a complex way. Then we would need to search (+-1, +-1), or 4 attempts. Adding another would need 8 attempts, and so on, yielding O(2^N) behavior. And this is just looking for a good quadrant! This blowup of search space with degrees of freedom is called the 'curse of dimensionality' and, along with a complex error surface, is the basic reason that some problems are NP-complete.
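    The quadrant argument above can be made concrete with a tiny exhaustive search (a Python sketch; the objective function is a placeholder):

```python
from itertools import product

def corner_search(objective, n):
    """Exhaustively evaluate every corner (+-1, ..., +-1) of an
    n-dimensional cube; the number of evaluations doubles with each
    added degree of freedom, i.e. grows as 2^n."""
    evals = 0
    best, best_val = None, float('inf')
    for corner in product((-1, 1), repeat=n):
        evals += 1
        v = objective(corner)
        if v < best_val:
            best, best_val = corner, v
    return best, evals
```

    Even this coarsest possible search already costs 2^N evaluations, which is the point: 30 random tries cannot cover a space that doubles with every degree of freedom.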

    Some problems have nice large-scale structure in the error surface that admits heuristic guesses that often do well, but others, like k-satisfiability for k > 2, don't seem to. So don't throw away your simulated annealing and genetic algorithms code yet!

    -Mark