Student Class Section Lottery

A while back I posted a node looking for good algorithm to use when assigning students to class sections. BrowserUk had a lot of good suggetions that got me thinking about all sorts of strategries, but in the end, some comments kvale made got me thinking about just trying a lottery based approach with lots of iterations to get the best score. That first pass then evolved.

What I finally came up with, is (I think) a really fair approach, that if allowed to run long enough (or on a small enough data set) winds up being a brute force algorithm to try giving the students their first pick in every possible permutation of students.

BUT: since trying *EVERY* permutation of all students might take a while in large datasets, it works it's way up to that by picking combinations of C "starter students" who get to go first (all permutations of those N are tried) followed by R random orderings of the remaining students. C is initially 1, and is incrimented once R orderings starting with every permutation of C have been tried. Once it's generated a solution with a score you are happy with, you can kill it. (using -r 1 and -c numberOfStudents it only tries the brute force approach of every permutation of students)

This kind of approach is similar to an Iteratively Deepening Depth First Search of a graph.

The Code...

#!/usr/local/bin/perl

# $Source: /home/hossman/cvs_archive/code/perl/section-lottery.pl,v $
# $Id: section-lottery.pl,v 1.7 2005/01/12 22:35:43 hossman Exp $
#
# see "--help" for usage info
#
# Given a section CSV file and a prefrences CSV file (in that order)
# This tries a bunch of random orderings of students and gives each
# one the best choice it can.
#
# As it goes, it keeps track of solutions it's already found and won't
# generate a solution if:
#   1) this solution excludes more students then a previous one
#   2) this solution excludes an identicle list of students
#
# The output files have names indicating how many people have been lef
+t out,
# followed by a random number, followed by a timestamp indicating when
+ it
# generated.
#
# The contents of the CSV files contain 2 sections.  The first lists
# all of the sections and how many "free spots" are still left.  the s
+econd
# lists each student, their choice they recieved (ie: 1st, 2nd), the s
+ection
# they've been assigned to, and their orriginal prefrences in order.
# their section may be empty if they are unassigned, in which case the
+ir
# choice number will be really really big (to make sorting in excel ea
+sier)
#
# In an attempt to ensure "fairness" this script generates combination
+s of
# students of increasing sizes to be the "start" students who get "fir
+st
# pick".  For each combination of starting students, all permutations
# of the starters are considered, and then some number of random
# permutations of the remaining students are tried.  The initial start
# set combination size, and the number of random orderings to do using
# each starting permutation are configurable on the command line

use warnings;
use strict;

use Digest::MD5 qw(md5_hex);
use List::Util qw(shuffle);
use Math::Combinatorics;
use Text::CSV_XS;
use Data::Dumper;
use Getopt::Std;

use constant {
    SECTIONS => 'sections',
    PREFS => 'prefs',
    ASSIGNED => 'assigned',
};

$Getopt::Std::STANDARD_HELP_VERSION = 1;
sub HELP_MESSAGE {
    my $fh = shift;
    print $fh
        "Usage: $0 [-m] [-i num] [-r num] [-c num] -s sec.csv -p prefs
+.csv\n",
        "   -m            treat the csv files as mac files (optional)\
+n",
        "   -i num        max num of iterations to try (optional)\n",
        "                 (default to infinite)\n",
        "   -r num        num of random attempts per combination (opti
+onal)\n",
        "                 (defaults to sqrt number of students)\n",
        "   -c num        combination size to start at (optional)\n",
        "                 (defaults to 1)\n",
        "   -s sec.csv    section availability file\n",
        "   -p prefs.csv  prefrences file\n";
}


{   # block scope because of stupid "globals"


    my %opts;

    (getopts('ms:p:i:r:c:', \%opts) and exists $opts{s} and exists $op
+ts{p})
        or (HELP_MESSAGE(\*STDOUT) and exit());

    my $state = State::parse_files($opts{s}, $opts{p},
                                   (exists $opts{m} ? "\015" : "\n"));


    # NOTE: for each number of unassigned, we are writting out solutio
+ns
    # even if they don't have as good a score as previous solutions.
    # this is so the user has the option of picking a higher scoring
    # solution if they like the distrobution better.

    my @students = sort $state->unassigned();
    my $students = scalar(@students);
    my $r = exists $opts{r} ? $opts{r} : int(1 + sqrt $students);
    my $min = $students + 1;

    # negative indicates run forever
    my $count = exists $opts{i} ? $opts{i} : -1;
    my %known = ();

    print "Number of sections: " . keys(%{$state->{SECTIONS}}) . "\n";
    print "Number of students: $students\n";
    if (0 < $count) {
        print "Doing $count iterations...\n";
    } else {
        print "Looping over all permutations (or untill a perfect one)
+...\n";
    }

    for (my $c = exists $opts{c} ? $opts{c} : 1; $c <= $students; $c++
+) {
        # gradually incriment c, finding the list of all
        # combinations of that size as we go.

        # for each permutation of each combination, generate
        # some number of random orderings that start with that
        # combination/permutation

        # write out any solutions we find that are better then
        # (or as good as) the ones before in terms of unassigned
        #
        # stop if we get a perfect one

        print STDERR "Trying combinations of size $c\n";

        my $comb = Math::Combinatorics->new(count=> $c, data => \@stud
+ents);

        while (my @start = $comb->next_combination()) {

            my $perm = Math::Combinatorics->new(data => \@start);

            while (my @p = $perm->next_permutation()) {

                my $pclone = $state->clone();

                $pclone->in_order(@p);

                foreach (1..$r) {

                    my $clone = $pclone->clone();
                    $clone->in_order(shuffle($clone->unassigned()));

                    my @leftovers = $clone->unassigned();
                    my $cur = scalar @leftovers;
                    my $score = $clone->score();

                    my $md5 = md5_hex(join ' ', $score, sort @leftover
+s);

                    if ($cur <= $min and ! exists $known{$md5}) {
                        print "Found new solution with $cur leftout: $
+score\n";
                        $min = $cur;
                        $known{$md5}++;
                        $clone->write_new_file();
                    }

                    die "found a perfect solution\n"
                        if 0 == $min and 0 == $score;

                    die "exceeded max iterations\n" if 0 == $count--;

                } # foreach random ordering

            } # foreach permutation of our combination

        } # foreach combination of size $c

    } # foreach $c

}

######################################################################
+##

package State;

# class method
sub new {
    # basic empty constructor
    my $self = bless { SECTIONS => {}, PREFS => {}, ASSIGNED => {} }, 
+shift;
    return $self;
}

# class method
sub parse_files {
    my ($sections, $prefs, $sep) = @_;
    # sections is the sections and the number of spots
    # prefs is the users and their prefrences in order
    # sep is the input file record seperator
    #
    # returns a State object

    local $/ = $sep;

    my $s = State->new();

    my ($sec_fh, $pref_fh);
    open $sec_fh, $sections or die "can't open sections file";
    open $pref_fh, $prefs or die "can't open prefs file";

    my $csv = Text::CSV_XS->new();

    while (<$sec_fh>) {
        chomp;
        $csv->parse($_)
            or die "can't parse sec line: " . $csv->error_input() ." "
+;
        my @cols = $csv->fields();
        $s->{SECTIONS}->{$cols[0]} = $cols[1];
    }

    while (<$pref_fh>) {
        chomp;
        $csv->parse($_)
            or die "can't parse prefs line: " . $csv->error_input() ."
+ ";
        my @cols = $csv->fields();
        my $stu = shift @cols;
        $s->{PREFS}->{$stu} = [ grep { '' ne $_} @cols ];
    }

    return $s;
}

# instance method
sub assign {
    my ($self, $student, $section) = @_;
    # assign student to section, returns nothing

    die "$section isn't a section" unless exists $self->{SECTIONS}->{$
+section};
    die "$student isn't a student" unless exists $self->{PREFS}->{$stu
+dent};

    die "$section is full" unless 0 < $self->{SECTIONS}->{$section};
    warn "$student didn't pick $section"
        unless grep { $section eq $_ } @{$self->{PREFS}->{$student}};
    warn "$student was already in another section"
        if exists $self->{ASSIGNED}->{$student};

    $self->{SECTIONS}->{$section}--;
    $self->{ASSIGNED}->{$student} = $section;

}

# instance method
sub best_avail {
    my ($self, $student) = @_;
    # given a student, return the best available section in their pref
+s
    # returns undef if nothing is available

    my @prefs = @{$self->{PREFS}->{$student}};
    foreach (@prefs) {
        if (0 < $self->{SECTIONS}->{$_}) {
            return $_;
        }
    }

    return undef;
}

# instance method
sub in_order {
    my ($state, @list) = @_;
    # given an ordered list of people
    # give them their best available choice.
    # if @list includes a person who is allready in a section, happily
+ ignore

    foreach (@list) {
        next if exists $state->{ASSIGNED}->{$_};
        my $best = $state->best_avail($_);
        $state->assign($_, $best) if defined $best;
    }
}

# instance method
sub unassigned {
    my $self = shift;
    # returns a list of all the people not yet in a section


    my @leftovers = ();
    foreach (keys %{$self->{PREFS}}) {
        push @leftovers, $_ unless exists $self->{ASSIGNED}->{$_};
    }


    return @leftovers;
}

# instance method
sub score {
    my $self = shift;
    # returns a numeric score for the current state
    #
    # when comparing two states with the same number of people unassig
+ned,
    # the smaller the score, the better.  (scores can't be compared be
+tween
    # states with different numbers of unassigned students)

    my $score = 0;
    foreach my $student (keys %{$self->{ASSIGNED}}) {
        my $sec = $self->{ASSIGNED}->{$student};
        my @prefs = @{$self->{PREFS}->{$student}};

        my ($rank) = grep $prefs[$_] eq $sec, 0..$#prefs;

        $score += $rank;
    }
    return $score;

}

# instance method
sub write {
    my ($self, $fh) = @_;
    # writes out the current state as a CSV file to $fh

    my $csv = Text::CSV_XS->new({ eol => "\n", always_quote => 1 });

    # print all of hte sections and how many they have left
    foreach (sort keys %{$self->{SECTIONS}}) {
        $csv->combine($_, $self->{SECTIONS}->{$_})
            or die "can't combine secs line: " . $csv->error_input() .
+" ";
        print $fh $csv->string();
    }

    # print a "blank line" in CSV speak
    print $fh qq{""\n};

    foreach (keys %{$self->{PREFS}}) {

        my ($assign, $rank) = ('',99999);
        my @prefs =  @{$self->{PREFS}->{$_}};

        if (exists $self->{ASSIGNED}->{$_}) {
            $assign = $self->{ASSIGNED}->{$_};
            ($rank) = grep $prefs[$_] eq $assign, 0..$#prefs;
            $rank++;
        }

        $csv->combine($_, $rank, $assign, @prefs)
            or die "can't combine assigned line: " . $csv->error_input
+() ." ";
        print $fh $csv->string();
    }

}

# instance method
sub write_new_file {
    my $self = shift;
    # writes to a new file with a name based on the number of
    # unassigned people, the score, a random number, and a timestamp.

    my $name = ($self->unassigned()) . "-" . ($self->score()) . '-' .
        int(rand(10000)) . '-' . time() . ".csv";
    open my $fh, ">$name";
    $self->write($fh);
    close $fh;
}

# instance method
sub clone {
    my $self = shift;
    # returns a copy of itself with independant sections/assigned hash
+es
    # (PREFS is shared since it shouldn't be changable)

    my $that = State->new();

    foreach (keys %{$self->{SECTIONS}}) {
        $that->{SECTIONS}->{$_} = $self->{SECTIONS}->{$_};
    }
    foreach (keys %{$self->{ASSIGNED}}) {
        $that->{ASSIGNED}->{$_} = $self->{ASSIGNED}->{$_};
    }

    foreach (keys %{$self->{PREFS}}) {
        $that->{PREFS}->{$_} = [ @{$self->{PREFS}->{$_} } ];
    }

    #$that->{PREFS} = $self->{PREFS};


    return $that;
}

__END__
[download]

And an example run...

radix:~/tmp/section-lottery> cat data/sec.1.csv
sec1,1
sec2,45
sec3,9
sec4,0
sec5,0
"section six",3
section999,10
radix:~/tmp/section-lottery> cat data/prefs.1.csv
1234,sec1,sec3,sec5
2345,"section six",sec2
5678,sec5,sec4,sec3,sec1,sec2,section999
4,sec1
5,sec1,"section six"
6,sec1
radix:~/tmp/section-lottery> ~/cvs/working/code/perl/section-lottery.p
+l -s data/sec.1.csv -p data/prefs.1.csv
Number of sections: 7
Number of students: 6
Looping over all permutations (or untill a perfect one)...
Trying combinations of size 1
Found new solution with 2 leftout: 3
Found new solution with 1 leftout: 4
Found new solution with 1 leftout: 4
Trying combinations of size 2
Trying combinations of size 3
Trying combinations of size 4
Trying combinations of size 5
Trying combinations of size 6
radix:~/tmp/section-lottery> ls *.csv
1-4-5201-1105570785.csv  1-4-8697-1105570785.csv  2-3-7313-1105570785.
+csv
radix:~/tmp/section-lottery> cat 1-4-5201-1105570785.csv
"sec1","0"
"sec2","45"
"sec3","7"
"sec4","0"
"sec5","0"
"section six","1"
"section999","10"
""
"6","99999","","sec1"
"2345","1","section six","section six","sec2"
"4","1","sec1","sec1"
"5678","3","sec3","sec5","sec4","sec3","sec1","sec2","section999"
"1234","2","sec3","sec1","sec3","sec5"
"5","2","section six","sec1","section six"
[download]

Back to Cool Uses for Perl