in reply to Baseball line up (best rotation)

First of all, don't worry about this looking like homework. Homework is closed-ended and well specified. Your problem is neither.

Your basic problem is a modelling issue. How detailed do you want your model of what goes into baseball to be, and how much data do you have to defend that model? Once you have your model, you have a completely separate analysis issue: figuring out the expected performance.

So what goes into a model? Well, you can try to model everything, but I would say that you should go for simple. A person goes up to bat. One of several things happens. They get out. Or a hit advances people x bases (0, 1, 2, 3, home run) and the top y people get out. Does your data look something like this? Ignore details like "He runs really well" and assume it does.

Your next step is to fit the model to the people on the team. You have a number of outcomes when foo goes up to bat. Estimate the relative probabilities. The simpler your model, the fewer possibilities, the more data, the more comfortable you will be with your fit. But conversely the simpler, the less that is taken into account, the worse your model.
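A minimal sketch of that fitting step, assuming your data is simply a tally of what happened each time a batter came up. The outcome names, the sample numbers, and the `smoothing` pseudo-count are all my own illustration rather than anything from real data, and the "top y people get out" part of the model is left out to keep it short:

```python
from collections import Counter

# Hypothetical outcome labels for the simple model.
OUTCOMES = ["out", "single", "double", "triple", "homerun"]

def estimate_probabilities(at_bats, smoothing=1.0):
    """Turn a list of observed outcomes into relative probabilities.

    `smoothing` adds a pseudo-count to every outcome, so a kid who has
    never hit a triple is not modelled as unable to ever hit one.
    """
    counts = Counter(at_bats)
    total = len(at_bats) + smoothing * len(OUTCOMES)
    return {o: (counts[o] + smoothing) / total for o in OUTCOMES}

# Made-up record for one batter: 20 at-bats.
foo = ["out"] * 13 + ["single"] * 5 + ["double"] * 2
probs = estimate_probabilities(foo)
```

With only 20 at-bats the pseudo-counts matter a lot, which is the "more data, more comfortable" trade-off in action.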

For the analysis I suggest Monte Carlo. You have your model. You have your numbers. Play Ball! There are only 87178291200 possible line-ups, a computer can crank through that in abou...

Oh shoot. That will take a while.

What you will need to do is take your players and rank them into a few roughly equal groups. Rather than try each lineup you want to try every way of scattering your fixed groups around the lineup. For instance if your groups are the star, 2 more good players, 6 more OK ones, and the 5 who demonstrate why it is little league, then you have about a half-million possible lineups to consider.
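The two counts can be checked with a few lines of Python. This is just arithmetic: 14 distinct players for the full count, and a multinomial coefficient for the grouped count (the function name is mine):

```python
from math import factorial

def grouped_lineups(group_sizes):
    """Number of distinct ways to scatter groups of interchangeable
    players across the batting order (a multinomial coefficient)."""
    total = factorial(sum(group_sizes))
    for size in group_sizes:
        total //= factorial(size)
    return total

all_orders = factorial(14)               # every ordering of 14 distinct players
grouped = grouped_lineups([1, 2, 6, 5])  # star, good, OK, the rest
print(all_orders, grouped)               # 87178291200 504504
```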

So now play ball. Play each of these lineups for 100 innings. (By play a lineup I mean randomly order the players within each group, generate random numbers, and play.) That is about 50 million simulated innings, so it will take a while. Drop the worst-scoring 2/3 of the lineups. Try that again with the survivors. Keep on doing that until you get down to a hundred or so grouped lineups. Then take your groupings and split your groups in half. That will get you a lot more lineups again. Wash, rinse, and repeat until you have (by your numbers) the top few lineups.
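Here is a rough sketch of "play ball" under the simple model, assuming each batter is a dict of outcome probabilities and that every runner advances exactly as many bases as the hit. It takes a concrete batting order (shuffling within groups is left to the caller), it ignores runners being thrown out, and all the names are hypothetical:

```python
import random

BASES_FOR = {"single": 1, "double": 2, "triple": 3, "homerun": 4}

def play_inning(lineup, start, rng):
    """Simulate one inning. Returns (runs, index of the next batter)."""
    outs, runs = 0, 0
    runners = []                      # base (1..3) held by each runner
    i = start
    while outs < 3:
        probs = lineup[i % len(lineup)]
        outcome = rng.choices(list(probs), weights=list(probs.values()))[0]
        i += 1
        if outcome == "out":
            outs += 1
            continue
        advance = BASES_FOR[outcome]
        moved = [b + advance for b in runners + [0]]  # batter starts at home (0)
        runs += sum(1 for b in moved if b >= 4)       # anyone past third scores
        runners = [b for b in moved if b < 4]
    return runs, i % len(lineup)

def average_runs(lineup, innings=100, seed=0):
    """Play consecutive innings, carrying the batting order over."""
    rng = random.Random(seed)
    total, batter = 0, 0
    for _ in range(innings):
        runs, batter = play_inning(lineup, batter, rng)
        total += runs
    return total / innings

def keep_top_third(lineups, innings=100, seed=0):
    """Drop the worst-scoring two thirds of the lineups."""
    ranked = sorted(lineups, key=lambda l: average_runs(l, innings, seed),
                    reverse=True)
    return ranked[:max(1, len(ranked) // 3)]
```

Note that a lineup in which nobody can ever get out would loop forever, so every batter needs a nonzero chance of an out somewhere in the order.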

If your kid brother doesn't move up in the batting numbers, don't tell anyone. If he does, then good luck convincing the coach...

Either way you will learn something about statistics, programming, and exactly how hard it is to come up with a decent model of anything in the real world.

Re: Re (tilly) 1: Baseball line up (best rotation)
by Masem (Monsignor) on May 17, 2001 at 21:54 UTC
    Setting it up as tilly suggests, this might also be a good problem for a genetic algorithm. First, create a hundred or so random lineups from the available players. Each step of the iteration plays every lineup through a game, randomized as tilly describes, and assigns the total number of runs scored as the 'value' of that lineup. You could run the game several times per lineup, with the value being the sum of all the scores. When all lineups have been played, sort the set by score; remove the lineups that did not perform well (say, a value less than 2*number of games played*number of innings), and copy and mutate those that did perform well (say, a value better than 4*number of games played*number of innings). The mutation would be one of two things: either swap two consecutive players in the batting order, or swap a random player with a random player not currently in the lineup. Clear out all the current 'values' and rerun. After several iterations of this, the top 10 or so lineups should have outstanding results, which you can then compare with the current lineup.
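    A small sketch of the mutate-and-cull step, with lineups as lists of player names. The function names and the 50/50 choice between the two mutations are assumptions for illustration; the scoring itself is left out:

```python
import random

def mutate(lineup, roster, rng):
    """Return a mutated copy of `lineup` using one of the two moves:
    bring in a player from the bench, or swap two neighbours."""
    child = list(lineup)
    bench = [p for p in roster if p not in child]
    if bench and rng.random() < 0.5:
        # Replace a random player with one not currently in the lineup.
        child[rng.randrange(len(child))] = rng.choice(bench)
    else:
        # Swap two consecutive players in the batting order.
        i = rng.randrange(len(child) - 1)
        child[i], child[i + 1] = child[i + 1], child[i]
    return child

def next_generation(scored, low, high, roster, rng):
    """`scored` is a list of (lineup, value) pairs. Lineups below `low`
    are dropped, the rest survive, and each lineup above `high` also
    contributes one mutated copy."""
    survivors = [lineup for lineup, value in scored if value >= low]
    children = [mutate(lineup, roster, rng)
                for lineup, value in scored if value > high]
    return survivors + children
```

    Here `low` and `high` would be the 2*games*innings and 4*games*innings thresholds; if too few lineups clear `high`, the population shrinks, so you may want to top it up with fresh random lineups.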

    To make this work better, generate a fresh large array of random numbers for each step, but within a step have every tested lineup use the same random numbers in the same order, if only to remove a potential bias.
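    One way to sketch the shared-random-numbers idea: build the array of draws once per step from a seed, then hand the same array to every lineup's game. The seed arithmetic and the function names are arbitrary choices of mine:

```python
import random

def draws_for_step(step, n, master_seed=12345):
    """Fresh random numbers for each step, reproducible from the seed."""
    rng = random.Random(master_seed * 1_000_003 + step)
    return [rng.random() for _ in range(n)]

def pick_outcome(probs, u):
    """Map a uniform draw u in [0, 1) to an outcome by walking the
    cumulative probabilities."""
    running = 0.0
    for outcome, p in probs.items():
        running += p
        if u < running:
            return outcome
    return outcome  # rounding guard if the probabilities sum to just under 1

# Every lineup in step 7 would consume this same list, in order.
shared = draws_for_step(7, 1000)
```

    Two lineups will still spend the same draw on different batters, but neither gets a luckier stream than the other.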

    (Yes, I know this is serious overkill, but it's an interesting thought ...)


    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
      Just to supplement Masem's reply...

      First, you might consider doing a crossover in addition to a mutation to ensure that you properly explore the line-up space. You'll need to do a partially matched crossover (PMX) to ensure that you don't duplicate the team members. You could do a mutation as suggested by Masem either on each lineup or on each member of a lineup 1-5% of the time.
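      For the curious, a compact sketch of PMX on two batting orders. It assumes both parents contain exactly the same players (so it would need adapting if bench substitutions are allowed), and the function names are mine:

```python
import random

def pmx(parent_a, parent_b, lo, hi):
    """Partially matched crossover: copy parent_a[lo:hi] into the child,
    then place parent_b's players around it without duplicates."""
    n = len(parent_a)
    child = [None] * n
    child[lo:hi] = parent_a[lo:hi]
    copied = set(parent_a[lo:hi])
    pos_in_b = {player: i for i, player in enumerate(parent_b)}
    for i in range(lo, hi):
        player = parent_b[i]
        if player in copied:
            continue
        # Follow the a -> b mapping until we land outside the copied slice.
        j = i
        while lo <= j < hi:
            j = pos_in_b[parent_a[j]]
        child[j] = player
    # Everything still empty comes straight from parent_b.
    return [parent_b[i] if c is None else c for i, c in enumerate(child)]

def random_pmx(parent_a, parent_b, rng):
    """PMX with two randomly chosen cut points."""
    lo, hi = sorted(rng.sample(range(len(parent_a) + 1), 2))
    return pmx(parent_a, parent_b, lo, hi)
```

      Usage would look like `child = random_pmx(mom, dad, random.Random())`, where `mom` and `dad` are two surviving lineups.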

      Second, you could experiment with various types of selection algorithms for determining who breeds and who dies. Masem suggested the Percentage model, but there's also the Roulette and Tournament models.
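      A bare-bones sketch of the Roulette and Tournament models, assuming `population` is a list of lineups and `scores` is the matching list of non-negative values (not all zero, or the roulette wheel has nothing to spin). The function names and the default tournament size are my own:

```python
import random

def roulette_select(population, scores, rng):
    """Pick one lineup with probability proportional to its score."""
    return rng.choices(population, weights=scores, k=1)[0]

def tournament_select(population, scores, rng, size=3):
    """Pick `size` lineups at random and return the best of them."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: scores[i])]
```

      Roulette is sensitive to the scale of the scores, while Tournament only cares about their order, which can matter when all lineups score within a few runs of each other.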

      Third, it would be really neat to take your top lineup from one run and stick it into a new run of the GA to see how it fares.

      Check out Genetic Algorithms with Perl for more guidance and inspiration.

      Enjoy!

      Addendum: I just realized the page didn't actually link to the code.