When Should Managers Pull Their Starting Pitchers?

"A Data-driven Method for In-game Decision Making in MLB" is easily the worst title of the eight Sloan paper finalists, which is a shame, given that it's actually an interesting application of machine learning that's directly related to one of the most important aspects of baseball game management. By "in-game decision making," authors Gartheeban Ganeshapillai and John Guttag are referring to one decision: When do you pull your starting pitcher? More specifically, when do you pull your starting pitcher, if it's late in the game and the score is close?

Using data from the first 80 percent of the 2006-2010 seasons, the researchers built a model that estimated whether the starting pitcher would give up at least one run in the next inning of late (fifth inning or later), close games, and used this estimate to recommend whether to pull the starter before the next inning. The model was accurate in 81 percent of innings. It was then tested on the last 20 percent of the 2006-2010 seasons—21,538 innings in total—and it disagreed with the actual decisions made by MLB managers a remarkable 48 percent of the time.
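The evaluation protocol—train on the chronologically first 80 percent of innings, then score predictions on the held-out 20 percent—can be sketched in a few lines. Everything below (the toy data, the single pitch-count feature, the threshold "model") is a hypothetical stand-in for illustration, not the authors' actual classifier:

```python
# Illustrative sketch of the paper's evaluation protocol: a chronological
# 80/20 split, then accuracy measured on the held-out innings. The data
# and the single-feature threshold "model" are hypothetical stand-ins.

def chronological_split(rows, train_frac=0.8):
    """Split chronologically ordered rows into train and test sets."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

# Toy innings: (starter's pitch count, gave up a run in the next inning?)
innings = [
    (78, 0), (95, 1), (60, 0), (102, 1), (88, 0),
    (110, 1), (70, 0), (99, 1), (85, 0), (105, 1),
]

train, test = chronological_split(innings)

def fit_threshold(rows):
    """Pick the pitch-count cutoff that maximizes training accuracy."""
    def accuracy(t):
        return sum((pc > t) == bool(y) for pc, y in rows) / len(rows)
    return max(sorted({pc for pc, _ in rows}), key=accuracy)

threshold = fit_threshold(train)
test_acc = sum((pc > threshold) == bool(y) for pc, y in test) / len(test)
print(f"cutoff: {threshold} pitches, held-out accuracy: {test_acc:.0%}")
```

On this cherry-picked toy data the cutoff separates the innings perfectly; the real model uses a much richer feature set and reached its 81 percent accuracy over 21,538 genuinely messy innings.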

In 6,201 innings, the model recommended keeping the starter in, and the starter was kept in; the pitcher gave up a run in the next inning 18 percent of the time. In 9,288 innings, the manager opted to keep the starter in when the model recommended pulling him, and in these cases the pitcher gave up a run in the next inning 32 percent of the time.* In other words, the result implies that managers should be pulling their starters a lot more often in late, close situations.
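Those counts also let you back out the headline 48 percent figure: disagreements are the innings where the manager kept a starter the model would have pulled, plus the much rarer reverse case covered in the footnote. A quick check, using only the numbers quoted in this piece:

```python
# Back out the 48 percent disagreement figure from the quoted counts.
total_innings = 21538        # held-out test innings

kept_vs_model_pull = 9288    # manager kept the starter, model said pull
pulled_vs_model_keep = 1037  # manager pulled the starter, model said keep

disagreements = kept_vs_model_pull + pulled_vs_model_keep
print(f"disagreement rate: {disagreements / total_innings:.0%}")        # 48%
print(f"untestable share:  {pulled_vs_model_keep / total_innings:.0%}")  # 5%
```

The second line is why the pull-when-model-says-keep innings can be set aside: they're only about five percent of the test sample, so the 48 percent disagreement rate is driven almost entirely by managers leaving starters in too long.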


If you follow NFL statistics, you might be noticing some parallels between Ganeshapillai and Guttag's research and the work that Brian Burke has been doing for fourth-down calls. NFL coaches have become slightly less risk-averse on fourth down in recent years, just as MLB managers have become more reliant on middle relief. Nevertheless, these two mathematical models—both, to be fair, limited by the amount of contextual information they can employ—imply that coaches and managers are still far from "maximizing" their decision-making in these situations.

One critical difference: Most would agree that more fourth-down attempts would make football more exciting. No one's clamoring for more pitching changes, and the starter who goes deep into games is still venerated in baseball (seriously or otherwise). On the other hand, the baseball powers-that-be seem generally more amenable to this type of analysis, and good starting pitching has the nasty habit of being expensive.


By the researchers' own admission, this model needs some more development. Relying solely on the question of whether a run will score in the next inning is somewhat simplistic, the model doesn't account for the quality of relief pitchers, and it can't evaluate middle-of-the-inning pitching changes. There are also all sorts of intangible factors in play—the day after a 15-inning game in which he emptied the pen, for example, a manager might put more value on resting his relievers than on preventing that next run from scoring. Still, it's a pretty strong result, and it's in keeping with the overall trend: expect to see a lot more sixth-inning relief before you see teams consistently going for it on fourth-and-five.

[A Data-driven Method for In-game Decision Making in MLB]

*One issue, touched on in the paper: The reverse situation, in which the manager pulls a starter that the model recommended stay in, occurred in 1,037 innings, and it's impossible to test which decision was better (since you can't know how the starter would have performed). These innings represented just five percent of the total test sample.