The “toughest out” study, redux
September 25, 2007
I never expected the “toughest out” study to be much of anything. I was bored one night, fooling around a little bit with some Retrosheet data files, and thought I might test out the old “What’s the toughest out?” question. Then, Rob Neyer from ESPN linked to me, and it became my all-time most read piece. All this for a study that I did in about 15 minutes. Of course, the stuff that takes me hours of detail-oriented work to do gets read by five people.
When writing the piece, I knew that I wasn’t really doing the study much justice. I didn’t control for batter or pitcher quality, and my sampling methods were based on how quickly I could get the study done. A few good commenters pointed out some possible improvements, and then (insert spooky sound effects here), Bill James himself visited me in a dream last night and told me the spirits of Sabermetrics were angry at me for shirking my duty. (OK, not really.) So, here is the “toughest out” study, done properly.
First, the ever-reliable Tango Tiger suggested that I look at the overall league OBP for the plate appearances when there had already been 1 out recorded, then 2, and so on. Fair enough. (A small confession: what I calculated wasn’t exactly OBP. Because of my database setup, I had to make do with whether an out had been recorded in each plate appearance. The great majority of those outs were made by the batter, but occasionally a batter singles and his idiot teammate gets thrown out at third. There are also times when a batter strikes out, but reaches first on a passed ball.) My data set is almost everything that happened in 2006, throwing away my original stipulation that the only interesting things to look at were the games in which all 27 outs had been recorded. (That stipulation had thrown away all ninth-inning comebacks by the home team, as well as all home wins in which the bottom of the ninth was superfluous to requirements.) I didn’t look at outs that were recorded on caught stealings or pickoffs.
The out with the highest OBP? The 17th out (which came in 2nd place in the original study) with an OBP of .3614. It was followed closely by the 9th out at .3608, then the 10th out, 2nd out, and 1st out. The easiest out to get was the 25th out (1st out of the ninth inning) with an OBP of .3071. So the difference between the highest and the lowest is about .054, which is roughly one extra non-out in every twenty plate appearances. Not a huge difference, but definitely a difference. The 27th out was actually the 6th easiest to come by.
Still, in my original study, the 1st out was the most difficult to come by. Several folks properly pointed out that this had something to do with the fact that the first person up in a game is the leadoff hitter and, Juan Pierre notwithstanding, the leadoff guy is usually a high-OBP guy. So, it’s important to control for the batter’s ability to avoid outs and the pitcher’s ability to induce them.
I calculated OBP for all pitchers and hitters over the course of the season and converted them into odds ratios. For those who aren’t familiar, an odds ratio takes the probability (p) from a yes/no question (Did the batter make an out or not?) and turns it into something that is much easier to work with mathematically. The formula is p / (1-p). (Strictly speaking, p / (1-p) is just the odds, but I’ll abbreviate it as OR below.)
Now, suppose that Larry is pitching and he has an OBP against of .333. He is facing Neifi (a name I just pulled out of nowhere) who has an OBP of .200. Neifi’s odds ratio is .200 / (1 – .200), which is (.200 / .800), or 0.25. Larry is at .333 / (1 – .333), or about 0.5. What is the expected chance that this confrontation ends without an out being made? We can find it with the following formula:
(batter OR / league OR) * (pitcher OR / league OR) = (expected OR / league OR)
I had to calculate the OBP for all plate appearances in the 2006 season (.3409 for the curious, which may not match up to other sources, but remember I’m using a slightly different definition for the purposes of this study), but the rest is just plugging numbers into the formula and solving. Once we’ve got the expected OR, it’s easy enough to convert it back into a probability: p = OR / (OR + 1).
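For anyone who wants to check the arithmetic, here is a minimal Python sketch of the Larry-versus-Neifi example, using the .3409 league figure. The function names (to_odds, to_prob, expected_obp) are made up for illustration and are not anything from my actual database.

```python
def to_odds(p):
    """Turn a probability into odds: p / (1 - p)."""
    return p / (1.0 - p)


def to_prob(odds):
    """Turn odds back into a probability: OR / (OR + 1)."""
    return odds / (odds + 1.0)


def expected_obp(batter_obp, pitcher_obp, league_obp):
    """Expected chance of no out for one batter/pitcher matchup.

    Solves (batter OR / league OR) * (pitcher OR / league OR)
           = (expected OR / league OR) for the expected OR.
    """
    league_odds = to_odds(league_obp)
    expected_odds = to_odds(batter_obp) * to_odds(pitcher_obp) / league_odds
    return to_prob(expected_odds)


# Neifi (.200 OBP) against Larry (.333 OBP against), 2006 league figure of .3409
neifi_vs_larry = expected_obp(0.200, 0.333, 0.3409)
print(round(neifi_vs_larry, 3))
```

The result lands a touch below Neifi’s own .200, which makes sense: Larry’s .333 is slightly better than the league’s .3409, so he drags the expectation down a little.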
Given all that, we can figure out what the expected OBP would be for any given plate appearance and, by summing a few things up, what the overall expected OBP would be for all PAs at a specific level of outs. Then, we can compare what the actual OBP was for that number of outs versus what could be expected given batter and pitcher quality.
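Here is a rough sketch of that bookkeeping in Python. The rows are invented purely for illustration (they are not 2006 data), and the layout of out number, batter OBP, pitcher OBP, and whether the PA ended without an out is my assumption about how such a table might look, not my actual database.

```python
from collections import defaultdict

LEAGUE_OBP = 0.3409  # league-wide figure quoted in this post


def to_odds(p):
    return p / (1.0 - p)


def expected_obp(batter_obp, pitcher_obp, league_obp=LEAGUE_OBP):
    # (batter OR / league OR) * (pitcher OR / league OR) = expected OR / league OR
    expected_odds = to_odds(batter_obp) * to_odds(pitcher_obp) / to_odds(league_obp)
    return expected_odds / (expected_odds + 1.0)


# HYPOTHETICAL rows: (out being attempted, batter OBP, pitcher OBP, no out recorded?)
plate_appearances = [
    (1, 0.370, 0.330, True),
    (1, 0.350, 0.340, False),
    (1, 0.360, 0.320, False),
    (25, 0.310, 0.300, False),
    (25, 0.330, 0.290, True),
]


def summarize(rows, league_obp=LEAGUE_OBP):
    """Return {out number: (expected OBP, actual OBP, actual - expected)}."""
    sums = defaultdict(lambda: [0.0, 0, 0])  # expected sum, no-out count, PA count
    for out_number, batter_obp, pitcher_obp, no_out in rows:
        bucket = sums[out_number]
        bucket[0] += expected_obp(batter_obp, pitcher_obp, league_obp)
        bucket[1] += 1 if no_out else 0
        bucket[2] += 1
    result = {}
    for out_number, (exp_sum, no_out_count, pa_count) in sums.items():
        expected = exp_sum / pa_count
        actual = no_out_count / pa_count
        result[out_number] = (expected, actual, actual - expected)
    return result


by_out = summarize(plate_appearances)
for out_number in sorted(by_out):
    expected, actual, diff = by_out[out_number]
    print(f"out {out_number}: expected {expected:.3f}, actual {actual:.3f}, diff {diff:+.3f}")
```

A positive difference means that out was tougher to get than the matchups alone would suggest; a negative one means it was easier.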
The toughest out to get using this formula? Still the 17th out. It had an expected OBP of .344 given who was batting and pitching at that time, but had an actual OBP of .361, for a difference of .017. Following close behind it were the 9th, 12th, and 14th outs. The easiest out to get was still the 25th out, followed by the 3rd and 5th outs. I couldn’t discern any kind of pattern running through the numbers. Maybe I’ll take a look at a few other years to see whether certain outs are tough to come by from year to year.
The first out of the game actually drops down into 16th place. Interestingly enough, the expected OBP for the batter/pitcher matchups that tried to produce the first out was .3556, while the actual OBP was .3551. The first out of the game is almost exactly as hard to come by as one would expect given the people who generally bat (and pitch) there. The 27th out actually had a similar pattern, with an expected OBP of .3279 and an actual OBP of .3282. The last out of the game isn’t any harder (or easier) to get than one might expect given the batter/pitcher matchups that happen there.