The “toughest out” study, redux

I never expected the “toughest out” study to be much of anything.  I was bored one night, fooling around a little bit with some Retrosheet data files, and thought I might test out the old “What’s the toughest out?” question.  Then, Rob Neyer from ESPN linked to me, and it became my all-time most read piece.  All this for a study that I did in about 15 minutes.  Of course, the stuff that takes me hours of detail-oriented work to do gets read by five people.
When writing the piece, I knew that I wasn’t really doing the study much justice.  I didn’t control for batter or pitcher quality, and my sampling methods were based on how quickly I could get the study done.  A few good commenters pointed out some possible improvements, and then (insert spooky sound effects here), Bill James himself visited me in a dream last night and told me the spirits of Sabermetrics were angry at me for shirking my duty.  (OK, not really.)  So, here is the “toughest out” study, done properly.
First, the ever-reliable Tango Tiger suggested that I look at the overall league OBP for the plate appearances when there had already been 1 out recorded, then 2, and so on.  Fair enough.  (A small confession: what I calculated wasn’t exactly OBP.  Because of my database setup, I had to make do with whether an out was recorded on each plate appearance.  The great majority of those outs were made by the batter, but occasionally a batter singles and his idiot teammate gets thrown out at third.  There are also times when a batter strikes out but reaches first on a passed ball.)  My data set is almost everything that happened in 2006, which drops my original stipulation that the only games worth looking at were those in which all 27 outs had been recorded.  (That stipulation threw away all ninth-inning comebacks by the home team, as well as all home wins in which the bottom of the ninth was superfluous to requirements.)  I didn’t look at outs that were recorded on caught stealings or pickoffs.
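A minimal Python sketch of the tally described above, assuming each plate appearance has already been reduced to its inning, the outs in the inning when it began, and whether any out was recorded on the play. The `PlateAppearance` record and the helper names are hypothetical, not the layout of the database used for this study or of the Retrosheet files. The “OBP” here carries the same caveat as the text: a batter who singles while a teammate is thrown out counts as an out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    # Hypothetical record layout, not Retrosheet's field names.
    inning: int         # 1-based inning number
    outs_before: int    # outs in the inning when the PA began (0-2)
    out_recorded: bool  # True if any out was made on the play, by anyone


def out_slot(pa: PlateAppearance) -> int:
    """Which out of the game (1-27 in a nine-inning game) this PA could produce."""
    return (pa.inning - 1) * 3 + pa.outs_before + 1


def no_out_rate_by_slot(pas):
    """Share of PAs in each out slot that ended with no out recorded."""
    safe = defaultdict(int)
    total = defaultdict(int)
    for pa in pas:
        slot = out_slot(pa)
        total[slot] += 1
        if not pa.out_recorded:
            safe[slot] += 1
    return {slot: safe[slot] / total[slot] for slot in sorted(total)}


if __name__ == "__main__":
    sample = [
        PlateAppearance(1, 0, False),  # leadoff single: slot 1, no out
        PlateAppearance(1, 0, True),   # next batter makes the 1st out
        PlateAppearance(1, 1, True),   # 2nd out
        PlateAppearance(9, 0, True),   # 25th-out slot
    ]
    print(no_out_rate_by_slot(sample))  # {1: 0.5, 2: 0.0, 25: 0.0}
```

Extra-inning plate appearances land in slots above 27; the post doesn’t say how those were handled, so this sketch simply keeps them.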
The out with the highest OBP?  The 17th out (which came in 2nd place in the original study), with an OBP of .3614.  It was followed closely by the 9th out at .3608, then the 10th, 2nd, and 1st outs.  The easiest out to get was the 25th out (the 1st out of the ninth inning), with an OBP of .3071.  So the difference between the highest and the lowest is .054, or roughly one more plate appearance without an out in every twenty.  Not a huge difference, but definitely a difference.  The 27th out was actually the 6th easiest to come by.
Still, in my original study, the 1st out was the most difficult to come by.  Several folks properly pointed out that this had something to do with the fact that the first person up in a game is the leadoff hitter and, Juan Pierre notwithstanding, the leadoff guy is usually a high-OBP guy.  So, it’s important to control for the batter’s ability to avoid outs and the pitcher’s ability to induce them.
I calculated OBP for all pitchers and hitters over the course of the season and converted them into odds ratios.  For those who aren’t familiar, an odds ratio takes the probability (p) from a yes/no question (Did the batter make an out or not?) and turns it into something that is much easier to work with mathematically.  The formula is p / (1-p).
Now, suppose that Larry is pitching and he has an OBP against of .333.  He is facing Neifi (a name I just pulled out of nowhere), who has an OBP of .200.  Neifi’s odds ratio is .200 / (1 - .200), which is .200 / .800, or 0.25.  Larry’s is .333 / (1 - .333), or roughly 0.5.  What is the probability that this matchup ends without an out?  We can find it with the following formula:
(batter OR / league OR) * (pitcher OR / league OR) = (expected OR / league OR)
I had to calculate the OBP for all plate appearances in the 2006 season (.3409 for the curious, which may not match up with other sources, but remember I’m using a slightly different definition for the purposes of this study), but the rest is just plugging numbers into the formula and solving for the expected OR.  Once we’ve got the expected OR, it’s easy enough to convert it back into a probability: p = OR / (OR + 1).
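To finish the Larry-versus-Neifi example, here is a minimal Python sketch of the odds-ratio formula above, using the 2006 league rate of .3409. The function names are illustrative only. Using .333 exactly gives Larry odds of about 0.499 rather than a round 0.5, and the matchup comes out to an expected OBP of about .194, a bit below Neifi’s already poor .200 because Larry allows fewer baserunners than the league average.

```python
def to_odds(p: float) -> float:
    """Convert a probability (e.g. an OBP) to odds: p / (1 - p)."""
    return p / (1.0 - p)


def to_prob(odds: float) -> float:
    """Convert odds back to a probability: OR / (OR + 1)."""
    return odds / (odds + 1.0)


def expected_obp(batter_obp: float, pitcher_obp: float, league_obp: float) -> float:
    """(batter OR / league OR) * (pitcher OR / league OR) = (expected OR / league OR),
    so expected OR = batter OR * pitcher OR / league OR."""
    expected_odds = to_odds(batter_obp) * to_odds(pitcher_obp) / to_odds(league_obp)
    return to_prob(expected_odds)


if __name__ == "__main__":
    # Neifi (.200 OBP) vs. Larry (.333 OBP against), 2006 league rate .3409
    p = expected_obp(0.200, 0.333, 0.3409)
    print(f"{p:.3f}")  # 0.194
```

A quick sanity check on the design: a league-average batter facing a league-average pitcher gets back exactly the league rate, which is what the formula should do.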
Given all that, we can figure out the expected OBP for any given plate appearance and, by averaging those expectations over every PA at a specific out count, the overall expected OBP for that out.  Then, we can compare the actual OBP for that number of outs with what could be expected given batter and pitcher quality.
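The aggregation step can be sketched the same way: average the per-PA expectations within each out slot and compare them with the actual no-out rate. This is a minimal sketch under stated assumptions; the `(slot, batter_obp, pitcher_obp, reached)` tuple layout and the function names are hypothetical, and the season OBPs for batters and pitchers are taken as already computed.

```python
from collections import defaultdict


def to_odds(p: float) -> float:
    return p / (1.0 - p)


def expected_obp(batter_obp: float, pitcher_obp: float, league_obp: float) -> float:
    # Odds-ratio matchup, converted back to a probability.
    odds = to_odds(batter_obp) * to_odds(pitcher_obp) / to_odds(league_obp)
    return odds / (odds + 1.0)


def actual_vs_expected(pas, league_obp):
    """pas: iterable of (slot, batter_obp, pitcher_obp, reached) tuples.
    Returns {slot: (actual, expected, diff)}, where expected is the mean
    per-PA expectation for the PAs in that slot."""
    n = defaultdict(int)
    reached_total = defaultdict(int)
    expected_total = defaultdict(float)
    for slot, batter, pitcher, reached in pas:
        n[slot] += 1
        reached_total[slot] += int(reached)
        expected_total[slot] += expected_obp(batter, pitcher, league_obp)
    result = {}
    for slot in sorted(n):
        actual = reached_total[slot] / n[slot]
        expected = expected_total[slot] / n[slot]
        result[slot] = (actual, expected, actual - expected)
    return result
```

For example, two PAs in the 17th-out slot between league-average players, one of which ends without an out, give an actual rate of .500 against an expected .3409.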
The toughest out to get using this formula?  Still the 17th out.  It had an expected OBP of .344 given who was batting and pitching at that time, but an actual OBP of .361, for a difference of .017.  Following close behind it were the 9th, 12th, and 14th outs.  The easiest out to get was still the 25th out, followed by the 3rd and 5th outs.  I couldn’t discern any kind of pattern running through the numbers.  Maybe I’ll take a look at a few other years to see whether certain outs are tough to come by from year to year.
The first out of the game actually drops down into 16th place.  Interestingly enough, the expected OBP for the batter/pitcher matchups that tried to produce the first out was .3556, while the actual OBP was .3551.  The first out of the game is almost exactly as hard to come by as one would expect given the people who generally bat (and pitch) there.  The 27th out actually had a similar pattern, with an expected OBP of .3279 and an actual OBP of .3282.  The last out of the game isn’t any harder (or easier) to get than one might expect given the batter/pitcher matchups that happen there.


7 Responses to The “toughest out” study, redux

  1. tangotiger says:

    Cool stuff. Can you list the results in a table: out/actual/expected/diff/SD
    Assuming about 7000 PA per out-slot, the random diff should be .0057. (SD above is diff/.0057). For the 17th out, you are reporting a diff of .017, meaning 3.0 SD from the mean. It’s possible this is evidence of a tiring pitcher (17th out would mean around the 26th batter, which is right around when pitchers are pulled).
    When I looked at it by PA (not out), there was a definite tiring pattern (the OBP goes up, the more batters a pitcher faces). It’s in The Book if you want to reference it.
    Anyway, the 17 point difference is more like a 9 point difference after reflecting the tiring aspect, turning the 3 SD into 1.5 SD.
    You’ll probably find that after you apply a “tiring/starter” effect, that the differences are random, as you’ve suspected.

  2. tangotiger says:

    I should have added that if you take the SD of the (adjusted) SD, that you’ll probably get something very close to 1.00 (i.e., random).

  3. Pizza Cutter says:

    Hopefully this works formatting-wise…
    Out  Actual  Expected   Diff
    17   .3614   .3441     .0173
    9    .3608   .3484     .0125
    12   .3439   .3327     .0111
    14   .3450   .3363     .0087
    21   .3506   .3425     .0081
    10   .3555   .3493     .0063
    18   .3461   .3412     .0049
    8    .3434   .3391     .0044
    11   .3441   .3403     .0038
    13   .3357   .3343     .0013
    16   .3453   .3440     .0013
    20   .3408   .3400     .0008
    27   .3282   .3279     .0003
    4    .3449   .3449     .0001
    23   .3416   .3417    -.0001
    1    .3551   .3556    -.0004
    6    .3192   .3207    -.0015
    15   .3364   .3380    -.0016
    22   .3402   .3429    -.0026
    24   .3363   .3408    -.0046
    2    .3554   .3630    -.0076
    26   .3214   .3304    -.0090
    19   .3331   .3426    -.0094
    7    .3166   .3262    -.0097
    5    .3174   .3291    -.0117
    3    .3518   .3639    -.0120
    25   .3071   .3325    -.0254

  4. gdog says:

    interesting…let’s sort that again
    Out  Actual  Expected   Diff
    1    .3551   .3556    -.0004
    2    .3554   .3630    -.0076
    3    .3518   .3639    -.0120
    4    .3449   .3449     .0001
    5    .3174   .3291    -.0117
    6    .3192   .3207    -.0015
    7    .3166   .3262    -.0097
    8    .3434   .3391     .0044
    9    .3608   .3484     .0125
    10   .3555   .3493     .0063
    11   .3441   .3403     .0038
    12   .3439   .3327     .0111
    13   .3357   .3343     .0013
    14   .3450   .3363     .0087
    15   .3364   .3380    -.0016
    16   .3453   .3440     .0013
    17   .3614   .3441     .0173
    18   .3461   .3412     .0049
    19   .3331   .3426    -.0094
    20   .3408   .3400     .0008
    21   .3506   .3425     .0081
    22   .3402   .3429    -.0026
    23   .3416   .3417    -.0001
    24   .3363   .3408    -.0046
    25   .3071   .3325    -.0254
    26   .3214   .3304    -.0090
    27   .3282   .3279     .0003

  5. tangotiger says:

    If I take the standard deviation of the SD, I get 1.59 which is very significant. If I only do it to the first 18 outs (6 innings, basically the starter), I get an SD of 1.45.
    If I stick with the first 18 outs, and adjust the first 7 outs diff upward by 6 OBP points, and the other 11 outs down by 6 OBP points (as a way to handle the tiring pitcher and/or advantage of batter in facing same pitcher multiple times), the SD is 0.95. That is, random.
    If I try the same with the final 9 outs, it becomes readily apparent that the easiest out, more than can be explained by chance, is the 25th out, as Pizza has pointed out. The reason here is likely that it’s the first out of the 9th inning, and you have a “fresh” pitcher.
    Otherwise, all other outs are within the realm of chance.

  6. Steve Schramm says:

    But, but, but, hang on here. If you’ve eliminated all the home team come-from-behind wins, does that affect the probabilities on outs 25, 26 & 27 in a statistically significant way?? Aren’t there a bunch of ABs for those outs that just got thrown out of the study? Or is it not enough to matter? Seems like I’m always seeing this on SportsCenter, but then again, it’s not much fun seeing the home team go down 1-2-3 in the 9th (unless it’s your team as visitors, of course).

  7. Pizza Cutter says:

    A small misunderstanding. I did an earlier version of the study in which I had eliminated all come-from-behind wins. This post addressed that problem (and others) from the initial study.
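Tango’s back-of-the-envelope significance check from the comments can be sketched in Python using the diffs from the table above. The 7,000 PAs per out slot and the .34 base rate are his rough assumptions (they give his .0057 standard error), and the ±6-point shift is his crude stand-in for pitcher fatigue and batters seeing a pitcher multiple times. Under those assumptions the output should land close to the figures he reports: about 3 SD for the 17th out, and spreads near 1.6 for all 27 outs, 1.45 for the first 18, and 0.95 for the first 18 after adjustment.

```python
import math
import statistics

# Actual minus expected OBP by out (1-27), copied from the table in the comments.
DIFF = {
    1: -0.0004, 2: -0.0076, 3: -0.0120, 4: 0.0001, 5: -0.0117, 6: -0.0015,
    7: -0.0097, 8: 0.0044, 9: 0.0125, 10: 0.0063, 11: 0.0038, 12: 0.0111,
    13: 0.0013, 14: 0.0087, 15: -0.0016, 16: 0.0013, 17: 0.0173, 18: 0.0049,
    19: -0.0094, 20: 0.0008, 21: 0.0081, 22: -0.0026, 23: -0.0001, 24: -0.0046,
    25: -0.0254, 26: -0.0090, 27: 0.0003,
}

# Tango's rough assumptions: about 7000 PA per out slot, base rate about .34.
PA_PER_SLOT = 7000
P = 0.34
SE = math.sqrt(P * (1 - P) / PA_PER_SLOT)  # binomial standard error, about .0057


def z_scores(diffs, se=SE):
    """Express each diff in standard errors (Tango's 'SD' column)."""
    return {out: d / se for out, d in diffs.items()}


def spread(z):
    """Sample SD of the z-scores; near 1.0 if the diffs are pure noise."""
    return statistics.stdev(list(z.values()))


if __name__ == "__main__":
    z = z_scores(DIFF)
    print(f"SE = {SE:.4f}, z(17th out) = {z[17]:.2f}")
    print(f"SD of z, all 27 outs: {spread(z):.2f}")
    first18 = {o: d for o, d in DIFF.items() if o <= 18}
    print(f"SD of z, first 18 outs: {spread(z_scores(first18)):.2f}")
    # Crude tiring adjustment: outs 1-7 up 6 OBP points, outs 8-18 down 6 points.
    adjusted = {o: d + (0.006 if o <= 7 else -0.006) for o, d in first18.items()}
    print(f"SD of z, first 18 adjusted: {spread(z_scores(adjusted)):.2f}")
```

Because every slot is assumed to have the same PA count and base rate, this treats all 27 standard errors as equal; using each slot’s real PA total would be the more careful version.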
