Closers and Non-Save Situations

We’ve all seen it happen, right?  Your team trails or leads by five or six runs, and in an attempt to rest the everyday heroes of the bullpen, the closer is called into action.  He then gives up a few runs, all meaningless in the true context of the game, but try telling that to his overall stat line.  You then stop to wonder how this could happen.  I mean, this was the same guy who, just two nights earlier, breezed through the 3-4-5 triumvirate with nothing more than a one-run cushion.  After putting two and two together the thought begins to develop that perhaps closers struggle in their non-save situations.
It’s happened to me plenty of times, but after reading Geoff Young’s article on this subject, dealing with Trevor Hoffman, I decided to take action.  For those unwilling to navigate away from this page, Geoff examined whether or not Hoffman performed worse in non-save situations.  In general, concrete conclusions cannot be drawn from analyzing one player, but the article gave me the idea of testing this hypothesis with a much larger sample.
The first step involved simply figuring out what we are measuring. Are we looking at ERA? WHIP? So many stats, so little time. What piqued my interest were the potential discrepancies in usage (IP/G), ERA, OPS against, BB/9, and K/9. Next I needed to assemble the sample. To really test this hypothesis, our sample needs to be large and to include many different pitchers. For instance, a sample of 35 seasons, 15 for Hoffman, 12 for Rivera, and 8 for Wagner, would not be enough because it offers just three unique pitchers. We cannot draw conclusions for a whole population based on three people. To alleviate this concern, I took every instance of 15+ saves from 1980-2007 and recorded the pertinent numbers.
This query produced 696 seasons and 220 unique closers, which should be a large enough sample. The statistics were entered based on splits in save situations vs. non-save situations. To analyze the numbers I am once again calling upon the T-Test. I mentioned T-Tests in a statistics primer last week, but a T-Test essentially compares the means (averages) of two different groups to determine if they are statistically different from one another. Just because Group A has a 2.33 ERA and Group B has a 2.61 ERA does not automatically mean that A’s ERA is truly lower than B’s. The sample may be too small, for instance, and so the ERAs may be different but they are not statistically different.
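To make the idea concrete, here is a minimal sketch of a paired-samples t statistic in Python. It is not the exact procedure used for this study: it is unweighted (the study's tests were weighted by innings pitched), the function name `paired_t_statistic` is made up for illustration, and the five ERA pairs are hypothetical numbers, not data from the sample.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for an unweighted paired-samples t-test."""
    if len(group_a) != len(group_b):
        raise ValueError("paired samples must have the same length")
    # Work on the per-pair differences (one pair per closer-season).
    diffs = [a - b for a, b in zip(group_a, group_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Standard error of the mean difference, using the sample standard deviation.
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical ERAs for five closer-seasons (illustration only).
save_era = [2.10, 3.40, 2.85, 3.05, 2.60]
non_save_era = [2.50, 3.30, 3.20, 3.45, 2.70]

t_stat, dof = paired_t_statistic(non_save_era, save_era)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The t value is then compared against the t distribution with the given degrees of freedom to get a significance value. The larger the sample, the smaller the difference in means that can be told apart from noise.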
After running the T-Tests for the five recorded statistics, weighted by innings pitched, all five produced small enough significance values; this means that the differences in means between the save-situation and non-save-situation splits are, in fact, statistically significant. Below are the means of the two groups:

• IP/G (SS): 1.21
• IP/G (NS): 1.24
• ERA (SS): 2.91
• ERA (NS): 3.15
• OPS (SS): .629
• OPS (NS): .652
• BB/9 (SS): 3.08
• BB/9 (NS): 3.39
• K/9 (SS): 8.12
• K/9 (NS): 7.79

Since all of these means are statistically different from one another it appears that, yes, closers do perform worse in non-save situations. Their ERA is almost a quarter-point higher, and their OPS against is almost twenty-five points higher. Additionally, their walk rate rises while their strikeout rate falls. This isn’t to say that a closer posting a 3.15 ERA with 7.79 K/9 and a .652 OPS against in non-save situations is bad but rather that the numbers represent downgrades when compared to save situation statistics.
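The gaps follow directly from the ten means listed above. A quick check in Python, using only those numbers (the names `means` and `split_differences` are just for this sketch):

```python
# (save situation, non-save situation) means as listed in the article.
means = {
    "IP/G": (1.21, 1.24),
    "ERA": (2.91, 3.15),
    "OPS": (0.629, 0.652),
    "BB/9": (3.08, 3.39),
    "K/9": (8.12, 7.79),
}


def split_differences(means):
    """Return non-save minus save for each statistic, rounded to 3 places."""
    return {stat: round(ns - ss, 3) for stat, (ss, ns) in means.items()}


differences = split_differences(means)
for stat, diff in differences.items():
    print(f"{stat}: {diff:+.3f}")
```

A positive value means the number is higher in non-save situations, so only K/9 moves down.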
A likely reason for this is the usage pattern of closers relative to these situations. A closer entering into a non-save situation generally signifies a lack of recent work. A rust factor may be at work. This is just the first in a series of articles in which I look at closers, because sometimes what is said may not be what is meant. Perhaps the idea of closers performing worse in non-save situations does not literally mean that; it could be that fans consider closers to perform worse in low leverage situations than high leverage. This would not always show up in a save vs. non-save investigation.
For now, though, using 696 seasons and 220 unique closers from 1980-2007, the answer is that they do in fact perform worse in non-save situations. Not so much worse that they should not be used, but worse.

11 Responses to Closers and Non-Save Situations

1. […] Seidman at Statistically Speaking has a nice study up, in which he examines 220 unique closers over 696 seasons. Guess what? There is a significant […]

2. Andy says:

One thought: Worse “closers” perhaps are more likely to pitch in NS situations because, for example, they may be the “backup” and only became closer because of an injury to the closer. One idea is to require more like 30 saves as a cutoff. This will greatly reduce your sample size, but at least you won’t have that potential problem.

3. Pizza Cutter says:

Eric, if you want to get super technical, Keith Woolner has a model in which you can estimate how often a team with an offense that scores X runs per nine innings could be expected to score 1 run in a given inning or 2 runs or 3 runs, etc. That’s a possible control. If you want, I can send you the info.

4. Alex says:

There’s an element of selection bias here. Closers in save situations are always on the team that’s winning. If you’re winning, there’s a better chance that part of the reason for that is that the other team has a bad offense. Therefore you’re going to put up better stats. Without adjusting for strength of opponent, I’m not sure these results tell us anything.

5. […] book, check it out if you haven’t! — has an article on the age old question of: “Do closers perform worse in non-save situations?“. Hat tip to Delorean for the […]

6. Andy, increasing the minimum would fall more into what Alex mentioned, in that it would favor closers on teams that constantly win. Alex, I agree that generally that would be a selection bias, but with a minimum of 15 saves spanning 28 seasons and 696 player-seasons in the sample, we do not need a strength-of-opponent adjustment; there are plenty of players on crappy teams in here.
The idea that if you’re winning, the other team has a bad offense is adding a potential bias that may or may not exist. Plenty of teams with quality offenses lose. You don’t need strength of opponent here.

7. Additionally, as I mentioned at the end… this is just one test. The actual bias would be that save and non-save situations DO NOT specify the toughness of the actual appearance. Fans get so caught up in expressing this opinion through “closers perform worse in non-save situations” that they don’t realize what they are likely expressing is the sentiment that closers perform worse when the “game isn’t on the line.”
In that regard, we would need to look at splits in high or low leverage situations, and run another paired samples t-test, just as we did here.

8. Moe says:

I don’t think a couple of paired t-tests will do for reasons already mentioned above:
1) The average team your team is losing against is going to be a better team than the average team your team is winning against. This effect will actually be stronger the larger your sample because the “quality offenses” losing is a low probability event and the larger the sample size the closer the sample average quality will be to the true average quality of winning and losing teams. Hence you need to control somehow for the strength of the opponent’s offense (e.g. avg number of runs scored).
2) Closers being used in non-save situations could be different from those only used in save situations. Your 15 save cut-off somewhat helps, but not completely. It only eliminates back-up closers and the like, but there are other problems: If you are a top closer you are more likely to play on a good team (think M. Rivera, F. Rodriguez). On a good team, you will get more save opportunities than on a bad team. (If your team loses 60% of all games as opposed to winning 60%, the number of games potentially offering a save situation is greatly diminished.) Hence, top closers might have fewer non-save situations than bad ones and the composition is all that drives your findings.
If you want to seriously address these issues, you should consider the following:
Regress the statistic you are interested in (e.g. ERA, OPS, etc.) on a constant and a dummy for non-save [This gives you your above results]. Now include player-year fixed effects (to control for the quality of the player and the team he plays on in any given year) and some measure of the quality of opponent (I would like avg OPS of batters faced, but for that you need game level data). If now the coefficient on the non-save dummy is significant, I’m more inclined to buy your story.
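A rough sketch of the fixed-effects part of that regression, assuming one row per player-year and split. It demeans the statistic and the non-save dummy within each player-year (the "within" estimator) and then takes the ordinary least-squares slope. The opponent-quality control is left out because it needs game-level data, the function name `within_estimator` and the row layout are invented for this example, and the ERAs are hypothetical. It returns only the coefficient, not its significance.

```python
from collections import defaultdict


def within_estimator(rows):
    """Coefficient on the non-save dummy with player-year fixed effects.

    rows: iterable of (player_year, non_save_dummy, value) tuples.
    """
    groups = defaultdict(list)
    for key, dummy, value in rows:
        groups[key].append((dummy, value))

    numerator = 0.0
    denominator = 0.0
    for observations in groups.values():
        # Demeaning within a player-year removes that player-year's fixed effect.
        mean_dummy = sum(d for d, _ in observations) / len(observations)
        mean_value = sum(v for _, v in observations) / len(observations)
        for d, v in observations:
            numerator += (d - mean_dummy) * (v - mean_value)
            denominator += (d - mean_dummy) ** 2

    if denominator == 0:
        raise ValueError("no within-group variation in the dummy")
    return numerator / denominator


# Hypothetical ERA splits: dummy 0 = save situation, 1 = non-save situation.
rows = [
    ("closer_a_2005", 0, 2.10), ("closer_a_2005", 1, 2.50),
    ("closer_b_2005", 0, 3.40), ("closer_b_2005", 1, 3.30),
    ("closer_c_2006", 0, 2.85), ("closer_c_2006", 1, 3.20),
]

coefficient = within_estimator(rows)
print(f"non-save coefficient: {coefficient:+.3f}")
```

With exactly one save row and one non-save row per player-year, this coefficient equals the mean of the paired differences, so the fixed effects alone reproduce the point estimate of a paired t-test. It is the opponent-quality term that would add new information.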

9. tangotiger says:

Pizza: the Tango Distribution (for lack of a better name) will give you that, and in fact is the distribution that Woolner has adopted over his. You can get it as the last two links on my home page.

10. Met-rician says:

A far more obvious difference between SS and NS is that half the time when a closer blows a save (in the bottom of the ninth inning) the game automatically ends, cutting short the potential runs that could have ended up being scored off of or charged to that pitcher. In other words, many save situations have a cap to the number of runs that the closer can give up, which is generally not the case in non-save situations.

11. Aaron says:

How could I compare Hoffman to Rivera only in save situations? How did you get this data?

Thanks!