Closers and Non-Save Situations
August 7, 2008
We’ve all seen it happen, right? Your team trails or leads by five or six runs, and in an attempt to rest the everyday heroes of the bullpen, the closer is called into action. He then gives up a few runs, all meaningless in the true context of the game, but try telling that to his overall stat line. You then stop to wonder how this could happen. I mean, this was the same guy who, just two nights earlier, breezed through the 3-4-5 triumvirate with nothing more than a one-run cushion. After putting two and two together the thought begins to develop that perhaps closers struggle in their non-save situations.
It’s happened to me plenty of times, but after reading Geoff Young’s article on this subject, dealing with Trevor Hoffman, I decided to take action. For those unwilling to navigate away from this page, Geoff examined whether or not Hoffman performed worse in non-save situations. In general, concrete conclusions cannot be drawn from analyzing one player, but the article gave me the idea of testing this hypothesis with a much larger sample.
The first step involved simply figuring out what we are measuring. Are we looking at ERA? WHIP? So many stats, so little time. What piqued my interest were the potential discrepancies in usage (IP/G), ERA, OPS against, BB/9, and K/9. Next I needed to assemble the sample. To really test this hypothesis, the sample needs to be large and needs to include many different pitchers. For instance, a sample of 35 seasons, 15 for Hoffman, 12 for Rivera, and 8 for Wagner, would not be enough because it offers just three unique pitchers. We cannot draw conclusions for a whole population based on three people. To alleviate this concern, I took every pitcher-season of 15+ saves from 1980-2007 and recorded the pertinent numbers.
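The selection rule can be sketched in a few lines of Python. This is only an illustration of the filter, not the query I actually ran: the `Season` record, the `qualifying` helper, and the placeholder rows are all hypothetical.

```python
from typing import NamedTuple

class Season(NamedTuple):
    pitcher: str
    year: int
    saves: int

def qualifying(seasons, min_saves=15, first=1980, last=2007):
    """Keep pitcher-seasons with at least min_saves saves inside the year range."""
    return [s for s in seasons
            if s.saves >= min_saves and first <= s.year <= last]

# Placeholder rows for illustration only; not real pitchers or real totals.
rows = [
    Season("Pitcher A", 1985, 31),
    Season("Pitcher A", 1986, 12),  # dropped: below the 15-save cutoff
    Season("Pitcher A", 1987, 28),
    Season("Pitcher B", 1979, 25),  # dropped: before 1980
    Season("Pitcher B", 1999, 40),
    Season("Pitcher C", 2007, 15),
]

sample = qualifying(rows)
unique_pitchers = {s.pitcher for s in sample}
print(len(sample), "seasons,", len(unique_pitchers), "unique pitchers")
```

Counting seasons and unique pitchers separately is the point: a sample can be large in seasons while still resting on only a handful of arms.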
This query produced 696 seasons and 220 unique closers, which should be a large enough sample. The statistics were entered based on splits in save situations vs. non-save situations. To analyze the numbers I am once again calling upon the T-Test. I mentioned T-Tests in a statistics primer last week; in essence, a T-Test compares the means (averages) of two different groups to determine whether they are statistically different from one another. Just because Group A has a 2.33 ERA and Group B has a 2.61 ERA does not automatically mean that the gap reflects a real difference between the groups. The sample may be too small, for instance, in which case the ERAs differ numerically but are not statistically different.
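To make the idea concrete, here is a rough Python sketch of one common form of the test, Welch's unequal-variance T-Test, computed by hand. It is an assumption on my part that this variant is a reasonable stand-in; it is unweighted and is not necessarily the exact test behind the results in this article. The two ERA lists are made up, chosen only so that their means land on the 2.33 and 2.61 from the example above.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up ERAs for illustration only; not data from this study.
group_a = [2.10, 2.45, 2.30, 2.50, 2.30]  # mean 2.33
group_b = [2.40, 2.85, 2.60, 2.70, 2.50]  # mean 2.61

t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The farther the t statistic sits from zero for a given number of degrees of freedom, the less likely it is that the gap between the two means is just noise. With only five values per group, the same 0.28 gap carries far less weight than it would across hundreds of seasons.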
I ran T-Tests for the five recorded statistics, weighted by innings pitched, and all five produced small enough significance values; this means that the differences in means between the save situation and non-save situation splits are, in fact, statistically significant. Below are the means of the two groups (SS = save situations, NS = non-save situations):
- IP/G (SS): 1.21
- IP/G (NS): 1.24
- ERA (SS): 2.91
- ERA (NS): 3.15
- OPS (SS): .629
- OPS (NS): .652
- BB/9 (SS): 3.08
- BB/9 (NS): 3.39
- K/9 (SS): 8.12
- K/9 (NS): 7.79
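The gaps between the two splits can be checked with a little arithmetic. A minimal Python sketch using only the ten means listed above; the `means` dictionary and the `gaps` helper are illustrative names, not part of the original analysis.

```python
# Means from the list above, stored as (save situations, non-save situations).
means = {
    "IP/G": (1.21, 1.24),
    "ERA": (2.91, 3.15),
    "OPS": (0.629, 0.652),
    "BB/9": (3.08, 3.39),
    "K/9": (8.12, 7.79),
}

def gaps(table):
    """Return the non-save minus save difference for each statistic."""
    return {stat: round(ns - ss, 3) for stat, (ss, ns) in table.items()}

for stat, diff in gaps(means).items():
    print(f"{stat}: NS - SS = {diff:+.3f}")
```

A positive gap means the figure is higher in non-save situations. That is a downgrade for ERA, OPS against, and BB/9, while the negative K/9 gap is a downgrade as well, since fewer strikeouts is the worse outcome.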
Since all of these means are statistically different from one another it appears that, yes, closers do perform worse in non-save situations. Their ERA is almost a quarter-point higher, and their OPS against is about twenty-three points higher. Additionally, their walk rate rises while their strikeout rate falls. This isn’t to say that a closer posting a 3.15 ERA with 7.79 K/9 and a .652 OPS against in non-save situations is bad, but rather that the numbers represent downgrades when compared to save situation statistics.
A likely reason for this is the usage pattern of closers relative to these situations. A closer entering a non-save situation generally signifies a lack of recent work, so rust may be a factor. This is just the first in a series of articles in which I look at closers, because sometimes what is said may not be what is meant. Perhaps the idea of closers performing worse in non-save situations does not literally mean that; it could be that fans consider closers to perform worse in low-leverage situations than in high-leverage ones. This would not always show up in a save vs. non-save investigation.
For now, though, based on 696 seasons and 220 unique closers from 1980-2007, closers do in fact perform worse in non-save situations. Not so much worse that they should not be used there, but worse.