Shawn Freaking Estes is not about to regress to your damned mean
January 20, 2009
I don’t mean to pick on Shawn Estes, I really don’t. I am eternally grateful to him for shutting out the Reds late in 2003, and that’s as far as that goes.
But he came up in an old thread on Tango’s blog about the worst pitchers in baseball, and he is certainly in the running. And I was looking for a face to put on the idea that some guys, no matter how long we wait around, are never going to show improvement of any sort. We’re not talking about learning a new pitch, or gaining mental toughness. We’re talking about the numbers finally catching up to the fact that real MLB teams are still willing to give you a job somewhere. I could easily have made this post about a guy like Daniel Cabrera – and in fact, let’s do that too. Daniel Cabrera laughs at your measures of central tendency.
I’ve taken the time recently – here, here and here – to muse about how talent is distributed in MLB. And I think I’ve finally come up with something approaching a resolution, or at least a direction to take the conversation in.
I’d like to note for the record that I hate, hate, hate arbitrary playing-time cutoffs for studies. They are at times a necessary evil, but they’re never ideal. But without them, too much weight was being put on fringe players and not enough on regular players when I went to study the issue.
So here’s what I did. I took three years of data (2006-2008), broken down into single-season pitching lines. I took each pitcher-season’s RA and weighted it by the number of batters faced (BFP) – so a pitcher with 1000 batters faced counted 100 times as much as a pitcher with 10 BFP.
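To make the weighting concrete, here is a minimal sketch of a BFP-weighted average RA next to a plain unweighted one. The `PitcherSeason` record, its field names and the two pitchers are hypothetical, made up for illustration; RA is computed as runs allowed per 27 outs, which is an assumption about how the underlying lines were stored, not something stated above.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    # Hypothetical record layout; the field names are illustrative only.
    name: str
    runs_allowed: int
    outs_recorded: int   # outs instead of innings, to dodge the ".1/.2" innings notation
    batters_faced: int   # BFP

    @property
    def ra(self) -> float:
        # RA: runs allowed per nine innings, i.e. per 27 outs
        return 27.0 * self.runs_allowed / self.outs_recorded


def weighted_mean_ra(seasons):
    """Average RA where each pitcher-season counts in proportion to its BFP."""
    total_bfp = sum(s.batters_faced for s in seasons)
    return sum(s.ra * s.batters_faced for s in seasons) / total_bfp


def unweighted_mean_ra(seasons):
    """Every pitcher-season counts the same, however little it pitched."""
    return sum(s.ra for s in seasons) / len(seasons)


# Two made-up pitcher-seasons: a full-time starter and a mop-up reliever.
seasons = [
    PitcherSeason("Hypothetical Starter", runs_allowed=80, outs_recorded=540, batters_faced=760),
    PitcherSeason("Hypothetical Mop-Up Man", runs_allowed=6, outs_recorded=18, batters_faced=40),
]
print(unweighted_mean_ra(seasons))  # 6.5
print(weighted_mean_ra(seasons))    # 4.25
```

With equal weights the mop-up man drags the average up to 6.5; weighting by BFP lets the starter’s 4.00 RA dominate, which is the point of weighting in the first place.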
Then, for the sake of being able to actually do graphs, I subsampled 20,000 pitcher-seasons from the result. So, for instance, there are 105 pitching lines from Jeff Francis in the result set and 104 from Johan Santana – but only 22 from guys like Bob Wickman and 21 from Renyel Pinto. (I did this several times, to make sure I wasn’t ending up with a particularly biased subset of the data.) Then I graphed it:
I cut off the graph at 20 RA so that there was enough meaningful detail for us to see anything. The shape of the graph looks somewhat normal to the left of the 5 RA mark, but it tapers off much more slowly to the right than we would anticipate if pitching were truly normally distributed. This – and grant that this is the interpretation of a layman, nothing more – looks like a very modest case of the “fat tail,” which is the reason that you can’t buy or sell a house for money these days.
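The subsampling and graphing steps described above can be sketched roughly like this. It assumes the subsample was drawn with replacement, with each pitcher-season’s chance of selection proportional to its BFP, which is consistent with heavy-workload pitchers showing up over a hundred times, but the exact method isn’t spelled out. The `(name, RA, BFP)` tuple layout, the 0.5-run bin width and the two pitchers are hypothetical.

```python
import random
from collections import Counter


def subsample_by_bfp(seasons, k=20_000, seed=None):
    """Draw k pitcher-seasons with replacement, each season's chance of being
    picked proportional to its batters faced (BFP)."""
    rng = random.Random(seed)
    weights = [bfp for _name, _ra, bfp in seasons]
    return rng.choices(seasons, weights=weights, k=k)


def ra_histogram(sample, bin_width=0.5, max_ra=20.0):
    """Count sampled seasons per RA bin, dropping anything at or above max_ra
    (the same kind of display cutoff as the 20 RA limit on the graph)."""
    counts = Counter()
    for _name, ra, _bfp in sample:
        if ra < max_ra:
            # Key each bin by its lower edge, e.g. 4.3 RA falls in the 4.0 bin
            counts[int(ra // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Made-up input: (name, RA, BFP)
seasons = [
    ("Hypothetical Workhorse", 4.0, 1000),
    ("Hypothetical September Call-Up", 12.0, 10),
]
sample = subsample_by_bfp(seasons, seed=2009)
names = Counter(name for name, _ra, _bfp in sample)
print(names)  # the workhorse should outnumber the call-up by roughly 100 to 1
print(ra_histogram(sample))
```

Sampling in proportion to BFP means the subsample needs no further weighting: an ordinary histogram of it already reflects the BFP-weighted distribution.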
Compare to this graph of, oh, the logarithm of RA (all values of RA included):
This graph seems more normal, doesn’t it?
The biggest difference isn’t in skew – skew was never the major issue in the distribution of pitching, contrary to what several of us (myself included) speculated in the comments. The problem is kurtosis – our distribution is too tall in the middle to be truly normal, and the tails are out of whack as a result. (There is, in fact, too much kurtosis for it to be truly log-normal either, although the log-normal distribution seems to describe pitching better than the normal distribution does.)
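For anyone who wants to check skew and kurtosis numerically rather than by eyeballing graphs, here is a minimal sketch using population moments, where a normal distribution scores 0 on both skewness and excess kurtosis. It is unweighted, so it assumes it is fed a BFP-weighted subsample like the one described above. Dropping zero-RA seasons before taking logs is an assumption of this sketch (log(0) is undefined); how the original log graph handled them isn’t stated. The three-value input is a toy example, not real data.

```python
import math
from statistics import fmean


def moments(values):
    """Return (skewness, excess kurtosis) from population moments.
    A normal distribution has 0 for both."""
    mu = fmean(values)
    m2 = fmean([(v - mu) ** 2 for v in values])
    m3 = fmean([(v - mu) ** 3 for v in values])
    m4 = fmean([(v - mu) ** 4 for v in values])
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def compare_raw_and_log(ra_values):
    """Moments of RA itself and of log(RA). Zero-RA seasons are dropped
    (an assumption of this sketch) because log(0) is undefined."""
    positive = [ra for ra in ra_values if ra > 0]
    return moments(positive), moments([math.log(ra) for ra in positive])


# Toy input that is perfectly symmetric on the log scale
(raw_skew, raw_kurt), (log_skew, log_kurt) = compare_raw_and_log([1.0, math.e, math.e ** 2])
print("raw RA: skew", raw_skew, "excess kurtosis", raw_kurt)
print("log RA: skew", log_skew, "excess kurtosis", log_kurt)
```

Positive excess kurtosis on the log scale would correspond to the “too tall in the middle” shape described above, even after the log transform has taken care of most of the skew.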
Is there a practical application to this? I think so – although I have no more fancy graphs or evidence to present, so consider what follows to be nothing more than informed speculation.
The normal distribution comes with an assumption that events a certain number of standard deviations, or “sigma,” away from the mean are practically impossible. (This is where the term “Six Sigma” comes from, if you were curious.)
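To put numbers on “practically impossible”: under a standard normal distribution, the chance of landing at least k sigma below the mean is ½·erfc(k/√2), which works out to roughly 2.3% at two sigma, 0.13% at three sigma and about one in a billion at six sigma. The sketch below computes that, plus a hypothetical helper for counting how often a real sample actually lands that far out. For RA, “worse” means higher, so the helper looks at the upper tail; a normal distribution is symmetric, so the same tail formula applies.

```python
import math
from statistics import fmean, pstdev


def normal_lower_tail(k):
    """P(Z <= -k) for a standard normal: the chance of landing at least
    k sigma below the mean (equal to the chance of k sigma above it)."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def empirical_share_beyond(values, k):
    """Share of values at least k standard deviations above the sample mean.
    For RA, 'above' is the bad direction."""
    mu, sd = fmean(values), pstdev(values)
    return sum(v >= mu + k * sd for v in values) / len(values)


for k in (2, 3, 6):
    print(k, normal_lower_tail(k))
```

Comparing `empirical_share_beyond(ra_sample, k)` against `normal_lower_tail(k)` for a real sample of RA values (`ra_sample` is a placeholder name here) is one rough way to see whether the bad tail is fatter than the normal curve allows.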
We can pretty readily disprove this assumption when it comes to major league pitching. You simply need:
- A baseball.
- A bat.
- A major-league hitter.
- An idea of where the fence would be in an MLB park.
Gather those things, and then try throwing the ball to the hitter while he holds the bat. I think it’ll be demonstrated pretty quickly that it’s possible to be much, much worse at pitching than six sigma below the league average.
Any regression-based projection of the worst pitchers in baseball is likely to be too rosy, for the simple reason that it’s possible to have a worse true talent level for pitching than the normal distribution is fully able to comprehend and accept.
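A common form of regression to the mean blends a pitcher’s observed RA with the league average, weighting the observed RA by BFP / (BFP + k). That particular blend falls out of assuming both true talent and random noise are normally distributed, which is exactly the assumption questioned here. The sketch below assumes that form; `k_bfp=300` and `league_ra=4.8` are made-up placeholders, not estimates from any data.

```python
def regressed_ra(observed_ra, bfp, league_ra, k_bfp):
    """Shrink observed RA toward the league mean.

    The weight on the observed RA grows with BFP, but it does not depend on
    how far from the mean the pitcher sits. k_bfp is a hypothetical
    stabilization constant, not a measured value."""
    w = bfp / (bfp + k_bfp)
    return w * observed_ra + (1.0 - w) * league_ra


# Two made-up pitchers with the same workload: one bad, one historically bad
for observed in (6.0, 9.0):
    projection = regressed_ra(observed, bfp=300, league_ra=4.8, k_bfp=300)
    print(observed, round(projection, 2))  # both pulled halfway to 4.8
```

Both pitchers get pulled the same fraction of the way back. If true talent really has a fat bad tail, the 9.0 RA pitcher is more likely than the normal model admits to be genuinely that bad, so a projection like this one would tend to be too kind to him.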