# Shawn Freaking Estes is not about to regress to your damned mean

January 20, 2009

I don’t mean to pick on Shawn Estes, I really don’t. I am eternally grateful to him for shutting out the Reds late in 2003, and that’s as far as that goes.

But he came up in an old thread on Tango’s blog about the worst pitchers in baseball, and he is certainly in the running. I was looking for a face to put on the idea that some guys, no matter how long we wait around, are never going to show improvement of any sort. We’re not talking about learning a new pitch or gaining mental toughness. We’re talking about the numbers catching up to the fact that real MLB teams are still willing to give you a job somewhere. I could easily have made this post about a guy like Daniel Cabrera, and in fact, let’s do that too. Daniel Cabrera laughs at your measures of central tendency.

I’ve taken the time recently – here, here and here – to muse about how talent is distributed in MLB. And I think I’ve finally come up with something approaching a resolution, or at least a direction to take the conversation in.

I’d like to note for the record that I hate, hate, hate arbitrary playing-time cutoffs for studies. They are at times a necessary evil, but they’re never ideal. But without them, too much weight was being put on fringe players and not enough on regular players when I went to study the issue.

So here’s what I did. I took three years of data (2006-2008), broken down into single-season pitching lines. I took each pitcher-season’s RA and weighted it by the number of batters faced (BFP), so a pitcher with 1000 batters faced counted 100 times as much as a pitcher with 10 BFP.
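A minimal sketch of that weighting, assuming the data is a list of (RA, batters faced) pairs. The function name and the four seasons below are made up for illustration; the post doesn’t say what tools were used. It computes a BFP-weighted mean and standard deviation of RA, so a 1000-BFP line carries 100 times the weight of a 10-BFP line.

```python
import math

def weighted_mean_sd(values, weights):
    """Weighted mean and population standard deviation of values."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)

# Hypothetical pitcher-seasons as (RA, batters faced); not real data.
seasons = [(4.10, 1000), (5.25, 650), (27.00, 10), (3.60, 250)]
ra = [r for r, _ in seasons]
bfp = [b for _, b in seasons]

mean_ra, sd_ra = weighted_mean_sd(ra, bfp)
print(f"BFP-weighted mean RA = {mean_ra:.2f}, SD = {sd_ra:.2f}")
```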

Then, so that I could actually make graphs, I subsampled 20,000 pitcher-seasons from the result. So, for instance, there are 105 pitching lines from Jeff Francis in the result set and 104 from Johan Santana, but only 22 from guys like Bob Wickman and 21 from Renyel Pinto. (I did this several times, to make sure I wasn’t ending up with a particularly biased subset of the data.) Then I graphed it:
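Subsampling in proportion to batters faced could look something like this, using the standard library’s `random.choices` with weights. The three pitcher lines are hypothetical, and sampling with replacement is an assumption about how the original subsample was drawn (it is consistent with one pitcher appearing 100+ times). Rerunning with a different `seed` is the equivalent of drawing the subsample several times to check for a biased draw.

```python
import random
from collections import Counter

def subsample_by_bfp(seasons, k=20000, seed=None):
    """Draw k pitcher-seasons with replacement, each chosen with
    probability proportional to its batters faced (BFP)."""
    rng = random.Random(seed)
    weights = [bfp for _, _, bfp in seasons]
    return rng.choices(seasons, weights=weights, k=k)

# Hypothetical (pitcher, RA, BFP) lines; illustrative only.
seasons = [
    ("Starter A", 4.20, 900),
    ("Reliever B", 3.10, 280),
    ("Mop-up C", 9.00, 40),
]

sample = subsample_by_bfp(seasons, k=20000, seed=1)
counts = Counter(name for name, _, _ in sample)
print(counts)  # counts land roughly in a 900 : 280 : 40 ratio
```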

I cut off the graph at 20 RA so that there was enough meaningful detail for us to see anything. The shape of the graph looks somewhat normal to the left of the 5 RA mark, but it tapers off to the right much more slowly than we would anticipate if pitching were truly normally distributed. This (and grant that this is a layman’s interpretation, nothing more) looks like a very modest case of the “fat tail,” which is the reason that you can’t buy or sell a house for money these days.

Compare to this graph of, oh, the logarithm of RA (all values of RA included):

This graph seems more normal, doesn’t it?

The biggest difference isn’t in skew – skew was never the major issue in the distribution of pitching, unlike what several of us (including myself) speculated in the comments. The problem is kurtosis – our distribution is too tall in the middle to be truly normal, and the tails are out of whack as a result. (There is, in fact, too much kurtosis for us to even be truly log-normal, although the log-normal distribution seems to describe pitching better than the normal distribution.)
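The skew-versus-kurtosis comparison can be checked with weighted sample moments. This is a sketch, not the calculation behind the graphs: the function names are hypothetical, it uses population (not bias-corrected) moments with each season weighted by BFP, and it reports excess kurtosis, which is 0 for a normal distribution and positive when the distribution is taller in the middle with heavier tails. Seasons with an RA of zero are dropped before taking logs because log(0) is undefined; the post doesn’t say how those were handled.

```python
import math

def weighted_moments(values, weights):
    """Return weighted (skewness, excess kurtosis) from population moments.
    A normal distribution has skewness 0 and excess kurtosis 0."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total

    def central(p):
        return sum(w * (v - mean) ** p for v, w in zip(values, weights)) / total

    m2, m3, m4 = central(2), central(3), central(4)
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis

def log_ra_moments(ra, bfp):
    """Moments of log(RA). Zero-RA seasons are dropped (assumption),
    since log(0) is undefined."""
    pairs = [(math.log(r), b) for r, b in zip(ra, bfp) if r > 0]
    return weighted_moments([p for p, _ in pairs], [b for _, b in pairs])
```

Running `weighted_moments(ra, bfp)` and `log_ra_moments(ra, bfp)` on the same seasons and comparing the two pairs shows whether taking logs mostly fixes skew, kurtosis, or both.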

Is there a practical application to this? I think so, although I have no more fancy graphs or evidence to present, so consider what follows to be nothing more than informed speculation.

There is the assumption that comes with the normal distribution that events a certain number of standard deviations, or “sigma,” away from the mean are practically impossible. (This is where the term “Six Sigma” comes from, if you were curious.)
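To put numbers on “practically impossible,” the one-sided tail probability of a standard normal can be computed with `math.erfc` from Python’s standard library. Six sigma works out to roughly one chance in a billion, which is why a normal model has so much trouble accommodating a pitcher that far below average.

```python
import math

def normal_tail(k):
    """P(Z > k) for a standard normal Z: the chance of landing more than
    k standard deviations above the mean (the lower tail is identical)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (1, 2, 3, 6):
    print(f"{k} sigma: {normal_tail(k):.3g}")
```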

We can pretty readily disprove this assumption when it comes to major league pitching. You simply need:

- A baseball.
- A bat.
- A major-league hitter.
- An idea of where the fence would be in an MLB park.

Gather those things, then try throwing the ball to the hitter while he holds the bat. I think you’ll demonstrate pretty quickly that it’s possible to be much, much worse at pitching than six sigma below the league average.

Any regression-based projection of the worst pitchers in baseball is likely to be too rosy, for the simple fact that it’s possible to have a worse true talent level for pitching than the normal distribution is fully able to comprehend and accept.

If you take it to mean that Estes’ “True Talent Level” will begin to improve until it becomes league average at some point, well then you are correct, it’s never going to happen.

What we really are talking about in “regressing to the mean” is not that the player will change, but that sometimes we have a scarcity of information on a certain player, so we have to make assumptions based on what we do know. Small samples can have random error, not related to the player’s talent, so we have to guess a little.

The first thing to look at is past performance, giving more weight to the more recent seasons. Go to the minors, to college, wherever you can find complete and reliable information. Then, in my projections, I add 150 plate appearances of league-average performance to each batter’s record. For an everyday player, that 150 would be 5-10% of his total. For someone with only a year’s experience, however, the 150 PA of regression might be half of the total.
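The ballast approach described above boils down to a weighted average. A minimal sketch, assuming a rate stat such as on-base percentage and a league average of .330 (both hypothetical; the comment doesn’t say which stats are regressed). The function names are made up, and the sketch leaves out the recency weighting and the minor-league and college data the comment also mentions.

```python
def regress_rate(rate, pa, league_rate, ballast_pa=150):
    """Blend an observed rate with ballast_pa plate appearances of
    league-average performance (regression toward the mean)."""
    return (rate * pa + league_rate * ballast_pa) / (pa + ballast_pa)

def ballast_share(pa, ballast_pa=150):
    """Fraction of the blended record that is league-average ballast."""
    return ballast_pa / (pa + ballast_pa)

# Hypothetical on-base percentages; league average assumed to be .330.
print(regress_rate(0.400, 150, 0.330))  # half observed, half ballast
print(round(ballast_share(2500), 3))    # everyday player, ~2,500 PA on record
```

Any baseline can be passed as `league_rate`, which is where a level-specific average would slot in.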

Plus, I don’t regress everyone toward the same numbers. If a batter is in Class A, I don’t regress him toward the major league average but toward the Major League Equivalent (MLE) stats of an average Class A player, because that’s part of what I know about him so far.

Right.

But the way we currently do regression assumes a normal distribution, and (at least for pitching) I don’t think that assumption holds true at the extremes.

If you are calculating the amount of regression from the mean and standard deviation of the population, yes. However, I found the amount through empirical testing, choosing the one that best predicted the next season with the smallest root mean square error.
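Choosing the regression amount empirically amounts to a grid search: try several ballast sizes, predict each player’s next season, and keep the one with the smallest root mean square error. This sketch assumes unweighted RMSE over (this-year rate, this-year PA, next-year rate) triples, each with at least 1 PA, and a grid of 0 to 1000 PA in steps of 25; the function names and all of those choices are hypothetical, not taken from the comment.

```python
import math

def rmse_for_ballast(pairs, league_rate, ballast_pa):
    """RMSE of predicting next-season rate from this season's rate,
    regressed with ballast_pa plate appearances of league average.
    pairs: list of (rate_this_year, pa_this_year, rate_next_year)."""
    sq_errors = []
    for rate, pa, next_rate in pairs:
        pred = (rate * pa + league_rate * ballast_pa) / (pa + ballast_pa)
        sq_errors.append((pred - next_rate) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def best_ballast(pairs, league_rate, grid=range(0, 1001, 25)):
    """Return the ballast size (in PA) on the grid with the smallest RMSE."""
    return min(grid, key=lambda k: rmse_for_ballast(pairs, league_rate, k))
```

If everyone’s next season lands exactly on league average, the search picks the largest ballast on the grid; if everyone repeats exactly, it picks zero. Real data falls in between.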

Maybe what we need are better priors. (I had started to look at this, and then something else was brighter and shinier.) For example, Marcel projects guys who had 1 PA last year as league-average hitters. The reason is that since we know nothing about the guy, we assume he’s league average, which of course is silly: if he were league average, he’d have gotten more than 1 PA. What we need is a comparison group of other guys who got a small number of PAs. Maybe within that distribution we’ll get something resembling normality.
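A comparison-group prior might be sketched like this: instead of league average, use the PA-weighted average of players who also got very few PA. The function name, the PA range, the PA weighting, and the sample players are all hypothetical choices for illustration.

```python
def comparison_group_prior(players, pa_low, pa_high):
    """PA-weighted average rate of players whose PA fall in [pa_low, pa_high].
    players: list of (rate, pa). Intended as the prior mean for a player
    with a similarly small number of PA, in place of league average."""
    group = [(r, pa) for r, pa in players if pa_low <= pa <= pa_high]
    total_pa = sum(pa for _, pa in group)
    if total_pa == 0:
        raise ValueError("no players in this PA range")
    return sum(r * pa for r, pa in group) / total_pa

# Hypothetical (on-base rate, PA) lines.
players = [(0.300, 10), (0.200, 30), (0.350, 600)]
prior = comparison_group_prior(players, 1, 50)  # prior for a 1-PA player
print(round(prior, 3))
```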

Just a thought…

Yes, Brian, but it still presumes that players below and above the mean regress at close to the same rate.

Or put it this way: a pitcher who has a 3.00 ERA after 20 IP is likely not a true-talent 3.00 ERA pitcher. It’s also very unlikely that he’s a 2.00 ERA pitcher who’s been unlucky over his first 20 IP.

A pitcher who has a 5.00 ERA after 20 IP is more likely to have a true-talent of 5.00 ERA. He’s also more likely to be a 6.00 ERA true-talent pitcher who has simply gotten lucky over his first 20 IP. That’s because in the talent pool there are a lot more 5 and 6 ERA pitchers than there are 2 or 3 ERA pitchers – and more 5 and 6 guys than league-average guys. That’s just the way the talent pool works.
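One way to make this argument concrete is a Bayesian calculation with an explicitly skewed talent prior, rather than a fixed regression amount. This swaps in a technique the thread does not use: a grid approximation of the posterior mean of true RA, with runs allowed modeled as Poisson (a simplifying assumption; real run scoring is overdispersed) and a lognormal talent prior whose median of 4.5 and sigma of 0.3 are made up, not fit to data. It uses 18 IP so that 6 and 10 runs give exact 3.00 and 5.00 RA, and it prints a normal prior with the same mean and SD alongside for comparison.

```python
import math

def lognormal_pdf(x, median, sigma):
    mu = math.log(median)
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def posterior_mean_ra(runs, ip, prior_pdf, grid):
    """Posterior mean of true RA on a grid, with runs ~ Poisson(RA * ip / 9)."""
    weights = []
    for ra in grid:
        lam = ra * ip / 9.0
        # Poisson log-likelihood up to a constant (log(runs!) cancels out)
        loglik = runs * math.log(lam) - lam
        weights.append(math.exp(loglik) * prior_pdf(ra))
    total = sum(weights)
    return sum(ra * w for ra, w in zip(grid, weights)) / total

MEDIAN, SIGMA = 4.5, 0.3  # hypothetical talent-pool parameters
PRIOR_MEAN = MEDIAN * math.exp(SIGMA ** 2 / 2)
PRIOR_SD = PRIOR_MEAN * math.sqrt(math.exp(SIGMA ** 2) - 1)
GRID = [0.05 * i for i in range(1, 401)]  # true RA from 0.05 to 20.00

def skewed_prior(x):
    return lognormal_pdf(x, MEDIAN, SIGMA)

def symmetric_prior(x):
    return normal_pdf(x, PRIOR_MEAN, PRIOR_SD)

for label, runs in (("3.00 RA", 6), ("5.00 RA", 10)):  # both over 18 IP
    print(label,
          round(posterior_mean_ra(runs, 18, skewed_prior, GRID), 2),
          round(posterior_mean_ra(runs, 18, symmetric_prior, GRID), 2))
```

The amount of “regression” here falls out of where the observation sits in the talent distribution and the shape of that distribution, instead of being a fixed number of PA added to everyone.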

I am not suggesting that we stop using regression to the mean (RTM) on “bad” pitchers altogether. But I do think it’s not necessarily correct to regress Hong-Chih Kuo’s 2008 and Radhames Liz’s 2008 the same way.

Obviously, increased sample size, MLEs, better priors, etc. are going to increase our projection accuracy. But it’s still easier to be a bad pitcher than a good one, and our models should reflect that.