# Fun with DIPS: Not all balls in play are created equal

June 27, 2007

DIPS. The idea that a pitcher doesn’t have any say in what happens to the ball once it is hit, short of fielding a ground ball back to him. The now-famous original study found that pitchers showed very little year-to-year correlation in the percentage of balls in play that became hits. However, there is a large amount of year-to-year correlation in events that the pitcher does have control over, specifically walks, strikeouts, home runs allowed, and hit batsmen. The natural corollary of the theory is that once the ball is hit, just about every pitcher becomes a league-average pitcher.

Critics of DIPS theory often point to such counter-examples as Greg Maddux, who “pitches to contact”, yet in the 1990s, was anything but league average. Perhaps, they contend, ground ball pitchers or fly ball pitchers have better luck than others. There has been some discussion of GB/FB rates and DIPS, but to my knowledge (and I could be mistaken), no one has ever broken down DIPS theory by the type of ball in play.

I take as my data set Retrosheet play-by-play files from 2003-2006. I eliminated all home runs, and calculated each pitcher’s yearly BABIP on each type of ball in play (grounders, liners, pop ups, and fly balls that are not home runs). I restricted the sample to those who had at least 25 of that type of ball in play in the year in question. (So, a pitcher with 22 grounders and 28 liners would have an entry for BABIP for liners, but not for grounders.) I transformed the variables using a log-odds ratio method, as is proper for rate/probability variables. Then, as per my favorite statistical trick, I took the intraclass correlation for each type of ball in play.
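
To make the method concrete, here’s a minimal sketch of the log-odds transform and a one-way intraclass correlation, with made-up BABIP numbers standing in for the Retrosheet data (the pitcher rows and values below are hypothetical):

```python
import math
from statistics import mean

def log_odds(p):
    """Transform a rate in (0, 1) to the log-odds scale."""
    return math.log(p / (1 - p))

def icc_oneway(groups):
    """One-way random-effects ICC(1): (MSB - MSW) / (MSB + (k-1)*MSW),
    for groups that each have k observations (here: one value per year)."""
    k = len(groups[0])
    n = len(groups)
    grand = mean(x for g in groups for x in g)
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical data: each row is one pitcher's yearly BABIP on grounders, 2003-2006
gb_babip = [
    [0.230, 0.245, 0.238, 0.252],
    [0.241, 0.228, 0.250, 0.236],
    [0.224, 0.255, 0.242, 0.231],
]
transformed = [[log_odds(p) for p in row] for row in gb_babip]
print(round(icc_oneway(transformed), 3))
```

The real calculation would of course use every qualifying pitcher-season, one list per pitcher.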

The results:

Ground balls, .114

Line drives, .174

Pop ups, .075

Fly balls (non-HR), .194

You can read those ICCs much like year-to-year correlations. The pitcher has the least control over whether pop ups go for outs and the most over fly balls. Even the fly ball number works out to an R-squared of only 3.8%, which isn’t all that thrilling (it means that 96.2% of the variance is due to other factors), so DIPS theory still seems pretty sound. For comparison, the R-squared for ground balls is 1.3%, so pitchers have a little more control over their fly balls than they do their ground balls. Still, those values are pretty tiny, so I wouldn’t make much of it. I’m not saying anything new here, but the assumption that the pitcher has *no* control at all over what happens is mistaken, although not all that far from the truth. Some pitchers, especially those who live on fly balls, are a little more in control than others.
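
For reference, squaring each ICC (treating it like a correlation) gives the variance-explained figures quoted above:

```python
# Square each ICC to get the share of variance explained
iccs = {"ground balls": 0.114, "line drives": 0.174,
        "pop ups": 0.075, "fly balls (non-HR)": 0.194}
for bip_type, r in iccs.items():
    print(f"{bip_type}: r = {r:.3f}, r-squared = {r ** 2:.1%}")
# fly balls (non-HR) come out to 3.8%, ground balls to 1.3%
```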

There’s one other issue that irks me. While doing some work for something else I’m writing, I found that the ICC for stolen base success rate (SB / (SB+CS)) was about .30. That’s an R-squared of 9%, which is, in perspective, a lot higher than the overall BABIP ICC of .182 that I found here, but with correlations you end up on a slippery slope. When does the ICC (or the year-to-year correlation, if you prefer) become high enough that it’s a “skill” and not luck? Is success at stealing bases a skill? This isn’t an issue with an easy resolution in sabermetrics, or in science in general, I realize, but it’s something to consider.

DSG has done a lot of work on DIPS and batted balls.

Here is one:

http://www.hardballtimes.com/main/article/batted-balls-and-dips/

And here is another:

http://www.hardballtimes.com/main/article/dips-lips-and-hips/

Stealing bases is obviously a skill, but the question you are asking is “how many SB attempts do we need to establish the skill?”.

I like to have an r = .50, which means that regression toward the mean is 1 - r (50%), and therefore the sample data represent 50% skill and 50% luck.

For example, for hitters, a player’s OBP skill is r = .50 at around PA = 200. For pitchers’ GB per BIP, we’d get r = .50 at, I dunno, PA = 75. So, we can say that we know as much about a hitter’s OBP skill at 200 PA as we would about a pitcher’s GB skill at 75 PA.

Keeping everything against the 200 PA benchmark gives everyone a fairly common baseline for comparison.


Tango, a correlation of .50 would have an r-squared (variance explained) of 25%, not 50%. As I understand regression to the mean, that would mean that we can infer 25% of his performance is based on skill (or at least previous performance) and the rest is “unexplained”, which we then generally project as league average.

I broke it down into players who attempted 0-9, 10-19, 20-29, 30-39, and 40-49 times in the season. The ICCs climbed across the groups: .082, .232, .357, .487, and .416. So, the more you steal, the more stable the outcome is likely to be across years.

I don’t have the output on how many players attempt 30+ steals, but it was rather small and it’s going to be a very biased sample. Those who run more are the ones who are better at it, and probably those who have more of a chance of stealing safely. I actually have those data handy: percentage of time that a runner attempts to steal when he is on first with second base open correlates with SB% at .349.
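
That sample-size effect is easy to demonstrate with a toy simulation. Here each player gets an assumed true success rate (the 70% mean and 6-point talent spread are made-up numbers), and the year-to-year correlation of observed rates rises as attempts increase:

```python
import random
from statistics import mean, stdev

random.seed(1)

def pearson(xs, ys):
    """Pearson correlation between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

def simulate(attempts, n_players=2000):
    """Each player has a true SB success rate; observe two 'seasons'
    of binomial outcomes and correlate the observed rates."""
    year1, year2 = [], []
    for _ in range(n_players):
        true_rate = min(max(random.gauss(0.70, 0.06), 0.01), 0.99)
        year1.append(sum(random.random() < true_rate for _ in range(attempts)) / attempts)
        year2.append(sum(random.random() < true_rate for _ in range(attempts)) / attempts)
    return pearson(year1, year2)

for attempts in (5, 15, 25, 35, 45):
    print(attempts, round(simulate(attempts), 3))
```

With more attempts, the binomial noise shrinks relative to the (fixed) talent spread, so the correlation climbs, just as in the bucketed ICCs.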

Maybe we need to call the commissioner’s office and demand that all MLB players attempt 40 SB per year, whether they like it or not!

Just fire all the Billy Beane-type GMs and make sure guys like Whitey Herzog and Dick Williams get to manage.

“As I understand regression to the mean, that would mean that we can infer 25% of his performance is based on skill (or at least previous performance) and the rest is “unexplained”, which we then generally project as league average.”

When I do projections I look at the r and not the r^2. If, for a given unit of PA/batters faced, something has an r = .50, I’ll use 50% regression to the mean. I don’t know if this is the statistically correct way to approach it; I just know from experience that it works better. Using an r^2 instead would make my projections too close to the league mean, and pretty much useless.
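
As a sketch of the mechanics (the league mean and r values below are made-up placeholders), regressing by (1 - r) versus (1 - r^2) looks like this:

```python
def project(observed, league_mean, r):
    """Shrink an observed rate toward the league mean by (1 - r),
    where r is the year-to-year (or intraclass) correlation at this sample size."""
    return league_mean + r * (observed - league_mean)

# A hitter with a .380 OBP over ~200 PA, where r is taken to be .50 (assumed):
print(round(project(0.380, 0.340, 0.50), 3))  # 0.36 -- halfway back to the mean
# Using r^2 = .25 instead shrinks much further toward league average:
print(round(project(0.380, 0.340, 0.25), 3))  # 0.35
```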

That’s the case for any statistic that measures any amount of skill. Increase your sample size and the R goes up. The problem with steals is how many players attempt 30+ steals? Not many these days. And even fewer do it for a long enough period to make it into an intra-class correlation.

If you forced every player in MLB to run at least 40 times per year I’m sure your correlation would be very strong.

“When does the ICC (or if you want to do year-to-year) become high enough that it’s a “skill” and not luck?”

I think this is a common way people now look at performance stats, but I really believe it’s the wrong way to think about skills. A baseball skill matters exactly in proportion to the amount of variance among players, calculated in runs scored/allowed. If the range of skill is large (in terms of runs), it’s important; if not, it isn’t. It’s really that simple.

In my opinion, the ratio of skill to luck/noise in a metric — which is what correlation tells us — is a poor substitute for measuring the actual range of skill. If HBP rates had a correlation/ICC of .9, that wouldn’t make HBP a more important, or even more “real,” skill than Ks or BBs.

A low r does of course mean there will be a lot of fluctuation. For example, Glavine will have good and bad BABIPs. But — and this is the important point — he will always be 15 points lower than an average pitcher with the same luck. And why should we care about the luck? We can’t predict it (by definition), and every pitcher has it, in roughly equal proportions good and bad (if they play long enough). Given that, we want the guy with the best true talent, regardless of whether there’s a lot of noise or just a little.

The partial exception to this is that a pitcher who relies a lot on low-BABIP skill may be slightly less consistent year-to-year than a high-K pitcher. So his bad years may be a little badder (though his good years should also be a bit gooder).

But other than that, the noise only matters to the extent it makes it harder for us to accurately measure the skill in question — it has no bearing on how much skill there truly is.

Guy, a small point of disagreement on this one: sure, from a strategic POV, the outcome of interest is always runs/wins and establishing a true skill level is a good goal, but from the GM’s (fantasy or real-life) perspective, it’s good to know the inherent risk in a player.

For example, if I want to sign a player to hit HRs, I would look for a guy with a good track record of doing so in the past, knowing that HR rate has a small amount of variability. It’s not very likely that a 30 HR guy will turn into a 5 HR guy next year, assuming no injuries. But what if I want to sign a good reliever, and my measure of how good he is is his ERA? Bullpen ERAs fluctuate wildly, so as a GM, signing a guy who had a 2.08 ERA out of the pen last year and expecting him to do it again is a risky bet.

Let me re-cast your example on Glavine in another light. Suppose I could sign Glavine for $15M or I could sign Jimmy Generic for $2M. We’ll say that Glavine’s “true” talent level at whatever skill I’ve identified is 15 points higher than Jimmy’s. But… the variance in that skill is very high. As those distributions of possible outcomes start to spread out, the chances of the two of them being equal (or even Jimmy beating out Glavine!) over a few observations start to increase. Do I want to pay Glavine $13M more if there’s only a 58% chance that he’ll be the better pitcher? You’re correct that over 1000 years, Glavine will be the better pitcher, but as a GM, I only sign players for 3 years at a time.
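
That intuition is easy to check with a quick Monte Carlo. Everything below is an assumption for illustration: the true-talent BABIPs, the balls in play per year, and the three-year contract window. The question is simply how often the 15-point skill edge actually shows up in the observed sample:

```python
import random

random.seed(7)

def observed_babip(true_rate, bip_per_year=600, years=3):
    """Simulate a pitcher's observed BABIP over a multi-year stretch of balls in play."""
    total_bip = bip_per_year * years
    hits = sum(random.random() < true_rate for _ in range(total_bip))
    return hits / total_bip

# Assumed true-talent BABIPs: Glavine 15 points better than Jimmy Generic
glavine_true, jimmy_true = 0.285, 0.300
trials = 2000
wins = sum(observed_babip(glavine_true) < observed_babip(jimmy_true)
           for _ in range(trials))
print(round(wins / trials, 2))  # share of 3-year windows where the edge shows up
```

Shrink the sample (fewer balls in play, fewer years) or widen the noise and that probability drifts toward a coin flip, which is exactly the risk-management problem.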

Now we’re into risk management. Establishing true skill levels (or at least estimating them) is helpful, but knowing how much of the outcome is due to chance is an important element in figuring out how safe a bet is. Let me put it this way: which investment are you more willing to put money into, something with a guaranteed return between 4% and 5%, or something that could land anywhere between losing 20% and gaining 20%? You may have an appetite for risk that makes you prefer one or the other, and that’s your decision. The point is that knowing how much uncertainty there is changes the way you approach the situation.

Sean, statistically speaking (haha!), using R rather than R-squared is the wrong thing to do. Whether or not the predictions are more valid with one method or the other is an open question that can be tested against the data.

I think Sean is right on regression to the mean.

You care about the standard deviation, as opposed to the variance.

http://www.socialresearchmethods.net/kb/regrmean.php

If you don’t believe it, run some simulations between two years … you’ll find that r is the right metric rather than r^2.

I stand corrected. 1 - r^2 is the formula for variance still unexplained. In my head, I was putting together variance components, rather than regressing to the mean.