Batter Walks
August 29, 2009 3 Comments
Last entry, I looked at a quick regression model for batter strikeouts based on contact percentage and swing percentage. This week, we’ll look at batter walk percentage based on swing percentage, zone percentage, and contact percentage.
Same rules as last time. All 2008 qualified batters, linear regression.
The results were pretty good, though not as accurate as last time. The r-squared of the equation is .7349, which is solid but lower than the fit for batter strikeouts. I’m sure part of the gap comes from intentional walks, which likely have little to do with swing%, zone%, or contact rate, but that is a topic for another day. Let’s look at some of the results.
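For readers who want to try this kind of fit themselves, here is a minimal sketch in Python with NumPy. It assumes an ordinary least-squares model with an intercept, which the post does not spell out, and it uses only the nine batters listed in this post's tables rather than the full set of 2008 qualified batters. The coefficients and r-squared it prints will therefore not match the .7349 reported here; the variable names are illustrative.

```python
import numpy as np

# (Swing%, Zone%, Contact%, actual BB%) for the nine batters shown in this post.
# This is NOT the full 2008 sample, so the fit is for illustration only.
rows = [
    (0.415, 0.471, 0.901, 0.166),  # Albert Pujols
    (0.420, 0.497, 0.813, 0.160),  # Pat Burrell
    (0.404, 0.511, 0.805, 0.154),  # BJ Upton
    (0.487, 0.484, 0.828, 0.049),  # Garret Anderson
    (0.433, 0.483, 0.848, 0.081),  # Aubrey Huff
    (0.435, 0.492, 0.778, 0.087),  # Jeremy Hermida
    (0.535, 0.527, 0.769, 0.046),  # Adam Jones
    (0.464, 0.481, 0.855, 0.101),  # Brian McCann
    (0.555, 0.453, 0.741, 0.093),  # Josh Hamilton
]
data = np.array(rows)

# Design matrix: a column of ones for the intercept, then the three predictors.
X = np.column_stack([np.ones(len(data)), data[:, :3]])
y = data[:, 3]

# Ordinary least squares (assumed model form).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef

# r-squared = 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("intercept, swing, zone, contact:", coef)
print("r-squared on these nine rows:", r_squared)
```

With the full data set loaded in place of `rows`, the same steps would apply unchanged.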
More than Expected
| Player | Swing% | Zone% | Contact% | Actual BB% | Pred BB% | Difference |
| Albert Pujols | 0.415 | 0.471 | 0.901 | 0.166 | 0.125 | 0.041 |
| Pat Burrell | 0.420 | 0.497 | 0.813 | 0.160 | 0.121 | 0.039 |
| BJ Upton | 0.404 | 0.511 | 0.805 | 0.154 | 0.118 | 0.036 |
I did not say “lucky” on this one for a couple of reasons. First, since intentional walks are undoubtedly part of the error, I don’t feel that “luck” is the appropriate term, given how heavily we use that word in the statistics community. Second, while an r-squared of over .7 is certainly a good one, it still leaves a number of other factors to be analyzed.
Fewer than Expected
| Player | Swing% | Zone% | Contact% | Actual BB% | Pred BB% | Difference |
| Garret Anderson | 0.487 | 0.484 | 0.828 | 0.049 | 0.092 | -0.043 |
| Aubrey Huff | 0.433 | 0.483 | 0.848 | 0.081 | 0.116 | -0.035 |
| Jeremy Hermida | 0.435 | 0.492 | 0.778 | 0.087 | 0.121 | -0.034 |
Close to Expected
| Player | Swing% | Zone% | Contact% | Actual BB% | Pred BB% | Difference |
| Adam Jones | 0.535 | 0.527 | 0.769 | 0.046 | 0.0464 | -0.0004 |
| Brian McCann | 0.464 | 0.481 | 0.855 | 0.101 | 0.10127 | -0.00027 |
| Josh Hamilton | 0.555 | 0.453 | 0.741 | 0.093 | 0.093057 | -0.000057 |
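The Difference column in the tables above is simply actual BB% minus predicted BB%. As a quick check, the short Python sketch below recomputes it from the published numbers and sorts the batters from most over-predicted to most under-predicted. The function and variable names are illustrative, not part of the original analysis.

```python
# (player, actual BB%, predicted BB%) copied from the three tables above.
rows = [
    ("Albert Pujols", 0.166, 0.125),
    ("Pat Burrell", 0.160, 0.121),
    ("BJ Upton", 0.154, 0.118),
    ("Garret Anderson", 0.049, 0.092),
    ("Aubrey Huff", 0.081, 0.116),
    ("Jeremy Hermida", 0.087, 0.121),
    ("Adam Jones", 0.046, 0.0464),
    ("Brian McCann", 0.101, 0.10127),
    ("Josh Hamilton", 0.093, 0.093057),
]


def residual(actual, predicted):
    """Actual minus predicted walk rate; positive means more walks than the model expects."""
    return actual - predicted


residuals = {name: residual(actual, predicted) for name, actual, predicted in rows}

# Largest positive difference first, largest negative difference last.
for name, diff in sorted(residuals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<16} {diff:+.6f}")
```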
While there is still some work to be done, most importantly accounting for intentional walks, the model is fairly accurate for a basic linear model. Next time, we will see how accurate such a regression formula is with pitchers.
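One way the intentional-walk adjustment might look is sketched below in Python. It assumes BB% is walks per plate appearance and that intentional walks are removed from both the walk total and the plate-appearance total; this post does not specify either choice, so treat both as assumptions. The function name and the sample numbers are hypothetical, not real player data.

```python
def unintentional_walk_rate(walks, intentional_walks, plate_appearances):
    """Walk rate with intentional walks removed from numerator and denominator.

    Assumption: BB% = BB / PA, and an intentional walk is dropped from both totals.
    """
    if intentional_walks > walks:
        raise ValueError("intentional walks cannot exceed total walks")
    adjusted_pa = plate_appearances - intentional_walks
    if adjusted_pa <= 0:
        raise ValueError("no plate appearances left after removing intentional walks")
    return (walks - intentional_walks) / adjusted_pa


# Hypothetical batter: 100 walks, 20 of them intentional, in 620 plate appearances.
raw_rate = 100 / 620
adjusted_rate = unintentional_walk_rate(100, 20, 620)
print(f"raw BB%: {raw_rate:.3f}, without IBB: {adjusted_rate:.3f}")
```

The adjusted rate would then replace actual BB% as the target in the regression.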
Thanks to Fangraphs.com for their contributions to this article.
Regression calculations performed by:
Wessa, P. (2009), Free Statistics Software, Office for Research Development and Education,
version 1.1.23-r4, URL http://www.wessa.net/
Mike Silver recently completed his requirements for the Sport Management Major at THE University of Massachusetts-Amherst, where he is a brother of Theta Chapter of Theta Chi Fraternity, the best house in the country. He is a huge Red Sox and Bruins fan, and longs for the days of the REAL Boston Garden, Cam Neely, and the ultimate Dirt Dog Trot Nixon. If you have any questions, you can reach him at mjasilver@gmail.com. Have a good night readers, and know that Mike hopes to hear from you soon. If you quote Mike in an article, please let him know. He’d love to hear it.
Why not just take IBBs out of the BB totals, then?
Nice. I was wondering who would be the first to ask that question. I’m glad it’s a BtB writer who caught it (nice posts, by the way).
A couple reasons. The first is that I didn’t realize the variable until I was in the process of writing the article.
The second is that I’m too impatient not to have printed it anyway, with the IBB in the calculations. The next article will probably remove the IBBs from the equation.
Thanks, Mike. I enjoy Stat Speak, so getting a compliment from a writer here is awesome.
And, gotcha. I was thinking there was some reason that having them in there was important that I was missing. I look forward to reading that.