In my last entry, I looked at a quick regression model that estimated batter strikeouts from contact percentage and swing percentage. This week, we’ll estimate batter walk percentage from swing percentage, zone percentage, and contact percentage.

Same rules as last time: all qualified 2008 batters, linear regression.

The results were pretty good, though not as accurate as last time. The r-squared of the equation is .7349, which is solid but falls short of the fit for batter strikeouts. I’m sure part of the gap comes from intentional walks, which likely have little to do with swing%, zone%, or contact%, but that is a topic for another day. Let’s look at some of the results.
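For anyone who wants to try this at home, here is a minimal Python sketch of the kind of fit described above. It assumes NumPy and uses only the nine batters that appear in the tables in this post, so it will not reproduce the .7349 r-squared or the coefficients from the full set of 2008 qualifiers. The variable names are purely illustrative.

```python
import numpy as np

# Rows copied from the tables in this post: Swing%, Zone%, Contact%, Actual BB%.
# Nine batters is only an illustrative sample, not the full 2008 qualifier set.
batters = {
    "Albert Pujols":    (0.415, 0.471, 0.901, 0.166),
    "Pat Burrell":      (0.420, 0.497, 0.813, 0.160),
    "BJ Upton":         (0.404, 0.511, 0.805, 0.154),
    "Garrett Anderson": (0.487, 0.484, 0.828, 0.049),
    "Aubrey Huff":      (0.433, 0.483, 0.848, 0.081),
    "Jeremy Hermida":   (0.435, 0.492, 0.778, 0.087),
    "Adam Jones":       (0.535, 0.527, 0.769, 0.046),
    "Brian McCann":     (0.464, 0.481, 0.855, 0.101),
    "Josh Hamilton":    (0.555, 0.453, 0.741, 0.093),
}

X = np.array([row[:3] for row in batters.values()])  # predictors
y = np.array([row[3] for row in batters.values()])   # actual BB%

# Add an intercept column, then solve ordinary least squares.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# r-squared = 1 - (residual sum of squares / total sum of squares).
pred = A @ coef
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("intercept, swing, zone, contact:", np.round(coef, 4))
print("r-squared on this sample:", round(float(r_squared), 4))
```

To get closer to the numbers in this post, swap the nine sample rows for the full list of qualified batters; the fitting code itself stays the same.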

More Than Expected

Player | Swing% | Zone% | Contact% | Actual BB% | Pred BB% | Difference
Albert Pujols | 0.415 | 0.471 | 0.901 | 0.166 | 0.125 | 0.041
Pat Burrell | 0.420 | 0.497 | 0.813 | 0.160 | 0.121 | 0.039
BJ Upton | 0.404 | 0.511 | 0.805 | 0.154 | 0.118 | 0.036

I did not label these batters “lucky” for a couple of reasons. First, since intentional walks are undoubtedly part of the error, I don’t feel that “luck” is the appropriate term, given how often we use that word in the statistics community. Second, while an r-squared of over .7 is certainly a good one, it still leaves a number of other factors to be analyzed.
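One way to take intentional walks out of the picture would be to rebuild the walk rate without them before fitting. The sketch below is only an assumption about how that might be done: it drops intentional walks from both the walk total and the plate appearances. The function name and the season line in the example are hypothetical, not taken from any real player.

```python
def unintentional_bb_rate(walks: int, intentional_walks: int, plate_appearances: int) -> float:
    """Walk rate with intentional walks removed from numerator and denominator.

    Assumption: BB% is walks per plate appearance, and an intentional walk
    is dropped from both counts rather than just from the walk total.
    """
    if intentional_walks > walks:
        raise ValueError("intentional walks cannot exceed total walks")
    remaining_pa = plate_appearances - intentional_walks
    if remaining_pa <= 0:
        raise ValueError("no plate appearances left after removing intentional walks")
    return (walks - intentional_walks) / remaining_pa


# Hypothetical season line, made up for illustration only.
example = unintentional_bb_rate(walks=100, intentional_walks=20, plate_appearances=620)
print(f"{example:.3f}")
```

Refitting the regression on a rate like this, instead of the raw walk percentage, would show how much of the error really comes from intentional walks.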

Worse Than Expected

Player | Swing% | Zone% | Contact% | Actual BB% | Pred BB% | Difference
Garrett Anderson | 0.487 | 0.484 | 0.828 | 0.049 | 0.092 | -0.043
Aubrey Huff | 0.433 | 0.483 | 0.848 | 0.081 | 0.116 | -0.035
Jeremy Hermida | 0.435 | 0.492 | 0.778 | 0.087 | 0.121 | -0.034

Close to Expected

Player | Swing% | Zone% | Contact% | Actual BB% | Pred BB% | Difference
Adam Jones | 0.535 | 0.527 | 0.769 | 0.046 | 0.0464 | -0.0004
Brian McCann | 0.464 | 0.481 | 0.855 | 0.101 | 0.10127 | -0.00027
Josh Hamilton | 0.555 | 0.453 | 0.741 | 0.093 | 0.093057 | -0.000057
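The Difference column in these tables is simply actual walk rate minus predicted walk rate. As a rough sketch, the Python below recomputes it from the values printed above and sorts the batters two ways: by signed difference, and by absolute difference to find the closest fits. It only covers the nine batters shown here, and the variable names are illustrative.

```python
# (player, actual BB%, predicted BB%) as printed in the tables in this post.
rows = [
    ("Albert Pujols", 0.166, 0.125),
    ("Pat Burrell", 0.160, 0.121),
    ("BJ Upton", 0.154, 0.118),
    ("Garrett Anderson", 0.049, 0.092),
    ("Aubrey Huff", 0.081, 0.116),
    ("Jeremy Hermida", 0.087, 0.121),
    ("Adam Jones", 0.046, 0.0464),
    ("Brian McCann", 0.101, 0.10127),
    ("Josh Hamilton", 0.093, 0.093057),
]

# Difference = actual minus predicted; positive means more walks than the model expects.
# Rounding to six places hides floating-point noise in the subtraction.
diffs = [(name, round(actual - pred, 6)) for name, actual, pred in rows]

# Largest positive difference first, largest negative difference last.
by_diff = sorted(diffs, key=lambda item: item[1], reverse=True)

# Smallest absolute difference first: the batters the model fit best.
closest = sorted(diffs, key=lambda item: abs(item[1]))

for name, diff in by_diff:
    print(f"{name:<18}{diff:+.6f}")
```

Run against the full list of qualifiers, the top and bottom of `by_diff` and the top of `closest` would give the three groups shown in this post.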

While there is still some work to be done, most importantly on intentional walks, the model is fairly accurate for a basic linear model. Next time, we will see how accurate such a regression formula is for pitchers.

