A second look at Pythagorean win estimators
March 28, 2007
Over the past few days on the SABR statistical analysis list-serv, there’s been a bit of chatter about the Pythagorean win estimator. My guess is that most of the folks reading this post are familiar with the formula, but for the benefit of those who may not be: it was created by Bill James as an attempt to model how many games a team “should” have won, based on how many runs it scored and how many it allowed. The original formula read: Winning % = RS^2 / (RS^2 + RA^2). Its eerie resemblance to the Pythagorean theorem from geometry (the one you hated in high school) gave it its name. Several modifications have been suggested in the intervening years, including changing the exponent to 1.82 (some say 1.81), as well as two “dynamic exponent” variants (one by Clay Davenport, the other by David Smyth), each of which calculates the proper exponent on a case-by-case basis and substitutes it into the formula.
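To make the variants concrete, here is a minimal sketch in Python. The original and fixed-1.82 forms follow directly from the text above; the dynamic-exponent constants (Davenport’s 1.50·log10(RPG) + 0.45 and Smyth’s RPG^0.287) are the commonly published versions, since the post itself doesn’t spell them out.

```python
import math

def pyth_wpct(rs, ra, exp=2.0):
    """James's estimator: RS^exp / (RS^exp + RA^exp)."""
    return rs**exp / (rs**exp + ra**exp)

def davenport_exp(rs, ra, g):
    """Davenport's dynamic exponent (commonly published constants)."""
    rpg = (rs + ra) / g          # combined runs per game
    return 1.50 * math.log10(rpg) + 0.45

def smyth_exp(rs, ra, g):
    """Smyth's dynamic exponent (the commonly published form)."""
    rpg = (rs + ra) / g
    return rpg ** 0.287

# A hypothetical team: 800 runs scored, 700 allowed, 162 games.
rs, ra, g = 800, 700, 162
print(pyth_wpct(rs, ra))                          # original, exponent 2
print(pyth_wpct(rs, ra, 1.82))                    # fixed exponent 1.82
print(pyth_wpct(rs, ra, davenport_exp(rs, ra, g)))
print(pyth_wpct(rs, ra, smyth_exp(rs, ra, g)))
```

Note that at typical modern run environments (around 9 to 10 combined runs per game) both dynamic formulas produce exponents near 1.9, which is why they track the fixed 1.82 version so closely.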
Before coming on board here at MVN, I had meditated briefly on these formulae and their merits relative to each other, with the Smyth formula coming out the winner, if only by a tiny margin. In evaluating any estimator, there are two important questions to answer: how closely does it predict the observed values (in this case, the teams’ actual winning percentages), and are its mistakes (in statistics-speak, residuals) biased in some way? In my original post, I found that the residuals were essentially centered around zero (very good!), that the standard deviation of the residuals for all four formulae was somewhere in the neighborhood of 4.3 wins, and that the residuals showed minimal skew.
There are a few more residual diagnostics to run to check for additional biases in the estimators. For example, if an estimator over-estimates the winning percentages of good teams but under-estimates those of bad teams (or vice versa, for that matter), then the estimator has a built-in bias. Along with being accurate no matter the team quality, an estimator should work no matter how many games were played in the season or how many runs the team scored and/or gave up.
I used the Lahman database for this one and selected all team-seasons with at least 100 games played, which gave me a database of 2,370 team-seasons to work with. I calculated the projected winning percentages from the Pythagorean, Exp 1.82, Davenport, and Smyth formulae, then subtracted each from the actual winning percentage to get the residuals.
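The selection-and-residual step can be sketched with pandas. The column names G, W, L, R, and RA are real fields in the Lahman Teams table, but the rows below are invented stand-ins rather than the actual data, which the real analysis would load from the database.

```python
import pandas as pd

# Toy stand-in for the Lahman Teams table (real columns, made-up rows).
teams = pd.DataFrame({
    "G":  [162, 161,  98, 162],
    "W":  [ 94,  61,  50,  81],
    "L":  [ 68, 100,  48,  81],
    "R":  [800, 650, 500, 720],
    "RA": [700, 780, 480, 720],
})

# Keep only team-seasons with at least 100 games played.
teams = teams[teams["G"] >= 100].copy()

# Actual winning percentage, the fixed-exponent estimate, and the residual.
teams["wpct"] = teams["W"] / (teams["W"] + teams["L"])
exp = 1.82
teams["est"] = teams["R"]**exp / (teams["R"]**exp + teams["RA"]**exp)
teams["resid"] = teams["wpct"] - teams["est"]
```

A positive residual means the team won more than the formula projected; a negative residual means it won less.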
I calculated (well, OK, my computer calculated them) correlation coefficients between the residuals of each formula and the following variables: games played, runs scored per game, runs allowed per game, wins, and actual winning percentage. None of the formulae’s residuals were correlated with games played. There were small correlations between the original Pythagorean formula’s residuals and runs scored per game (.106) and runs allowed per game (-.071). No other such correlations were observed. Those correlation values were statistically significant, though rather small in magnitude.
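The correlation check itself is a one-liner with NumPy. The arrays below are simulated, with a negative bias deliberately built in, purely to show the mechanics; they are not the real Lahman data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated stand-ins for 2,370 team-seasons: win totals, and residuals
# that shrink as wins grow, mimicking the pattern reported for Exp 1.82.
wins = rng.integers(50, 112, size=2370)
resid = -0.0005 * (wins - 81) + rng.normal(0.0, 0.02, size=2370)

# Pearson correlation between residuals and wins.
r = np.corrcoef(resid, wins)[0, 1]
print(round(r, 3))
```

A negative coefficient here is exactly the signature described below: the better the team, the more the formula under-projects it.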
The biggest finding in my analyses was that the residuals from the Exp 1.82, Davenport, and Smyth formulae were all correlated with wins and winning percentage. The Exp 1.82 formula, likely the most-used and most-reported of the bunch, showed correlation coefficients of -.346 and -.380, respectively. The Davenport (-.253 and -.269) and Smyth (-.256 and -.273) coefficients were lower, although still notable. The original Pythagorean formula’s residuals had much lower correlations of -.095 and -.101. These findings suggest that Exp 1.82, Davenport, and Smyth all share a bias: better teams are more likely to have their estimates come in below their actual winning percentage, while poor teams are more likely to have their estimates come in above it.
If the previous sentence made your head spin, here it is in English, with numbers made up on the spot purely for illustration: say a team won 94 games in the year in question. The Exp 1.82, Davenport, and Smyth formulas are more likely to be wrong in the direction of saying the team should have won fewer games (say, 91). A poor team that won 61 games is more likely to have its projection come in higher (perhaps 65).
So what? Since these formulas became popular, the differences between projections and actual results have been taken as indicators of such things as managerial ability. (A less-than-proper use of the formula, in my opinion, but it is the common application.) If a team wins more than its projection, the manager must be doing a good job, because he’s maximizing runs at the proper times to win games. If a team wins fewer games than projected, the manager might be fired. If the formulas are biased, though, some of the credit and blame being passed around on their account may be a statistical artifact. The bias built into the formulas makes the manager of a last-place team look like he is underperforming, even as he has to answer to the GM for having just lost 101 games. The manager of a successful team, on the other hand, is more likely to look like he is over-performing, and maybe gets a nice contract extension and raise out of it. Managers on bad teams look even worse, and managers on good teams look even better.
It looks like the Pythagorean estimators need a little bit of tinkering. They don’t need to be thrown out; on the contrary, they perform exceptionally well overall. The bias I identified will be most noticeable at the extremes, which is a common problem in estimators of this type. Analysts just need to be a little more careful in interpreting the results in those cases.
Remember: Even the Scarecrow didn’t get the Pythagorean theorem exactly right on the first try.