Still more Pythagorean musings
October 15, 2007
Things continue to get interesting on the SABR Statistical Analysis chatlist on the issue of those pesky Pythagorean over-achievers. No less a luminary than the founder of the theorem itself, Bill James, has come up with a little study of his own on the subject of whether teams who under-achieve one year are more likely to under-achieve in the next year (and whether over-achievers will over-achieve the next year).
In it, he takes the top 100 over-achievers and the top 100 under-achievers of all time (using the Smyth/Patriot/Pythagenpat formula). He finds that the top 100 over-achievers continued to over-achieve in the following year, although their level of over-achievement dropped from an average of 8.3 wins to an average of 0.47 wins. The under-achievers also under-achieved on average the following year, but their shortfall fell from 8.68 wins to 0.24 wins. He concludes that while the effect isn’t zero, it must be pretty small. (He also runs a matched groups design in Parts III and IV of his paper that made me scratch my head.)
What Bill is describing in his paper is a regression to the mean effect that doesn’t quite regress all the way to the mean. Let me take a look at this using a slightly different and more complete method. I took the database of all teams from 1901-2005 and calculated their actual and Pythagenpat winning percentages, plus the Pythagenpat residuals. I did the same for the following year for each team and matched the two up. This gave me 2084 team-seasons. The year-to-year correlation for Pythagenpat residuals is .043. The mean of Pythagenpat residuals is zero.
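For readers who want to see what a Pythagenpat residual looks like in practice, here is a minimal sketch. It assumes the commonly cited form of the formula, where the exponent is (runs per game, both teams combined) raised to the 0.287 power; that constant, the function names, and the choice to express the residual in wins are my assumptions for illustration, not something spelled out above.

```python
def pythagenpat_pct(runs_scored, runs_allowed, games, power=0.287):
    """Expected winning percentage under a Pythagenpat-style formula.

    Assumption: exponent x = ((R + RA) / G) ** power, with power = 0.287.
    Other published versions use slightly different constants.
    """
    runs_per_game = (runs_scored + runs_allowed) / games
    x = runs_per_game ** power
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)


def pythagenpat_residual_wins(wins, losses, runs_scored, runs_allowed):
    """Actual wins minus expected wins (positive = over-achiever)."""
    games = wins + losses
    expected_wins = pythagenpat_pct(runs_scored, runs_allowed, games) * games
    return wins - expected_wins
```

A team that scores exactly as many runs as it allows has an expected winning percentage of .500 under this formula, so an 81-81 record with even runs would carry a residual of zero.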
That means that, knowing nothing else, our best guess for next year’s Pythagenpat residual can be given as:
0.043 * This Year’s residual + (1 - 0.043) * mean. Since the mean is zero, that term drops out. 8.3 wins above expectation in year one would have a year two expectation of .043 * 8.3 + (1 - .043) * 0. The answer is .3569. Bill found that the actual teams checked in at 0.47 in the next year. (And if I had twenty minutes of your time, I’d explain why what I just did was playing really fast and loose with some rules of math to get that number… Suffice it to say, it’s good enough for the situation.) The 100 under-achievers, at 8.68 wins below expectation, would have an expectation in year 2 of .043 * -8.68 = -.3732. Bill got -.24. So, no, the effect size is not zero, just like the chances that I will be hit by a bus today on my way walking to work are not zero. But, since I generally look both ways before crossing the street, those chances aren’t anything to worry about. I wouldn’t worry about this effect either.
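The arithmetic above is simple enough to check by hand, but a small sketch makes the shrink-toward-the-mean step explicit. The function name and defaults are mine; the .043 correlation and the zero mean come from the numbers quoted above.

```python
def project_residual(this_year, r=0.043, mean=0.0):
    """Regress this year's residual toward the mean using the
    year-to-year correlation r as the weight on the observed value."""
    return r * this_year + (1 - r) * mean


# Bill's top 100 over-achievers averaged +8.3 wins in year one.
print(f"{project_residual(8.3):.4f}")    # 0.3569

# His top 100 under-achievers averaged -8.68 wins in year one.
print(f"{project_residual(-8.68):.4f}")  # -0.3732
```

With a correlation this close to zero, almost all of the weight lands on the mean, which is why an 8-win surprise shrinks to about a third of a win.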
Bill also brings up another interesting question posed by Mike Emeigh as to which was the better predictor of next year’s actual winning percentage for a team: their current year’s actual winning percentage or their current year’s Pythagorean projection. Since I had the data set sitting in front of me, it seemed a shame not to ask the question.
Correlation between Year 2's Actual Winning Percentage and:
Year 1's Actual Winning Percentage = .603
Year 1's Pythagenpat Winning Percentage = .626
I even ran Cohen’s test for specificity of correlated outcomes, and Pythagenpat really is significantly better (t = 4.35, for the curious) at predicting next year’s record. Not by a lot, but it’s the better bet. Still, score another one for Pythagoras.
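For anyone who wants to try a comparison like this on their own data, here is a sketch of one standard way to compare two correlations that share an outcome variable: Williams's t-test for dependent correlations. This is a swapped-in technique for illustration and may not be the exact test used above. It also needs a third number that isn't reported here, the same-year correlation between actual and Pythagenpat winning percentage, so that has to come from the data. The function and argument names are hypothetical.

```python
import math


def williams_t(r_y_a, r_y_b, r_a_b, n):
    """Williams's t for comparing two dependent correlations.

    r_y_a: correlation of the outcome with predictor A
    r_y_b: correlation of the outcome with predictor B
    r_a_b: correlation between the two predictors (must be measured)
    n:     number of paired observations
    Returns (t, degrees_of_freedom) with df = n - 3.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    # Determinant of the 3x3 correlation matrix of (outcome, A, B).
    det = (1 - r_y_a ** 2 - r_y_b ** 2 - r_a_b ** 2
           + 2 * r_y_a * r_y_b * r_a_b)
    r_bar = (r_y_a + r_y_b) / 2
    denom = (2 * ((n - 1) / (n - 3)) * det
             + r_bar ** 2 * (1 - r_a_b) ** 3)
    t = (r_y_a - r_y_b) * math.sqrt((n - 1) * (1 + r_a_b) / denom)
    return t, n - 3
```

Here predictor A would be Year 1's Pythagenpat winning percentage (.626 against the outcome) and predictor B would be Year 1's actual winning percentage (.603), with n = 2084. The more tightly the two predictors track each other, the larger the t for a given gap, which is how a .023 difference can still clear the bar for significance.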