# Still more Pythagorean musings

October 15, 2007

Things continue to get interesting on the SABR Statistical Analysis chatlist on the issue of those pesky Pythagorean over-achievers. No less a luminary than the founder of the theorem itself, Bill James, has come up with a little study of his own on whether teams who under-achieve one year are more likely to under-achieve in the next year (and whether over-achievers will over-achieve the next year).

In it, he takes the top 100 over-achievers and the top 100 under-achievers of all time (using the Smyth/Patriot/Pythagenpat formula). He finds that the top 100 over-achievers continued to over-achieve in the following year, although their level of over-achievement dropped from an average of 8.3 wins to an average of 0.47 wins. The under-achievers also under-achieved again on average, but their shortfall shrank from 8.68 wins to 0.24 wins. He comes to the conclusion that while the effect isn’t zero, it must be pretty small. (He also runs a matched groups design in Parts III and IV of his paper that made me scratch my head.)

What Bill is describing in his paper is a regression to the mean effect that doesn’t quite regress all the way to the mean. Let me take a look at this using a slightly different and more complete method. I took the database of all teams from 1901-2005 and calculated their actual and Pythagenpat winning percentages, plus the Pythagenpat residuals. I did the same for the following year for each team and matched the two up. This gave me 2084 team-seasons. The year-to-year correlation for Pythagenpat residuals is .043. The mean of Pythagenpat residuals is zero.
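For anyone who wants to replicate the residual calculation, here is a minimal Python sketch. It assumes the commonly cited Pythagenpat form, in which the exponent is runs per game raised to the 0.287 power; the post does not say which constant was used, so treat `PYTHAGENPAT_POWER` as an assumption. The function names are hypothetical, and the residual is expressed in wins, with ties ignored.

```python
from math import sqrt

# Assumed constant: the post does not state which value was used.
PYTHAGENPAT_POWER = 0.287


def pythagenpat_pct(runs_scored, runs_allowed, games):
    """Expected winning percentage from runs scored and allowed."""
    exponent = ((runs_scored + runs_allowed) / games) ** PYTHAGENPAT_POWER
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def residual_wins(wins, runs_scored, runs_allowed, games):
    """Actual wins minus Pythagenpat-expected wins."""
    return wins - games * pythagenpat_pct(runs_scored, runs_allowed, games)


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Given one list of year-one residuals and a matching list of each team’s year-two residual, `pearson_r(year_one, year_two)` gives the year-to-year correlation described above.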

That means that, knowing nothing else, our best guess for next year’s Pythagenpat residual can be given as:

0.043 * This Year’s residual + (1 - 0.043) * mean

Since the mean is zero, the second term drops out. A team that finished 8.3 wins above expectation in year one would have a year-two expectation of 0.043 * 8.3 + (1 - 0.043) * 0 = 0.3569 wins. Bill found that the actual teams checked in at 0.47 in the next year. (And if I had twenty minutes of your time, I’d explain why what I just did was playing really fast and loose with some rules of math to get that number… Suffice it to say, it’s good enough for the situation.) The 100 under-achievers would have an expectation in year two of 0.043 * -8.68 = -0.3732. Bill got -0.24.

So, no, the effect size is not zero, just like the chances that I will be hit by a bus today on my way walking to work are not zero. But, since I generally look both ways before crossing the street, those chances aren’t anything to worry about. I wouldn’t worry about this effect either.
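The arithmetic above can be checked with a few lines of Python. This is only a sketch of the shrinkage formula as written in the post; the function name `project_residual` is made up for illustration, and the defaults are the correlation (.043) and mean (zero) reported above.

```python
def project_residual(this_year, correlation=0.043, mean=0.0):
    """Regress this year's residual toward the mean to guess next year's."""
    return correlation * this_year + (1 - correlation) * mean


over = project_residual(8.3)     # about 0.357 wins
under = project_residual(-8.68)  # about -0.373 wins
print(f"over-achievers: {over:.4f}, under-achievers: {under:.4f}")
```

With a correlation this close to zero, almost all of the year-one residual is expected to vanish, which matches the sharp drops Bill reports.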

Bill also brings up another interesting question posed by Mike Emeigh as to which was the better predictor of next year’s actual winning percentage for a team: their current year’s actual winning percentage or their current year’s Pythagorean projection. Since I had the data set sitting in front of me, it seemed a shame not to ask the question.

Correlation between Year 2’s Actual Winning Percentage and:

Year 1’s Actual Winning Percentage = .603

Year 1’s Pythagenpat Winning Percentage = .626

I even ran Cohen’s test for specificity of correlated outcomes, and Pythagenpat really is significantly better (t = 4.35, for the curious) at predicting next year’s record. Not by a lot, but it’s the better bet. Still, score another one for Pythagoras.
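For readers wondering how two correlations that share an outcome can be compared, one common approach is Williams’ t-test for dependent correlations. The sketch below is not necessarily the exact test run here; it is a swapped-in, closely related technique, and the function and parameter names are hypothetical. It needs a third number the post does not report: the correlation between year-one actual and year-one Pythagenpat winning percentage.

```python
from math import sqrt


def williams_t(r_jk, r_jh, r_kh, n):
    """Williams' t for H0: corr(j, k) == corr(j, h), with n - 3 degrees of freedom.

    j is the shared outcome (year-two winning percentage);
    k and h are the two competing predictors;
    r_kh is the correlation between the two predictors.
    """
    # Determinant of the 3x3 correlation matrix of j, k, h.
    det = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r_kh) ** 3
    return (r_jk - r_jh) * sqrt((n - 1) * (1 + r_kh) / denom)
```

Here `r_jk` would be .626, `r_jh` would be .603, and `n` would be 2084. Without the predictor-to-predictor correlation, the t value quoted above cannot be reproduced from the post alone.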

When comparing one year with the next year for Pythagorean regression to the mean, is there any concern that the team’s personnel could be very different from one year to the next, and that you are only comparing a city and a team name from year one to year two instead of a team with the exact same set of players? Just curious. vr, Xei

Of course there’s concern. In theory, a team could completely turn itself over from year to year. That doesn’t happen in practice, but roster turnover still introduces a serious confound. This is why I prefer in-year/split-half investigations of Pythagorean matters.

I also, initially, scratched my head after reading the matched teams portion of James’ study, but after thinking it over a few times, it actually makes a bit of sense: you’re talking about teams that, ostensibly, should–by virtue of their pythags–perform similarly. You have one group which outperforms, however, and another group that underperforms, despite having nearly identical run-differentials with nearly identical runs scored/runs allowed (ideally in the same sample size of games, but as James notes, that didn’t always happen).

It would stand to reason that, if there *is* some regression to the mean, then both groups should regress to the mean that is their pythag record–but it didn’t happen. The teams that outperformed the previous year outperformed the next, and the teams with nearly identical pythags who underperformed the previous year *also* underperformed the next.

So the group matching makes some sense, since it (ostensibly) reinforces the notion that there’s something inherent to the teams themselves that’s causing them to outperform or underperform their pythag records, and not mere chance/happenstance.

…nonetheless, I’m more inclined to agree with your analysis, Pizza Cutter, and say that while there wasn’t a regression to zero, there was some regression, period, and any way you slice it, the pythag is still a better predictor than actual win-loss record.

I can tell you’re a social scientist, Pizza Cutter. 🙂 I mean it in a good way, but…you rely on correlations and tests of significance WAY too much. 🙂

Still, I think we’re looking way too hard for ways to shoot down a principle that makes perfect logical sense but is not “mystical”…when you have a high RD, you’re going to win more games.

Incidentally, I believe that some of the lack of regression to the mean comes from strength of schedule. Some teams just routinely have a tougher schedule than others. For example, if you’re in the NL West or NL Central, your schedule is going to be way easier than if you’re in the AL West. 🙂

Am I that obvious?

Pizza,

Late comment … why did Bill’s “matched pair” study make you scratch your head?

Phil, because it was late at night. I understand what he was getting at (taking a set of teams well matched to the 100 biggest Pythagorean deviants), but the methodology seemed a little contrived in its execution. Bill’s manner of matching the pairs makes some intuitive sense, but it got sloppy when he started matching some teams with others who were well off their run stats. It seemed unnecessarily complicated and made me wonder, “Why doesn’t he just run a year-to-year correlation between the residuals?”