# The triumph of Pythagoras

On the SABR Statistical Analysis Listserv, there’s been a great deal of chatter concerning the good old Pythagorean win estimator.  This year, as seems to happen every year, most teams finished right around their estimates.  But there always seems to be that one oddity, and this year it’s the Arizona Diamondbacks.  The Diamondbacks were outscored this year (712-732) and had a Pythagorean expectation of around 79 wins, depending on exactly which formula you use.  They won 90 games, good for the best record in the NL.  Huh?
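For reference, the classic estimator puts expected winning percentage at RS^x / (RS^x + RA^x), with x usually set near 1.82.  A minimal sketch with the Diamondbacks’ run line from above (the function name is mine):

```python
# Classic Pythagorean expectation with a fixed 1.82 exponent.
# The 712 scored / 732 allowed run line is the Diamondbacks' 2007 total.
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=1.82):
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)

print(round(pythagorean_wins(712, 732)))  # about 79 expected wins, versus 90 actual
```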
So, are the Arizona Diamondbacks a sub-.500 team, like their Pythagorean projection says, or are they a 90-win team, like their… ummm… actual record says?  It’s an interesting question.  When trying to figure out how “good” a team is, which should we look at?  This is a topic that has been taken up before by Chris Jaffe, specifically with reference to the Diamondbacks, and more theoretically a few years ago by Dan Fox.  Dan found that early in the season, if you want to know what a team’s season-ending winning percentage will be, you’re best off looking at their Pythagorean record.  That holds until about 100 games in, when the team’s actual record becomes the better predictor of its season-ending record.  (By the end of the year, actual record is a perfect predictor of season-ending actual record.)  But which one better predicts what a team will do in its future games?
In July of this year, Joe Sheehan of Baseball Prospectus asserted that “Run differential is a key measure of team quality, and a better predictor of future performance than win-loss record.”  Well now, that sounds like something we can test.  I took the Retrosheet game logs from 1980-2006 (666 team-seasons, no kidding) and took each team’s games in sequence.  After each game, I calculated the team’s actual winning percentage as of that moment, as well as its Pythagorean projection as of that moment.  So, if a team was 10-10 after 20 games and had scored 93 runs while giving up 91, I ran the numbers on that.  (Methodological note: I used the David Smyth/Patriot formula and the standard formula with a 1.82 exponent, although they were pretty much indistinguishable, so I just report the Smyth formula.)  Then, I calculated the team’s actual winning percentage over the rest of the season.  So, if that team went 72-70 over its last 142 games, I calculated those numbers too.  I ran the numbers 162 times, once for each game of the year.  Which of the first two (current actual win percentage or current Pythagorean projection) was the better predictor of performance over the rest of the season from that point forward?
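The per-game bookkeeping described above can be sketched roughly like this (function names are mine, and 0.285 is one commonly used constant for the Smyth/Patriot exponent; 0.287 also appears in the literature):

```python
def pythagenpat_wpct(rs, ra, games):
    """Smyth/Patriot ('Pythagenpat') winning percentage: the exponent
    floats with the run environment, x = ((RS + RA) / G) ** 0.285."""
    x = ((rs + ra) / games) ** 0.285
    return rs ** x / (rs ** x + ra ** x)

def split_at_game(wins, runs_for, runs_against, g):
    """After game g, return (actual wpct so far, Pythagenpat wpct so far,
    actual wpct over the rest of the season).  wins is a list of 0s and 1s."""
    so_far = sum(wins[:g]) / g
    pyth = pythagenpat_wpct(sum(runs_for[:g]), sum(runs_against[:g]), g)
    rest = sum(wins[g:]) / (len(wins) - g)
    return so_far, pyth, rest

# Toy six-game "season": at each g, correlating so_far (or pyth) with rest
# across many team-seasons is what produces the two lines in the graph below.
w, rf, ra = [1, 0, 1, 1, 0, 1], [5, 2, 6, 4, 3, 7], [3, 4, 2, 5, 6, 2]
print(split_at_game(w, rf, ra, 3))
```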
Want to see a pretty graph?

The graph shows the correlation coefficients between each of the two measures and performance the rest of the way.  Coefficients are low at the beginning of the season because after game one, everyone’s got a winning percentage of either 1.000 or .000, and that’s not going to correlate well with much of anything.  At the end of the year, there’s the same problem in the opposite direction.  Focus on the middle part of the graph, where the sample sizes in both halves are roughly equivalent.  That’s where the story is.  You’ll see that the green line, representing the Pythagorean projection at that particular moment (using the Smyth method, although the 1.82 exponent showed the same pattern), is consistently above the line for actual winning percentage.  At the exact midpoint of the season (81 games), Pythagorean projection correlates with winning percentage the rest of the way at .494, while actual winning percentage has a correlation of .464.
(Side note: The weird jump around game 110 is an artifact of the 1981 and 1994 strike seasons.  Teams played a little fewer than 110 games in those years, which produced some funky data… just enough to cause a little blip.)
In terms of predictive power, run differential really is the more important piece of information when it comes to predicting the future.  So what’s the deal with the Diamondbacks?  Well, for what it’s worth, the correlation between Pythagorean projection and future performance at 81 games is about .5, which isn’t bad, but it isn’t all that great.  In fact, .5 is possibly the most infuriating correlation coefficient out there: it means that about 25% of the variance is explainable by whatever factor you’re using as a predictor.  25% is a quarter of the variance!  But 25% is only a quarter of the variance.  As the season wears on, the gap between the Pythagorean and actual win percentages narrows, until they become roughly the same around game 150 or so, where the correlations are around .35.  The thing is that at game 150, the sample size for the “rest of the season” is only 12 games, and by that point, Pythagorean projection and actual winning percentage are usually mirroring one another anyway.
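To put numbers on that glass-half-full reading, squaring the midpoint correlations reported above gives the share of rest-of-season variance each predictor explains (a quick arithmetic check, nothing more):

```python
# r squared is the fraction of rest-of-season variance a predictor explains.
for label, r in [("Pythagorean projection at game 81", 0.494),
                 ("actual winning percentage at game 81", 0.464)]:
    print(f"{label}: r = {r:.3f}, variance explained = {r * r:.1%}")
```

So even the better predictor leaves about three-quarters of the variance unexplained, which is how a team like the Diamondbacks can slip through.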
But there’s evidence here that, over the long run, a team is better described by its run differential than by its actual record.  This will certainly come as great news to fans of the Padres and Braves, who finished with the second- and third-best Pythagorean win percentages in the NL, as they watch the Diamondbacks in the playoffs this year.

### 8 Responses to The triumph of Pythagoras

1. David Gassko says:

PC,
What I think you really need to do is first take out 1981 and 1994 (and if it wouldn’t be too much trouble, add in all the non-strike Retrosheet years to up your sample) and then run a regression with Pyth W% and actual W% at every number of games played to try to predict W% over the rest of the season. That will tell you if actual W% actually becomes a better indicator than Pyth W% (or even a useful one at all) at any point in time.

2. Sky says:

I’ve always assumed that the 150-game cutoff was due to teams still in it (based on actual wins, obviously) trying harder than teams who aren’t. By trying harder, I just mean they made trades FOR players instead of trading away players, they aren’t giving playing time to September callups, and they haven’t shut down any starting pitchers. Not sure how to control for that, though.

3. Pizza Cutter says:

David, that graph took me hours to make, and it would take much longer to do the setup work to get 100 years’ worth of data in there, for what I can’t imagine would be vastly different results. I actually did run the original numbers with ’81 and ’94 out (and ’95, which was shortened by the ’94 strike as well), and found the same pattern. I just *blush* used the wrong output to make the graph. The edited data showed the same pattern, just without the blip.
As to whether actual win% ever becomes the better predictor: first off, I think it only becomes the numerically higher correlation at the very bitter end of the season (when sample sizes make it pretty irrelevant anyway). I could run a significance test to see whether the difference between the correlation coefficients is significant, but it’s not worth chasing.

4. Xeifrank says:

How many standard deviations were the Diamondbacks off from their Pythagorean record?
Isn’t it a normal statistical occurrence for a couple of teams to be off from their Pythagorean record?
What is the margin of error of the pythagorean formula over a 162 game sample size?
Thanks,
vr, Xei

5. Pizza Cutter says:

http://baseballpsychologist.blogspot.com/2007/03/cleveland-indians-and-pythagoras.html
The standard deviation around Pythagenpat is .026080 in winning percentage (4.22 games per 162), so the Diamondbacks were roughly 2.5 to 3 SDs above expectation.
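In code, the arithmetic behind that estimate looks like this (the 90 actual wins and ~79 expected wins come from the article above):

```python
# z-score of the 2007 Diamondbacks' record against their Pythagenpat
# expectation; .026080 is the SD of (actual - expected) winning percentage.
sd_wpct = 0.026080
z = (90 / 162 - 79 / 162) / sd_wpct
print(f"{z:.2f} SDs above expectation")  # ~2.6, inside the 2.5-3 range
```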

6. Pizza Cutter says:

The standard criterion for “outside the realm of noise” is 2 SDs (actually, 1.96), meaning the result has less than a 5% chance of happening randomly. With 30 teams, however, probability says that 1 or 2 of them will have such a low-probability event happen.
The Indians actually got as unlucky in 2006 as the D’Backs got “lucky” in 2007. In the article that I linked above, I calculated that such aberrations would happen about once every 10 years or so (given a 30-team league); that is, one team would get as unlucky as the Indians did last year, and one team would get as lucky as the D’Backs did this year.
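A sketch of that back-of-the-envelope rate using the normal tail probability (z = 2.75 here is my assumed midpoint of the 2.5-3 range, not a figure from the article):

```python
import math

def one_tail_p(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

teams, z = 30, 2.75
p = one_tail_p(z)       # chance a given team gets this lucky in a season
per_season = teams * p  # expected such teams per 30-team season
print(f"p = {p:.4f}; expect one such team about every "
      f"{1 / per_season:.0f} seasons")
```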

7. Xeifrank says:

Thanks for the replies, PizzaCutter. Yes, 2 SDs in a normal distribution is usually considered the cutoff line for statistical significance. I just figured that with the Diamondbacks at 2.5 SDs and there being 30 teams, one team reaching that threshold wouldn’t be out of the ordinary, and that all this talk about the Diamondbacks being lucky could be written off as statistical noise (via the normal distribution).
vr, Xei

8. Xeifrank says:

There are 30 MLB teams, and 1 out of 30 teams is 3.3%. If you use the 2.5 SD number, 4.2% of the sample should be outside of 2.5 SDs, which could make the Diamondbacks’ outpacing their Pythagorean record by such a wide margin statistical noise. If you use 3 SDs, then it becomes more of an oddity. Thoughts?
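The exact normal tail fractions behind these round numbers can be computed directly (a quick sketch; erfc gives the two-sided tail):

```python
import math

def two_tail_fraction(z):
    """Fraction of a normal distribution lying beyond +/- z SDs."""
    return math.erfc(z / math.sqrt(2))

for z in (2.0, 2.5, 3.0):
    frac = two_tail_fraction(z)
    print(f"beyond {z} SDs: {frac:.2%} of teams, "
          f"or {30 * frac:.2f} teams per 30-team season")
```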
vr, Xei