# What do Pythagorean residuals really measure?

A couple of weeks ago over on The Book blog, there was a lively discussion about Pythagorean expectations, started off by former StatSpeak contributor Matt Souders. Matt is a proponent of a game-by-game method for calculating estimated wins, and has some pretty simple (and sound) logic to back his idea up. Suppose that the Indians are beating the Yankees 21-0, and just to add insult to injury, they tack on a 22nd run. The 22nd run does nothing to help them win. It’s very rare that a team actually scores 20+ runs and still loses the game. The problem is that Pythagorean formulae treat all runs the same. There’s a big difference between the 22nd run in a blowout and the run that breaks a 4-4 tie in the ninth. If not all runs are created equal, why use a formula that treats all runs equally?
Previously, I’ve meditated on the major Pythagorean estimators. For those who don’t know, a Pythagorean estimator is a formula that takes a team’s seasonal runs scored and runs allowed totals and tells what the team’s record should have been given those numbers. There are several formulae for calculating this rate, although the ones given by David Smyth and Clay Davenport are considered the best. Now, a Pythagorean estimate won’t always get things exactly right. Suppose the formula says that a team should have gone 91-71, but they actually went 89-73. What happened? Bad luck? Bad management? Bad karma? Bad formula? Or is it just random statistical noise? The difference between the actual team win percentage and the projected one is called a residual in statistical terms. But can we predict those residuals? Are there identifiable factors that cause a team to over-perform or under-perform its Pythagorean projection?
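
For readers who have not seen the calculation, here is a minimal Python sketch of the general form. The exponent rule in `smyth_exponent` (runs per game raised to 0.287) is the value commonly quoted for Smyth’s formula, not something stated in this post, so treat it as an assumption. The function names and the 800/700 run totals are invented for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """General Pythagorean form: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def smyth_exponent(runs_scored, runs_allowed, games, power=0.287):
    """Exponent that floats with the run environment.

    ASSUMPTION: power=0.287 is the commonly quoted constant, not a value
    taken from this article.
    """
    runs_per_game = (runs_scored + runs_allowed) / games
    return runs_per_game ** power


def residual(wins, losses, runs_scored, runs_allowed):
    """Actual winning percentage minus the Pythagorean projection."""
    games = wins + losses
    x = smyth_exponent(runs_scored, runs_allowed, games)
    expected = pythagorean_win_pct(runs_scored, runs_allowed, x)
    return wins / games - expected


# Hypothetical team: 89-73 with 800 runs scored and 700 allowed.
example_residual = residual(89, 73, 800, 700)
```

With these invented totals the projection comes out to roughly 91 wins over 162 games, so the 89-73 team ends up with a small negative residual, which is the situation described above.
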
The answer is yes. Former StatSpeaker David Gassko found that a team with a balanced offensive lineup was more likely to outperform its projection. He also found that a team with a manager who had been around a while was marginally more likely to outperform its projection. The folks over at Baseball Prospectus found evidence that a good bullpen helps. My own research has found that one reason a team might outperform its expectation is that the formula used to calculate the expectation is itself biased. Residuals are correlated with the team’s actual winning percentage in such a way that good teams generally look like they are outperforming their projections while bad teams look like they are underachieving.
In his comments on the subject, Matt Souders suggests that teams that are consistent on offense (in terms of runs scored) and inconsistent in their pitching (in terms of runs allowed) are more likely to outperform their projections. Consistency on offense seems sensible. Why would inconsistency in pitching make sense? Suppose a team gives up six runs every single night and only scores five per night. They’ll be 0-162 at the end of the season. But if they alternate giving up 0 and 12 runs every other night, their runs allowed will be the same, but they will finish 81-81. That’s an extreme example, but I think the point is pretty clear. One more thing that makes sense to know is how the team does in one-run games. The most important run in a game is the one that puts a team in the lead. After that, they’re all “wasted” runs. So, if a team doesn’t waste a lot of runs, they are more likely to do better than their Pythagorean projection.
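
The extreme example above is easy to check directly. This is just a sketch of that thought experiment; the `record` helper and the variable names are made up for illustration.

```python
def record(runs_scored_by_game, runs_allowed_by_game):
    """Return (wins, losses) from per-game runs scored and allowed."""
    pairs = list(zip(runs_scored_by_game, runs_allowed_by_game))
    wins = sum(rs > ra for rs, ra in pairs)
    losses = sum(rs < ra for rs, ra in pairs)
    return wins, losses


games = 162
scored = [5] * games                      # five runs every night
steady_allowed = [6] * games              # six allowed every night
# Alternate shutouts and 12-run disasters: same season total, different spread.
streaky_allowed = [0 if g % 2 == 0 else 12 for g in range(games)]

assert sum(steady_allowed) == sum(streaky_allowed)  # identical runs allowed

print(record(scored, steady_allowed))   # (0, 162)
print(record(scored, streaky_allowed))  # (81, 81)
```

Both pitching staffs allow the same number of runs over the season, so any formula that only sees season totals gives both teams the same projection, even though one never wins and the other plays .500 ball.
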
I took Retrosheet’s game logs from 1980-2006 and calculated each team’s actual winning percentage and its Pythagorean projection (based on David Smyth’s formula), then took the difference between the two to get the residual. Then, I calculated the standard deviation of the number of runs scored and allowed per game over each team-season. Finally, I calculated each team’s one-run game winning percentage and the percentage of games overall decided by one run. The database had 748 team seasons in it.
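
As a rough sketch of the per-team bookkeeping, assume each team-season has already been boiled down to a list of `(runs_scored, runs_allowed)` pairs. Parsing the Retrosheet files is not shown, the `season_summary` name is hypothetical, and the use of the sample standard deviation is an assumption (a population SD would work just as well at this sample size).

```python
from statistics import stdev


def season_summary(games):
    """Summarize one team-season.

    games: list of (runs_scored, runs_allowed) tuples, one per game.
    ASSUMPTION: sample standard deviation; needs at least two games.
    """
    scored = [rs for rs, _ in games]
    allowed = [ra for _, ra in games]
    one_run = [(rs, ra) for rs, ra in games if abs(rs - ra) == 1]
    one_run_wins = sum(rs > ra for rs, ra in one_run)
    return {
        "win_pct": sum(rs > ra for rs, ra in games) / len(games),
        "rs_sd": stdev(scored),
        "ra_sd": stdev(allowed),
        # None when the team played no one-run games at all.
        "one_run_win_pct": one_run_wins / len(one_run) if one_run else None,
        "one_run_share": len(one_run) / len(games),
    }


# Six invented games, purely to show the shape of the output.
toy_games = [(5, 4), (2, 3), (7, 1), (3, 4), (6, 5), (0, 2)]
summary = season_summary(toy_games)
```
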

Correlations between the residual and:

- Actual winning percentage: .400 (repeating the effect described above)
- Runs scored SD: -.120 (teams with less variability, i.e. more consistency, are more likely to outperform)
- Runs allowed SD: .220 (teams with more variability, i.e. less consistency, are more likely to outperform)
- One-run game winning percentage: .652(!)
- One-run games/games played: .011 (the only non-significant finding)

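
For anyone who wants to run the same kind of check, here is a small sketch of a Pearson correlation and the usual t statistic for testing whether a correlation differs from zero. The function names are made up; the only numbers fed in are the correlations listed above and the 748 team-seasons.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for H0: correlation is zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


n_team_seasons = 748
reported = {
    "actual_win_pct": 0.400,
    "runs_scored_sd": -0.120,
    "runs_allowed_sd": 0.220,
    "one_run_win_pct": 0.652,
    "one_run_share": 0.011,
}
t_values = {name: t_statistic(r, n_team_seasons) for name, r in reported.items()}
```

With this many team-seasons, a two-sided test at the 5% level needs |t| of about 1.96. Only the one-run share falls short of that, which fits the note that it was the lone non-significant finding.
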
Let’s toss the first four factors into a regression equation predicting Pythagorean residuals. In the resulting equation, all four remain significant, but the interesting finding is this. The multiple-R is .815, for a total R-squared for the model of .665. About two thirds of the variance in Pythagorean residuals can be explained through these four indicators. Not bad.
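
R-squared is just the multiple R squared (.815 squared is about .664, which matches the reported .665 up to rounding). For anyone who wants to fit this sort of model on their own data, below is a bare-bones least-squares sketch in plain Python. It is not the software used for the numbers above, the helper names are hypothetical, and the tiny data set at the bottom is synthetic and exists only to show that the code runs.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_r_squared(rows, y):
    """Ordinary least squares with an intercept.

    rows: one tuple of predictor values per observation.
    Returns (coefficients with the intercept first, R-squared).
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Synthetic check: y is an exact linear function of two predictors.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
beta, r_squared = ols_r_squared(rows, y)
```

In the real analysis each row would hold a team-season’s four predictors (actual winning percentage, runs scored SD, runs allowed SD, one-run winning percentage) and `y` would be the residuals.
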
The only question left (and it’s a wide open one, to which much paper and computer screen has been devoted) is whether there are any skills that predict success in one-run games.  I’ll have to think about that one.

### 7 Responses to What do Pythagorean residuals really measure?

1. studes says:

Pizzacutter, I did a similar thing a couple of years ago, in this article:
My approach was different than yours, and I’m sure it wasn’t as rigorous, but I found similar results.

2. Pizza Cutter says:

Studes, I think I linked your article in the text. If I didn’t, I meant to…

3. Pizza Cutter says:

Apparently, I didn’t… ah well… I did read it in researching it before I wrote. Everyone else who’s reading this message and interested in the topic would do well to click on the above link.

4. studes says:

No problem. Thanks, PC.

5. Matt Souders says:

I should point out that PythagenMatt W% (my game by game method) not only adjusts out the biases of consistency/inconsistency on offense and defense (blowout games mostly), but it accounts for the success of teams in one-run games.
The question, though, is whether performance in one-run games is auto-correlated. Do teams that demonstrate a skill in one-run games over partial seasons continue to demonstrate that skill going forward?
I would think there is some correlation (since one-run skill is partially bullpen strength and line-up balance), but I don’t know how stable it is.

6. Pizza Cutter says:

The problem with autocorrelation on 1-run games is that the only way to get a decent sample size is to measure season by season and after each season, the rosters change. As I hinted in the article, it’s something I’m going to be looking at.

7. Mike Mehl says:

This reminds me of a brief study I did some years ago:
Suppose Team A is the consummate small ball team. They average 5 runs per game, but never score more than one run in an inning.
Team B is Earl Weaver’s dream team: It also averages 5 runs per game, but either has a big inning, scoring 5 runs, or scores no runs at all.
By any version of Pythagoras, in head-to-head competition the teams should be 0.500 in the long run.
In practice, Team A wins 57.5% of the time, including most of the extra inning games. The reason is, of course, that they are more consistent. They will almost always score a few runs per game, while in half its games Team B will score no runs at all.
That said, the question is whether or not such “consistent” teams exist. In my example (something physicists call a Spherical Cow approximation), obviously not. In the real world, does a team that wins a lot of one run games have some special quality that allows it to win those games, or was it just luck? Matt Souders’ comment is relevant here. Do the same teams win lots of one run games year after year? No, see studes’ link. Does a team with a small S.D. in runs scored in year one have a small S.D. in year two? That I don’t know.