Projecting the Landscape
July 17, 2008
Raise your hand if, after only a month or two of data, you have lauded or written off a player. Come on, we’ve all done it, and many of us will likely continue to fall victim to this statistical fallacy. It’s human nature as fans of this sport to generalize an entire season from a small sample, because many fans, even educated ones, tend not to understand what constitutes the true talent level of a given player. Or even, for that matter, why these projections amount to more than people blindly throwing darts at numbers.
For starters, small samples fail to produce reliable results because each new data point carries far more weight in a small sample than in a large one. Think of two players posting a .333 batting average, where one is 3-for-9 and the other is 300-for-900. Add four hitless at-bats to each and suddenly player one plummets over 100 points to .231 while player two merely “drops” to .332. The latter has enough data in his sample to keep a single poor game or stretch from having a significant negative impact on his performance to date.
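The arithmetic above is easy to verify. A minimal sketch:

```python
# Why small samples swing wildly: the same four hitless at-bats
# move a 9-AB average far more than a 900-AB average.
def batting_avg(hits, at_bats):
    return hits / at_bats

print(f"{batting_avg(3, 9):.3f} -> {batting_avg(3, 13):.3f}")        # 0.333 -> 0.231
print(f"{batting_avg(300, 900):.3f} -> {batting_avg(300, 904):.3f}")  # 0.333 -> 0.332
```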
With that in mind, it is incorrect to say that (insert player) is having a (good/bad) season based on nothing more than a month or so of data. Some players simply get off to good or bad starts, and the judgment would be based on a snippet of data that is not indicative of the player’s true talent level. Saying they haven’t yet met expectations would be fine, because the “yet” clause stipulates an evaluation solely of performance to date, not of what should or will happen; generalizing an entire season from next to no information, however, is not the right way to judge players.
Projection systems instead work from the known true talent level of the player, which brings us to the next point: what is a true talent level, and how do these systems work?
In terms as basic as I can provide, projection systems weight a large enough sample of data from the recent past, with a bit of regression to the mean, an adjustment for age, and occasionally some other variables, such as height, weight, minor league numbers, etc. The most commonly used systems are CHONE (Sean Smith), ZiPS (Dan Szymborski), PECOTA (Nate Silver), and Marcel (Tom Tango). Tango’s Marcel is considered the “dumbest” in the sense that it takes the fewest variables into account–all you need to know is the player’s stats, the player’s age, and the league stats–yet it is essentially just as accurate as any other system out there.
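The regression-to-the-mean piece can be sketched as blending the player’s rate with the league rate by tacking on a fixed number of “phantom” league-average plate appearances. The 1200-PA constant below is illustrative of the Marcel approach, not a definitive figure:

```python
# Regression to the mean, Marcel-style: add phantom league-average
# plate appearances to the player's actual sample. Small samples get
# pulled hard toward the league rate; large samples barely move.
def regress(player_rate, league_rate, pa, regression_pa=1200):
    return (pa * player_rate + regression_pa * league_rate) / (pa + regression_pa)

# A .300 hitter over 600 PA in a .260 league regresses to roughly .273
print(round(regress(0.300, 0.260, 600), 3))  # 0.273
```

Notice that the same .300 hitter over only 60 PA would regress nearly all the way back to the league rate, which is the system’s built-in defense against small samples.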
The true talent level of a player is considered to be a weighted version of his last three years of production. Just as one month is too small a sample to evaluate a season, one season is too small a sample to determine expected performance. So we look for more years. Three, in fact. With these three years compiled, a proper weight must be applied to each. Take Andruw Jones as an example. In 2005, he hit 51 home runs. The next year, 41. And last year, just 26. Looking at this, it would not be accurate to exclude 2005 and 2006 and determine his expected performance based solely on 2007; by the same token, it would also not be accurate to weight those two previous years as heavily as last year, since the most recent data is the most indicative of skill, but not the end-all solution. This is why the Marcel projections weight the 2007 season with a 5, 2006 with a 4, and 2005 with a 3. All told, Andruw Jones was projected to hit 30 home runs this season, much worse than his 2005 and 2006 totals but slightly better than last season.
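The 5/4/3 weighting itself is simple arithmetic. Here it is applied to Jones’s home run totals; note the raw weighted average comes out above the published 30-homer projection because the full system also regresses toward the mean and adjusts for age:

```python
# Marcel-style 5/4/3 weighting of the last three seasons, most recent
# first. This is only the weighting step; regression and the age
# adjustment (not shown) pull the final projection lower.
def weighted_average(values, weights=(5, 4, 3)):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Andruw Jones HR totals: 2007, 2006, 2005
print(weighted_average([26, 41, 51]))  # 37.25
```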
This brings us to the next issue: the relationship between in-season performance and true talent level/projections. Since Andruw was projected to have an .816 OPS with 30 home runs, based on his true talent level, but seems way off that pace right now (.513 OPS and 2 home runs in 53 games), instead of asking whether the projection is wrong we should be wondering how those two months affect his talent level. Do two months of a .513 OPS and 2 home runs constitute a large enough sample to change his projection to something much worse? Or will his weighted recent seasons outweigh the smaller sample and call for a big second half? Or both? Will it call for a bigger second half, albeit with an overall line much worse than expected? In other words, if he was expected to post an .816 OPS and currently sits at .513, it does not mean he will perform this poorly all season or that his new true talent level is .513; conversely, it also does not mean he will be so incredibly hot in the second half as to even his OPS out to .816. Instead, what we expect of him changes.
Because the projection system, which is based on actual numbers posted by actual players, says he should be at .816 but he got off to a .513 start, you had better believe he is going to perform better in the weeks and months to come. How much better depends on the impact of the in-season data on his true talent level. Factoring in his performance to date and age, as well as the last three years, Andruw is projected at a .774 OPS over the remainder of the season with 11 home runs, numbers much better than his first half. However, the combination of both halves would result in a .660 OPS with 13 home runs, WAY down from the .816 and 30 thought possible prior to the season.
These numbers were calculated using an absolutely invaluable gadget, created by Sal Baxamusa of The Hardball Times, that, when given the player’s birthday, his last three years, and his performance to date, projects what will happen over the remainder of the season. I will use this in more depth on Saturday, discussing Cliff Lee and CC Sabathia, but it is what projected Jones to have a better second half. As long as the in-season sample is large enough, we can effectively make in-season projections based on this year AND the previous three.
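For the curious, the flavor of the idea can be sketched as a simple blend, where the in-season sample earns weight in proportion to its size. This is a toy model, not Baxamusa’s actual tool, and the 100-game stabilization point is a hypothetical constant; the real calculation, with proper regression and three prior years, produced the .774 figure above:

```python
# Toy rest-of-season estimate: blend the preseason projection with
# in-season results, giving the current season weight proportional
# to games played (capped at a hypothetical 100-game full weight).
def rest_of_season(preseason, in_season, games, stabilization=100):
    w = min(games / stabilization, 1.0)  # in-season weight grows with sample
    return w * in_season + (1 - w) * preseason

# Andruw Jones: .816 projected OPS, .513 OPS through 53 games
print(round(rest_of_season(0.816, 0.513, 53), 3))  # 0.655
```

The toy blend lands between .513 and .816, as any sane estimate must, though it is cruder than the real tool because it ignores regression and age entirely.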
This raises the question: how far into a season is true talent evident? A few polls have found that the 95-103 game mark is a large enough sample to merit the full weight in the projection formula. This is how players need to be evaluated. Saying Ryan Howard is having a bad season after April would only be accurate if that April drastically changed his projection, and for that to happen his April would have had to be so incredibly poor that the previous three years of weighted data no longer carried as much value. As was found in ‘The Book’, hot and cold streaks generally have very little predictive power. The same goes for evaluating trades. Do not make evaluations based on what HAS happened, but rather on what is expected to happen, as well as several other variables such as money, controllability, etc.
Projection systems and evaluating talent with them can be confusing, yes. Ultimately, though, when discussing players and their talent or skills, we need more than one or two months of data; we need three years of it to actually be discussing talent and skill. When this is more commonly understood by fans and analysts, the landscape of evaluation will be much more accurate.