Projecting the Landscape

Raise your hand if, after only a month or two of data, you have lauded or written off a player.  Come on, we’ve all done it, and many of us will likely continue to fall victim to this statistical fallacy.  It’s human nature as fans of this sport to generalize an entire season based on a small sample because many fans, even educated ones, tend not to understand what constitutes the true talent level of a given player.  Or even, for that matter, why these projections are more than just people throwing darts in the blind at certain numbers.
For starters, small samples fail to support reliable conclusions because each additional data point moves the numbers far more than it would in a large sample.  Think of two guys posting a .333 batting average, where one is 3-for-9 and the other is 300-for-900.  Add four hitless at-bats to each sample and suddenly the first player plummets by 100 points to .231 while the second “drops” to .332.  The latter player has enough data in his sample to keep a single poor game or stretch from having a significant negative impact on his performance to date.
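To make the arithmetic concrete, here is a minimal sketch of that comparison.  The function names are mine, purely for illustration; the hit and at-bat totals are the ones from the paragraph above.

```python
def batting_average(hits, at_bats):
    """Plain batting average: hits divided by at-bats."""
    return hits / at_bats


def after_hitless(hits, at_bats, extra_at_bats):
    """Batting average after adding hitless at-bats to an existing sample."""
    return batting_average(hits, at_bats + extra_at_bats)


# Both players start at .333; each then goes 0-for-4.
small_sample = after_hitless(3, 9, 4)       # 3-for-13
large_sample = after_hitless(300, 900, 4)   # 300-for-904

print(f"{small_sample:.3f}")  # 0.231
print(f"{large_sample:.3f}")  # 0.332
```

The same four at-bats cost the first player about 100 points and the second player about one.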
With that in mind, it is incorrect to say that (insert player) is having a (good/bad) season based on nothing more than a month or so of data.  Some players simply get off to good or bad starts, and the judgment would rest on a snippet of data that does not indicate the player’s true talent level.  Saying they haven’t yet met expectations is fine, because the “yet” limits the evaluation to performance to date and makes no claim about what should or will happen, but generalizing an entire season from next to no information is not the right way to judge players.
These projection systems base their forecasts on the true talent level of the player, which brings us to the next point: What is a true talent level and how do these systems work?
In terms as basic as I can provide, projection systems weight a large enough sample of data from the recent past, with a bit of regression to the mean, accounting for age, and occasionally some other variables, such as height, weight, minor league numbers, etc.  The most commonly used systems are CHONE (Sean Smith), ZiPS (Dan Szymborski), PECOTA (Nate Silver), and Marcel (Tom Tango).  Tango’s Marcel is considered the “dumbest” in the sense that it takes the fewest variables into account–all you need to know is the player’s stats, the player’s age, and the league stats–yet it is essentially just as accurate as any other system out there.
The true talent level of a player is considered to be a weighted version of his last three years of production.  Just like one month is too small a sample to evaluate a season, one season is too small a sample to determine expected performance.  Due to this, we look for more years.  Three, in fact.  With these three years compiled, a proper weight must be applied to each.  Use Andruw Jones as an example.  In 2005, he hit 51 home runs.  The next year, 41.  And last year, just 26.  Looking at this, it would not be accurate to exclude 2005 and 2006 and determine his expected performance based solely on 2007; by the same token, it would also not be accurate to weight those two previous years as heavily as last year, as the most recent data will be the most indicative of skill, but not the end-all solution.  This is why the Marcel projections would weight the 2007 season with a 5, 2006 with a 4, and 2005 with a 3.  All told, Andruw Jones was projected to hit 30 home runs this season, much worse than his 2005 and 2006 totals but slightly better than last season.
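As a rough illustration of the 5/4/3 weighting only, here is a sketch that applies those weights to Jones’ raw home run totals.  This is not the Marcel system itself: it is a simplification that works on season totals and leaves out the regression to the mean and the age adjustment mentioned earlier, so it is not expected to reproduce the projected 30.  The function and variable names are my own.

```python
# Weights from the article, most recent season first: 2007, 2006, 2005.
SEASON_WEIGHTS = (5, 4, 3)


def weighted_average(values, weights=SEASON_WEIGHTS):
    """Weighted mean of per-season values, most recent season first."""
    total = sum(value * weight for value, weight in zip(values, weights))
    return total / sum(weights)


# Andruw Jones' home runs in 2007, 2006, and 2005.
jones_home_runs = (26, 41, 51)

weighted = weighted_average(jones_home_runs)
unweighted = sum(jones_home_runs) / len(jones_home_runs)

print(round(weighted, 2))    # 37.25
print(round(unweighted, 2))  # 39.33
```

The weighting alone pulls the estimate toward the most recent season.  The rest of the distance down to 30 would have to come from the steps this sketch skips.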
This brings us to the next issue: the relationship between in-season performance and true talent level/projections.  Since Andruw was projected to have an .816 OPS with 30 home runs, based on his true talent level, but seems way off-pace right now (.513 OPS and 2 home runs in 53 games), instead of asking whether or not the projection is wrong we should be wondering how those two months affect his talent level.  Would the two months of .513 OPS and 2 home runs constitute a large enough sample to change his projection to something much worse?  Or will his recent weighted data outweigh the smaller sample and call for a big second half?  Or both!  Will it call for a bigger second half albeit an overall line much worse than expected?  In other words, if he was expected to OPS .816 and currently has a .513, it does not mean he will perform this poorly all season or that his new true talent level is .513; conversely, it also does not mean he will be so incredibly hot in the second half that his OPS evens out to .816.  Instead, what we expect of him changes.
Because the projection system, which is based on actual numbers posted by actual players, says it should be .816 but he gets off to a .513 start, you better believe he is going to perform better in the weeks and months after.  How much better depends on the impact of the in-season data on his true talent level.  Factoring in his performance to date, and age, as well as the last three years, Andruw is projected at a .774 OPS over the remainder of the season with 11 home runs, numbers much better than the first half.  However, the combination of both halves would result in a .660 OPS with 13 home runs, WAY down from the .816 and 30 thought of as possible prior to the season.
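For a back-of-the-envelope check on how the two halves combine, here is a sketch using only the figures quoted above.  It assumes OPS blends roughly in proportion to playing time, which is an approximation on my part (OBP and SLG use different denominators), and it is not how Sal’s spreadsheet works internally.  The names are hypothetical.

```python
def blend(rate_a, share_a, rate_b):
    """Playing-time-weighted blend of two rates; share_a is the first rate's share."""
    return rate_a * share_a + rate_b * (1 - share_a)


def implied_share(rate_a, rate_b, combined):
    """Share of playing time at rate_a that yields the combined rate."""
    return (rate_b - combined) / (rate_b - rate_a)


# Home runs simply add: 2 to date plus 11 projected the rest of the way.
season_home_runs = 2 + 11

# OPS to date, projected rest-of-season OPS, and projected full-season OPS.
share_to_date = implied_share(0.513, 0.774, 0.660)

print(season_home_runs)                            # 13
print(round(share_to_date, 2))                     # 0.44
print(round(blend(0.513, share_to_date, 0.774), 3))  # 0.66
```

Under that assumption, the quoted season line is consistent with a bit under half of his season’s playing time already being in the books.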
These numbers were calculated using an absolutely invaluable gadget, created by Sal Baxamusa of The Hardball Times, that, when given the player’s birthday, last three years, and performance to date in-season, projects what will happen over the remainder of the season.  I will use this more in-depth on Saturday, discussing Cliff Lee and CC Sabathia, but it is what projected Jones to have a better second half.  As long as the sample is large enough prior to using it, we can effectively make in-season projections based on this year AND the previous three.
This raises the question: How far into a season is the true talent evident?  A few polls have found that around the 95-103 game mark is a large enough sample to offer the full weight in the projection formula.  This is how players need to be evaluated.  Saying Ryan Howard is having a bad season after April would only be accurate if that April drastically changed his projection.  For that to happen, his April would have to be so incredibly poor that the previous three years of weighted data were no longer valued as highly.  As was found in ‘The Book’, hot or cold streaks generally have very little predictive value.  The same goes for evaluating trades.  Do not make evaluations based on what HAS happened, but rather on what is expected to happen, as well as several other variables such as money, controllability, etc.
Projection systems and evaluating talent with them can be confusing, yes.  Ultimately, though, when discussing players and their talent or skills, we need more than one or two months of data; in fact, we need three years of data to actually be discussing their talent and skills.  When this is more commonly known by fans and analysts, the landscape of evaluation will be much more accurate.


15 Responses to Projecting the Landscape

  1. Hylton says:

    I’ve already written Jeff Francoeur’s career off after this season…..but c’mon, who hasn’t?

  2. I thought Frenchy should have been sent down a while ago. Great raw ability but it wasn’t translating at the major league level, so a bit more harnessing down under (minors, not Australia) could have benefited him a year or two ago.
    As far as his second-half projection, his current true talent level says he will do this over the remainder:
    .274/.323/.445, .768 OPS, 9 HR, which would bring his season to:
    .251/.301/.405, .706 OPS, 18 HR

  3. dan says:

    People continue to look at small samples because that’s what they tell us on TV. After a good hitter hits .235 in April, John Kruk gets on TV pointing out all these mechanical flaws in his swing. The hitter changes nothing in his swing and suddenly increases his average to .290 by June 15th, and John Kruk is a genius.
    Even in mid-season… the Rays hit like .067 with RISP in the 7 games prior to the break and suddenly they need veteran leadership to show them how to hit in big spots.
    Yahoo fantasy doesn’t help either. They can let you see stats from the whole year, the last month, or even the last 7 days! A few people in my league (and my dad, in his) draw all kinds of conclusions from 7 days worth of data. The only time I change my lineup, regardless of who’s hot and who’s cold, is when guys have days off or when the Padres are at home (I have Gonzalez and Kouzmanoff).

  4. Yeah, I mean ultimately I’m not writing this post in anger at anyone or anything like that but I find a lot of people don’t do this, per se, because they don’t understand it and are confused by it.
    Now that Sal’s spreadsheet is out there really isn’t an excuse.

  5. Hylton, can you explain exactly what you’re doing so I can help?
    The way it works is you go to The Hardball Times, copy the entire block of stats on that first tier, make sure the PASTE tab section of the spreadsheet is clear and then paste in the player’s data.
    Then, be sure to enter his name and age in the appropriate section since age plays a big factor. Then go to the Quick’n’Dirty tab and it should be there.

  6. Jessica, yeah, it’s tremendous to use because we can see whether or not poor or great performance actually changes the true talent level, which is all we should really care about.
    As you noted in your Mets post with the projections, just about everyone on the team is going to improve based on what we know about them. A big second half won’t necessarily even out the first half to offer the expected results prior to the season but it will definitely show that, say, Carlos Beltran is not a true .730 OPS guy or something along those lines.

  7. Sal’s spreadsheet is a tremendous resource, and it’s also interesting to see how the first 3.5 months of the season impact a player’s projection. It’s an especially useful tool right now during the height of the trading season to get a rough estimate of what impact a new acquisition is likely to have.

  8. Hylton says:

    I’m having problems with it. I paste everything correctly (or I think I do) and I get the DIV## thing in the updated projections tab.

  9. Anonymous says:

    […] a good article by Eric Seidman on why simply quoting prior stats is lazy, wrong, and […]

  10. Hylton says:

    Eric, that’s exactly what I’m doing. I’m also using Internet Explorer as it’s advised.

  11. My e-mail is Send me the spreadsheet, as is, and I’ll take a look.

  12. Hylton says:

    Okay Eric, I got it to work. Thanks for trying to help me, though.

  13. hobo hal says:

    “True talent” has to be the most hyperbolic piece of advertising language in the whole baseball statistics product category. “Ultimate” zone ratings is really bad as well though.
    The landscape of evaluation remains rather fallow. Sadly they linger around 2 of 3 correct answers even when the students get to define a correct answer.

  14. I have no idea what you’re talking about.

  15. Pizza Cutter says:

    Hal, we may not get “true talent”, but we’re a heck of a lot closer than using the good ole “eyeball test.”
