# Monkeying with Marcel

August 5, 2008

For those not familiar, the Marcel system is Sabermetrician Tom Tango’s system for projecting statistics in the coming year. The idea is simple enough, and he’s been rather emphatic about that being the entire point. The steps in short:

- Take three years’ worth of prior data (if available)
- Regress to the mean (if you don’t know what that is, more on that in a minute)
- Weight the data, with more recent data weighted more heavily
- Apply an age adjustment (something I’ll skip for now)
- Let it rip
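The steps above can be sketched in a few lines of Python. To be clear about what’s assumed here: the 5/4/3 weights are Marcel’s, but the 1200-PA ballast toward league average is a stand-in for the regression step rather than Tango’s exact recipe, and the age adjustment is skipped, as above.

```python
def marcel_rate(rates, pas, league_rate, weights=(5, 4, 3), ballast=1200):
    """Minimal Marcel-style projection for a single rate stat.

    rates/pas: per-year values, most recent year first (up to 3 years).
    The 5/4/3 weights are Marcel's; the 1200-PA 'ballast' of league-average
    performance is an illustrative stand-in for the regression step.
    """
    # Weighted totals across the prior years
    w_num = sum(w * r * pa for w, r, pa in zip(weights, rates, pas))
    w_den = sum(w * pa for w, pa in zip(weights, pas))
    # Regress toward the mean by mixing in 'ballast' PAs of league average
    return (w_num + ballast * league_rate) / (w_den + ballast)
```

Feed it three years of a hitter’s BB rates and PAs and it spits out a projection that sits between his weighted average and the league mean, which is the whole point.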

Tom’s stated (a few times) that this should be the basis for any good projection system, and he’s right, particularly around the issue of regressing prior years’ stats to the mean. A small demonstration, if you will.

Regression to the mean is simple to understand intuitively. Suppose that you have an amazing day today. You win the lottery *and* when you go to the drive thru at Arby’s and order the chicken fingers, they mistakenly give you five instead of four pieces along with a medium instead of a small Jamocha shake. Now that’s the recipe for a good day. What’s tomorrow going to be like? It will be worse, if only for the fact that there’s nowhere else to go but down. If you had a really bad day, the opposite thing happens. Now, let’s say that I based my entire measurement of how happy you usually are on an average day on how you’re doing based on that one day that you won the lottery. Or on that week? Really, if I want to get an idea of how good you feel on an average day, I need a bigger sample. But what if I only have 162 days?

Regression to the mean is a way to temper extreme observations, especially those drawn from limited time frames and/or from unreliable measures. On the first issue (limited time frames), the way to make a measure more reliable is to have a wider time sample. If Albert Pujols had a billion at-bats, I’d be a lot more comfortable in saying what his “true” talent is for hitting home runs than if I only watched thirty at-bats. At 30 AB, I’d have some idea, but not the kind of precision that I would want to bet on. Some measures need more observations than others to become reliable, but alas, sometimes we only get a few at-bats to watch. Some measures are just unreliable by their nature, because they have much more to do with luck than any sort of skill. Think BABIP.

I’m a man who likes to look at the reliability of statistics. Nine months ago, I introduced the concept of split-half reliability and how it can be used to tell how reliable a stat is and when it becomes “reliable enough”. I did it by taking a sample of, say, 600 plate appearances and splitting them in half (even-numbered ones vs. odd-numbered ones). Then, I calculated whatever stat was interesting at the moment (K rate? 2B rate?) in the even-numbered plate appearances and in the odd-numbered plate appearances. I did this for everyone who had 600 PAs to work with, and compared one set of 300 PA against the other set of 300 PA. If a statistic is reliable at 300 PA, then we should see roughly the same rate from the even-numbered PAs as from the odd-numbered PAs. The way to check for that is the correlation between the two groups. The correlation that results is the split-half reliability of the stat at 300 PA. Why 300? Why not 299? Why not 301? Sure, the numbers aren’t going to change much from 299 to 300, but they will change. In fact, what’s to stop me from generating split-half reliabilities for a stat from 1 PA to 750? It’s just an engineering problem. I generated the appropriate numbers for BB rate and K rate. The one problem with generating these numbers is that it takes 24-36 hours of continuous computer processing (at least on my laptop) to generate one of those tables for a statistic. It’s do-able; it just takes a while.

Once we have the reliability for a measure given X observations, we also know how much to regress the measure to the mean. This is important because for some players we have a sample of 700 PAs to work with, and for others we have 100. The split-half reliability coefficient is “r”, and the formula for regressing something toward the mean is

r * player performance + (1-r) * league average.
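In code, with illustrative numbers (the .120 observed rate, .085 league average, and r of .7 below are made up for the example):

```python
def regress_to_mean(observed, league_avg, r):
    """Shrink an observed rate toward the league mean by a factor of (1 - r)."""
    return r * observed + (1 - r) * league_avg

# A .120 observed BB rate, r = .7, league average .085:
# .7 * .120 + .3 * .085 = .1095
estimate = regress_to_mean(0.120, 0.085, 0.7)
```

The lower the reliability at a player’s sample size, the more of his line gets replaced by league average, which is exactly the behavior we want.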

I started by looking at batters. I found everyone’s actual BB and K rates for 1999-2007, and then their regressed BB and K rates for those years using the split-half coefficients that I had just generated. Then, I lined everyone up from 2002-2007, with their three prior years’ worth of data (so, in 2002, back to 1999). I set up a regression equation to predict the “current” year’s BB rate, using the previous *actual* (non-regressed) BB rates from the three prior years. I limited the data to those players who got more than 250 PA in the “current” year. The actual rates did a pretty good job, and gave a formula of .416 * BBrate 1 year ago + .248 * BB 2ya + .148 * BB 3ya + .016. The regression had an R-squared of .590. Again, not bad. (There is the problem that those coefficients really don’t add up to 1.0 or anywhere near it. Trust me, that’s a problem.)
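For the curious, here’s roughly how that regression can be set up, using numpy’s least squares on placeholder arrays rather than the actual 2002-2007 player rows:

```python
import numpy as np

def fit_three_year_weights(y, x1, x2, x3):
    """Least-squares fit of the current-year rate on the three prior
    years' rates plus an intercept. Returns (b1, b2, b3, intercept).

    y, x1, x2, x3: equal-length arrays, one entry per player-season;
    placeholder stand-ins here for the real data.
    """
    # Design matrix: three prior-year rates plus a column of ones
    X = np.column_stack([x1, x2, x3, np.ones(len(y))])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs
```

The returned coefficients are the weights on the 1-, 2-, and 3-years-ago rates, which is where the .416/.248/.148 (and later .545/.264/.215) numbers come from.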

The regressed rates did a better job of predicting, with a best fit line of .545 * regBB 1ya + .264 * regBB 2ya + .215 * regBB 3ya – .002. R-squared: .614. Further, the standard error of the estimate was smaller in the regressed model (.0196 vs. .0201). The regressed predictors did a significantly better job.

If there’s one piece of the Marcel system that these data do call into question, it’s the weights which are placed on the previous year’s data. The Marcel system uses a 5/4/3 weighting system, with the most recent year being weighted with a five (so if I’m predicting 2009, the most recent year would be 2008), the second previous year gets a 4, and the third previous year gets a 3. In this case, with walks, it looks like about 53% of the weight in this equation is on the most recent year, with 26% on the second, and 21% on the third. Given a 12 point system, that suggests (with some rounding) a weighting of about 6.5/3/2.5 is most appropriate for predicting walk rate.
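The translation from regression coefficients to a 12-point weighting is just rescaling so the weights sum to 12:

```python
def to_twelve_point(coefs, total=12):
    """Rescale regression coefficients onto a 12-point weighting
    scheme, in the style of Marcel's 5/4/3 (which sums to 12)."""
    s = sum(coefs)
    return [total * c / s for c in coefs]

# The BB-rate coefficients from the regression above: .545 / .264 / .215
weights = to_twelve_point([0.545, 0.264, 0.215])
```

On the walk-rate coefficients this lands around 6.4/3.1/2.5, which rounds to the 6.5/3/2.5 quoted above.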

But let’s see if that holds up with strikeout rates. Same setup. Again, the regressed predictors did a better job than the actual non-regressed predictors (R-squared: .735 vs. .694). The equation was .678 * regK 1ya + .186 * regK 2ya + .166 * regK 3ya – .008. Projecting that out to the twelve-point weighting system, that’s roughly 8/2/2.

Now, this is a very raw system. We do have other pieces of data, and there may be other numbers that can be used to fine-tune the predictions, including age (which the full Marcel system incorporates). But, there are two lessons to be learned here. One: regressing predictors to the mean is absolutely essential, both logically and in terms of the performance of the system. Two: while past performance does have a strong influence on the future, different skills should be weighted and regressed in different ways. I’m not privy to how some of the other systems’ algorithms work, but hopefully, they’ve already incorporated the need to have a specific weighting system for each skill.

Good work on this one so far. It’s interesting that even with such a raw system you’re getting relatively high R-squared numbers, at least on those two examples.

Do you have any idea of which stats are least predictable (e.g. doubles in a given year)? And alternatively, which are the most predictable?

Click on the link off of “introduced the concept” (for batting stats) and “split-half reliability” (for pitchers)

If you didn’t see it, you might enjoy the discussion that took place on The Book blog a couple days ago (click name).

What do you think of Peter’s proposed projection (ugh, can’t get around that alliteration) system?

Dan, hadn’t read that yet… let me digest that…

Dan, I posted this at The Book Blog as well:

Sorry to be jumping into this one so late, but let me see if I’ve got this right. There seem to be two issues intertwined here. One is the issue of picking the right prior to regress to when we regress to the mean. In the past, that’s been league average. People seem to be OK with the thought that we can do better than that, only there’s not a lot of consensus on exactly how to do that. Fine, open discussion.

The other is methodological. Peter seems to be arguing for some sort of ARMA (auto-regressive moving average) system for projections. Phil is right in #10 that we intuitively don’t consider an 0-for-20 for John McDonald the same way as an 0-for-20 for A-Rod. For McDonald, that’s par for his course. For A-Rod, that’s weird and we know that he’s really a better player than that deep down and we have the numbers to back it up.

That type of conceptualization is a complete paradigm shift away from the regression (to the mean) and (linear) regression systems that seem to be out there, not so much conceptually, but certainly in terms of statistical methodology.

I’m only superficially familiar with the ARMA/ARIMA concept. As I understand it, it’s set up to handle time series data (which all baseball stats are… your OBP after your 342nd PA incorporates the data from the 1st-341st and tacks on another observation… hence a moving average…), but with the understanding that the 1st-341st events will have some correlation with the 342nd (it’s the same person in each case). (I’ve come across it when reading concerning intra-class correlation.) I don’t understand it much more than that or any of the math behind it, but I think that this is what Peter is talking about. Anyone else know more on this type of analysis and how it might be useful?

Hi there. I’ve done Marcel for years, and I know that people use Marcel for in-season projections by weighting the more recent games more heavily, but I was always a little hazy on the details. Can you explain how Marcel can be used in-season?

Thanks

s.park

This link might be helpful.