Testing the Projection Systems’ Strengths and Weaknesses
April 12, 2009
There are several prominent projection systems for predicting how players will do in the coming baseball season. Depending on the season and the test used, each has some claim of superiority. One system that does not claim superiority is Marcel the Monkey, developed by Tom Tango, which intentionally uses a simple method: a weighted average of historical performance, adjusted for regression to the mean and age factors. Tango explains that it should be the standard that any projection system worth looking at should be able to beat.
Each of these systems is built differently, and each is bound to have its strengths and weaknesses. While many can claim that one is better than the other, it is very difficult to tell. My suspicion is that since different systems have different methodologies, they will each excel at projecting different groups of hitters and different statistics. In this article, I will test how each system does with many different subgroups, and we will see that there are certain areas in which each system excels over the others. PECOTA, for instance, struggles at projecting BABIP for speedy players. ZIPS struggles at projecting BABIP overall. OLIVER tends to underestimate walks and strikeouts, but overestimate homeruns. CHONE does a bit better overall, but for younger players, it appears that PECOTA does a bit better for the majority of them. I will explain each of these results in detail later on.
I gathered projections on 526 different players who got at least 300 plate appearances in either 2007 or 2008 and who were projected by all five of the projection systems I tested. Players who were not projected by one system or another were excluded from the sample. Obviously, this eliminated a lot of useful information: the main way that projection systems differ is in how they project young players, and many young players were not projected by MARCEL, while many other players were missing from some of the other systems. However, there is still useful information contained in these tests.
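The sample restriction described above can be sketched roughly as follows. This is only an illustration, not the code used for the article: the function name, the data layout (one dictionary per system, keyed by player id), and the toy numbers are all assumptions.

```python
def common_sample(projections, plate_appearances, min_pa=300):
    """Return sorted player ids projected by every system with >= min_pa PA.

    projections: dict of system name -> {player id: projected value}
    plate_appearances: dict of player id -> actual plate appearances
    (both layouts are hypothetical, chosen for illustration)
    """
    tables = list(projections.values())
    # Keep only players that appear in every system's projection table.
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    # Then drop players who did not reach the playing-time cutoff.
    return sorted(p for p in shared if plate_appearances.get(p, 0) >= min_pa)


# Hypothetical toy data: player id -> projected OPS.
projections = {
    "CHONE": {"a": 0.800, "b": 0.720, "c": 0.760},
    "PECOTA": {"a": 0.790, "b": 0.730, "c": 0.750},
    "MARCEL": {"a": 0.780, "b": 0.740},  # player "c" was not projected
}
plate_appearances = {"a": 550, "b": 250, "c": 600}
sample = common_sample(projections, plate_appearances)
```

In the toy data only player "a" survives: "b" misses the plate appearance cutoff and "c" is missing from one system, which is the same kind of exclusion that removes many young players from the real sample.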
THE CONTENDERS
I tested five different projection systems. I will briefly describe them now. PECOTA is Baseball Prospectus’ projection
system, developed by Nate Silver. It has
been around for several years, and is the most commonly cited system. ZIPS is Baseball Think Factory’s projection
system, developed by Dan Szymborski.
Both PECOTA and ZIPS have somewhat similar methodologies, as they use
comparable players to enhance their projections. Both systems sort through thousands of historical players to
find comparable players. One criticism that I leveled against PECOTA in my last article is that it seems to
overrate batting averages for faster players. We will see later on that this is mostly a BABIP projection issue.
I also tested CHONE, developed by Sean Smith. His projections can be found at
www.baseballprojection.com. Sean does not use
comparable players, but instead uses component-based aging curves. OLIVER was developed by Brian Cartwright of
Statistically Speaking (here!) and Fangraphs.com. His area of expertise is park factors, and we
will see how this comes into play later on, as his projections seem stronger for hitters whose performance is largely affected by park factors.
MARCEL, as I mentioned earlier, was developed by Tom Tango and is
available here: http://www.tangotiger.net/marcel/.
ON BASE PERCENTAGE PLUS SLUGGING PERCENTAGE
The first thing that I will do is use a general measure of
performance, OPS, to get a sense of which systems will be most successful with
which hitters. I will later delve into
different statistics, and see who succeeds at what. For now, it’s best to get a general sense of
who does best at projecting which group of players.
I will use Root Mean Square Error (RMSE) as my test in these
projections. I considered using
correlations, but since we are looking for which system projects how players do
most accurately, rather than who is best at matching the tails of the
distribution (which correlation will do best), I think this is the best
method. If anyone would like to see how
correlations and other tests do, please let me know and I will run a test.
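For readers who want the test spelled out, RMSE is the square root of the average squared gap between projected and actual values. A minimal sketch, with made-up numbers rather than anything from the sample:

```python
import math


def rmse(projected, actual):
    """Root mean square error between two equal-length lists of numbers."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("need two non-empty lists of the same length")
    squared = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical OPS values for four hitters; every projection misses by .010.
actual_ops = [0.700, 0.750, 0.800, 0.850]
projected_ops = [0.710, 0.740, 0.810, 0.840]
error = rmse(projected_ops, actual_ops)
```

Because the misses are squared before averaging, one large miss costs more than several small ones, so lower is better and big misses are punished most.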
Overall, here is how the projection systems did at
projecting hitters’ OPS.
| OPS overall | RMSE |
| CHONE | .0737 |
| PECOTA | .0753 |
| OLIVER | .0753 |
| MARCEL | .0763 |
| ZIPS | .0769 |
It is clear that CHONE has done the best in the past two
years. PECOTA and OLIVER are not too far
behind, and certainly have beaten MARCEL.
ZIPS just barely failed to beat MARCEL, but as we will see later in the
article, it does have clear strengths with certain groups of hitters.
In my last article, I pointed out that PECOTA tends to overestimate batting averages for
speedsters (as measured by whether those speedsters were in the top quarter of
their speed scores). For that group of
hitters, you can see that PECOTA does the second worst of the projection
systems, falling considerably behind MARCEL at projecting these hitters.
| OPS for top ¼ of | RMSE |
| CHONE | .0257 |
| MARCEL | .0261 |
| OLIVER | .0264 |
| PECOTA | .0273 |
| ZIPS | .0279 |
PECOTA is just barely behind CHONE at projecting the slower
three quarters of baseball players, however.
This is further evidence of my conclusion. Specifically, PECOTA is likely to
overestimate those players, rather than just miss by a larger margin.
| OPS for bottom ¾ of | RMSE |
| CHONE | .0282 |
| PECOTA | .0283 |
| ZIPS | .0287 |
| MARCEL | .0291 |
| OLIVER | .0304 |
After separating hitters by speed, I decided to see if
certain projection systems performed better or worse for hitters with different
levels of power. For simplicity, I split
the group about in half. The average
hitter in this sample of 526 players had about 15 homeruns last year. Look at the results:
| OPS if HR>=15 | RMSE |
| MARCEL | .0722 |
| PECOTA | .0738 |
| CHONE | .0756 |
| ZIPS | .0809 |
| OLIVER | .0821 |

| OPS if HR<15 | RMSE |
| OLIVER | .0701 |
| CHONE | .0724 |
| ZIPS | .0739 |
| PECOTA | .0762 |
| MARCEL | .0791 |
For higher power hitters, MARCEL does the best by far, and
for lower power hitters, MARCEL does by far the worst. OLIVER does the exact opposite–best for lower
power hitters and worst for high power hitters.
As Brian Cartwright pointed out himself recently,
homerun park factors are larger for hitters with less
power. As Brian’s specialty is park
factors, it makes sense that those hitters who are affected most by them would
be his strength. We will see later that OLIVER projects homeruns a little high across the board (and walks and strikeouts a little low), but perhaps OLIVER’s strength in park factors helped get the numbers right on these guys in general.
As each of these projection systems needs a way to account
for aging, it seems pretty useful to look at their ability to project OPS for
players in various age groups. For
simplicity, I started by separating hitters into those who were at least 30
years old and those who were under 30.
| OPS if age>=30 | RMSE |
| CHONE | .0711 |
| OLIVER | .0723 |
| MARCEL | .0725 |
| ZIPS | .0726 |
| PECOTA | .0734 |

| OPS if age<30 | RMSE |
| CHONE | .0754 |
| PECOTA | .0764 |
| OLIVER | .0772 |
| MARCEL | .0786 |
| ZIPS | .0795 |
CHONE does best with both groups actually, but the gap
between CHONE and PECOTA is much smaller for hitters under 30. The large advantage for CHONE seems to be for
hitters over 30. As the CHONE projection
system is based on component based aging curves, this makes sense. PECOTA is based on discovering comparable
players. This is most useful when there
is less data available on the hitter. As
more and more data is available on the hitter himself, perhaps finding
comparable players is less relevant. It
is also interesting that OLIVER does better than PECOTA for players 30 and
over. This is probably because OLIVER
uses more than three years of historical data, while I do not believe PECOTA uses more than three years of data.
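The subgroup comparisons in this article all follow the same pattern: assign each hitter to a bucket, then compute RMSE within each bucket. A rough sketch of that pattern is below; the row layout, the helper name, and the numbers are hypothetical and only meant to show the mechanics.

```python
import math
from collections import defaultdict


def rmse_by_group(rows, group_of):
    """Compute RMSE separately for each group label returned by group_of(row).

    rows: list of dicts with "projected" and "actual" keys (assumed layout).
    """
    squared_errors = defaultdict(list)
    for row in rows:
        miss = row["projected"] - row["actual"]
        squared_errors[group_of(row)].append(miss ** 2)
    return {
        group: math.sqrt(sum(values) / len(values))
        for group, values in squared_errors.items()
    }


# Hypothetical hitters: age, projected OPS, actual OPS.
rows = [
    {"age": 24, "projected": 0.760, "actual": 0.800},
    {"age": 28, "projected": 0.810, "actual": 0.770},
    {"age": 31, "projected": 0.750, "actual": 0.740},
    {"age": 36, "projected": 0.700, "actual": 0.710},
]
by_age = rmse_by_group(rows, lambda r: "age>=30" if r["age"] >= 30 else "age<30")
```

Swapping the grouping function (for example, to a homerun cutoff or a speed score quartile) gives the other splits without changing the rest of the calculation.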
When I separate out those hitters under 30 who were in the
bottom three quarters of speed scores by PECOTA’s estimates, ZIPS actually did
the best and PECOTA did better than CHONE.
| OPS if age<30 | RMSE |
| ZIPS | .0747 |
| PECOTA | .0762 |
| CHONE | .0765 |
| MARCEL | .0777 |
| OLIVER | .0796 |
Next, I broke down the age groups even further into those
hitters 25 and under, hitters 26-30 years old, hitters 31-35 years old, and
hitters over 35. Note that this only
includes players projected by all systems, so the younger group’s results are
mostly summaries of younger players who have major league experience before the
season begins.
| OPS if age<=25 | RMSE |
| CHONE | .0740 |
| PECOTA | .0778 |
| MARCEL | .0821 |
| OLIVER | .0823 |
| ZIPS | .0855 |

| OPS if 25<age<=30 | RMSE |
| OLIVER | .0743 |
| PECOTA | .0757 |
| ZIPS | .0762 |
| CHONE | .0762 |
| MARCEL | .0767 |

| OPS if | RMSE |
| CHONE | .0708 |
| OLIVER | .0723 |
| PECOTA | .0727 |
| MARCEL | .0741 |
| ZIPS | .0743 |

| OPS if age>35 | RMSE |
| ZIPS | .0666 |
| MARCEL | .0667 |
| CHONE | .0719 |
| OLIVER | .0726 |
| PECOTA | .0758 |
Interestingly, ZIPS performs poorly for most groups, but
seems to have a sizable advantage for hitters over 35. PECOTA does pretty well for the other groups,
but terribly for hitters over 35. This
is surprising as both systems use similar methodologies. Perhaps there is something about ZIPS that
makes it better suited to project older players.
One thing that concerned me about the older group is that
projecting older players has a lot to do with projecting attrition rates. These numbers only report hitters who had
enough plate appearances to qualify. My
concern was that ZIPS was merely over-projecting older players and those
projections were only playing a role if the older player over-performed expectations
enough to qualify. However, the opposite
is true. As seen in the table below, ZIPS
had the lowest projection of any of the systems for this group of players. As it turns out, ZIPS may have a superior
system at identifying comparable players for older hitters.
| Projection system | average OPS if |
| OLIVER | .796 |
| MARCEL | .794 |
| PECOTA | .790 |
| CHONE | .784 |
| ZIPS | .779 |
After this, I tested hitters by handedness. The results were interesting.
| OPS for | RMSE |
| CHONE | .0699 |
| OLIVER | .0716 |
| MARCEL | .0736 |
| PECOTA | .0767 |
| ZIPS | .0888 |

| OPS for lefties | RMSE |
| ZIPS | .0736 |
| PECOTA | .0744 |
| OLIVER | .0749 |
| CHONE | .0755 |
| MARCEL | .0770 |

| OPS for righties | RMSE |
| CHONE | .0737 |
| ZIPS | .0753 |
| PECOTA | .0754 |
| OLIVER | .0765 |
| MARCEL | .0766 |
What jumps out at me most is ZIPS. For the 77 switch hitters in my sample, ZIPS
was the worst at projecting these hitters by far. However, ZIPS was the best at projecting
lefties and the second best at projecting righties. Perhaps ZIPS is not accounting for some
aspect of switch hitting in its comparisons.
Looking at the average OPS projection by system for those 77 switch
hitters, it is clear that ZIPS is the most pessimistic about them, and probably
unrealistically so. Note that these hitters had an OPS of .767 on average; ZIPS
only projected them at .748 on average.
My gut feeling is that switch hitting is probably correlated with
athleticism, and ZIPS may not be taking that into account. It would be interesting to know whether ZIPS
incorporates handedness in its projection process. If not, this could be a fairly obvious area
for improvement.
| SYSTEMS | average OPS for |
| MARCEL | .776 |
| PECOTA | .764 |
| CHONE | .758 |
| OLIVER | .754 |
| ZIPS | .748 |
Some players are borderline major leaguers who may well end up in the minor leagues. A projection system will appear
better if it projects higher OPS numbers for these players: the predictions will seem accurate if they are right, and will not show up in
the tests of the systems if they are wrong.
This is not to say that the system is intentionally biased, but a system that gives too much credit to marginal players and regresses them too far to the mean will have this issue. Consider the grouping below, by above and below average
OPS. From this, it seems that OLIVER may be a system that projects players this way. MARCEL seems to be unable to project poorer
hitters very well at all, compared to the other systems. CHONE also seems to do
relatively better at projecting poor hitters.
| ABOVE AVERAGE OPS | RMSE |
| MARCEL | .0689 |
| PECOTA | .0702 |
| CHONE | .0755 |
| ZIPS | .0782 |
| OLIVER | .0800 |

| BELOW AVERAGE OPS | RMSE |
| OLIVER | .0705 |
| CHONE | .0720 |
| ZIPS | .0756 |
| PECOTA | .0798 |
| MARCEL | .0827 |
Now that we have gotten a general sense of what these
projection systems do for different types of player overall, let’s refine this
general signal of OPS into various parts.
This will help us figure out who is actually good at what.
We’ll start with some of the more reliable things like the
three true outcomes, and then we’ll work our way to BABIP and the slash stats
(AVG/OBP/SLG). After that we can focus
on stolen bases and some fantasy stats.
TOTAL HOMERUNS AND HOMERUNS PER AT-BAT
We’ll do a mixture of homeruns and homeruns per at bat
here. Some people may find one more
useful than the other.
Looking at the general summaries, the average number of
homeruns predicted for this group (who had an average homerun total of 15.17) was:
| SYSTEMS | Average HR total |
| CHONE | 15.69 |
| OLIVER | 15.69 |
| ZIPS | 15.44 |
| PECOTA | 14.87 |
| MARCEL | 14.40 |
One might have expected hitters who were good enough to get
enough plate appearances to have outperformed expectations on homeruns,
but clearly these systems all were very close on average. In other words, none systematically over- or
underestimated homeruns.
PECOTA does seem low, but that is actually due to not
projecting many at-bats. Considering the
alternative specification, homeruns per at-bat, PECOTA did pretty well. The average homeruns per at-bat in this group
was 3.18%.
| SYSTEMS | Average HR% |
| OLIVER | 4.06% |
| MARCEL | 3.36% |
| PECOTA | 3.36% |
| CHONE | 3.30% |
| ZIPS | 3.26% |
OLIVER seemingly over-projected homerun percentage a lot,
while the other systems were pretty close to each other. The natural follow up question is which
system is actually better at projecting homeruns.
| TOTAL HOMERUNS | RMSE |
| CHONE | 6.39 |
| PECOTA | 6.39 |
| ZIPS | 6.49 |
| OLIVER | 6.83 |
| MARCEL | 6.92 |

| HR/AB | RMSE |
| PECOTA | .0108 |
| CHONE | .0111 |
| ZIPS | .0114 |
| MARCEL | .0117 |
| OLIVER | .0148 |
PECOTA, CHONE, and ZIPS all seem to do a very good job with
homeruns, but none of them seems to do incredibly better than MARCEL. OLIVER seems to struggle with homerun rate. It seems that OLIVER might be systematically
overestimating homerun rate. However, when I ran correlations between projected and actual homerun rate, OLIVER placed ahead of every system except for PECOTA. What this means is that OLIVER knows what to expect the distribution of homeruns to be across the majors, but just pushes them all up across the board.
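The difference between what RMSE and correlation pick up here can be made concrete. Squared error splits into a bias part and a spread part (mean squared error = bias² + variance of the errors), while correlation ignores a constant shift entirely. The sketch below uses made-up homerun rates; the .009 shift is chosen only because it is roughly the gap between OLIVER's 4.06% average projection and the actual 3.18%, and nothing else in it comes from the real data.

```python
import math


def mean(values):
    return sum(values) / len(values)


def error_decomposition(projected, actual):
    """Return (bias, spread, rmse), where rmse**2 == bias**2 + spread**2."""
    errors = [p - a for p, a in zip(projected, actual)]
    bias = mean(errors)
    spread = math.sqrt(mean([(e - bias) ** 2 for e in errors]))
    total = math.sqrt(mean([e * e for e in errors]))
    return bias, spread, total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical HR/AB rates for four hitters.
actual = [0.020, 0.030, 0.040, 0.050]
unbiased = [0.022, 0.028, 0.042, 0.048]
shifted = [p + 0.009 for p in unbiased]  # same shape, pushed up across the board

bias, spread, total = error_decomposition(shifted, actual)
same_correlation = abs(pearson(unbiased, actual) - pearson(shifted, actual)) < 1e-9
```

In this toy case the shifted projections have the same correlation with actual results as the unshifted ones, but a much larger RMSE, and nearly all of the extra error is bias. If a real system behaves this way, subtracting its average miss would be an easy fix.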
I next split these groups into hitters with above average
and below average homerun totals.
| TOTAL HOMERUNS IF | RMSE |
| CHONE | 7.63 |
| ZIPS | 7.95 |
| PECOTA | 7.99 |
| OLIVER | 8.17 |
| MARCEL | 8.97 |

| TOTAL HOMERUNS IF | RMSE |
| PECOTA | 4.95 |
| MARCEL | 4.98 |
| ZIPS | 5.21 |
| CHONE | 5.34 |
| OLIVER | 5.69 |
As it turns out, each system has a clear advantage over
MARCEL at projecting homeruns for power hitters, but for weaker hitters, MARCEL
actually does better than every system except for PECOTA. Without knowing the details of how they do
projections, it is difficult to say why, but it is certainly a fact worth
noting.
I again used PECOTA speed scores to separate players, and
found something rather interesting projecting homeruns for speedy players.
| TOTAL HR for top ¼ | RMSE |
| PECOTA | 5.21 |
| CHONE | 5.39 |
| OLIVER | 5.51 |
| ZIPS | 5.68 |
| MARCEL | 6.09 |

| TOTAL HR for bottom | RMSE |
| CHONE | 6.69 |
| PECOTA | 6.74 |
| ZIPS | 6.74 |
| MARCEL | 7.17 |
| OLIVER | 7.21 |
Remember that PECOTA performed worse at projecting OPS for
speedy players? This certainly is not
because it struggles to project homeruns for these players. In fact, PECOTA is markedly better than the
other systems at projecting HR for speedy players.
We will see later that batting average is where PECOTA struggles with
these players, as I explained in my previous article.
Remember that ZIPS was particularly good at projecting
hitters over 35? This is certainly
reinforced by its accuracy on projecting homeruns.
| TOTAL HR if | RMSE |
| ZIPS | 5.89 |
| CHONE | 6.15 |
| MARCEL | 6.32 |
| OLIVER | 6.98 |
| PECOTA | 7.44 |
PECOTA seems to struggle mightily with this group, while
ZIPS seems to excel. Looking at their
average projected homeruns for players in this group, we see that while both
of them project lower homerun totals than the other systems, PECOTA is way too
low. This group of hitters averaged
15.39 homeruns.
| SYSTEMS | Average Total HR |
| OLIVER | 16.58 |
| MARCEL | 16.24 |
| CHONE | 15.98 |
| ZIPS | 14.04 |
| PECOTA | 12.36 |
There certainly do seem to be some large differences in
how these systems project homeruns, and they seem to have their strengths and
weaknesses with different types of players.
Next, we will look at walk percentage.
WALK PERCENTAGE
Like homerun rates, walk rates also stabilize quickly. Hitters vary wildly with respect to walk
rates as many walk very frequently and others rarely do. The average walk rate for hitters in this
sample was 9.04%. With the exception of
OLIVER, each of these systems was pretty accurate at projecting walk rates.
| SYSTEMS | Average BB/PA projected |
| CHONE | 8.98% |
| PECOTA | 8.98% |
| ZIPS | 8.89% |
| MARCEL | 8.80% |
| OLIVER | 8.08% |
It seems that something is a bit off with OLIVER when it
comes to walks, as it seems to be systematically underestimating their
frequency. As far as accuracy goes on
walk rate overall, here are the results.
| WALK RATE | RMSE |
| CHONE | .0200 |
| PECOTA | .0203 |
| ZIPS | .0205 |
| MARCEL | .0212 |
| OLIVER | .0233 |
OLIVER clearly does struggle with walk rate, and it was last of the systems when it came to correlation with actual walk rate too. CHONE does the best, ever so slightly. Very different results emerged when I
separated hitters into those with walk rates above 10% and those with walk
rates below 10%.
| WALK RATE if above | RMSE |
| CHONE | .0246 |
| PECOTA | .0253 |
| ZIPS | .0259 |
| MARCEL | .0277 |
| OLIVER | .0337 |

| WALK RATE if below | RMSE |
| OLIVER | .0144 |
| MARCEL | .0164 |
| ZIPS | .0168 |
| PECOTA | .0168 |
| CHONE | .0168 |
OLIVER goes from being by far the worst to by far the best
at projecting walk rates for those hitters with low walk rates. What it seems to struggle with is those
hitters with higher walk rates, who naturally have higher variance in their walk
rates.
STRIKEOUT RATE
The next statistic I tested was another reasonably reliable
one–strikeouts per at-bat. OLIVER again
was rather low on projecting these while the others projected higher strikeout
rates on average, and closer to the actual average of 18.27%.
| SYSTEMS | Average K/AB |
| ZIPS | 18.33% |
| MARCEL | 18.29% |
| ZIPS | 18.17% |
| PECOTA | 18.16% |
| OLIVER | 16.43% |
Testing accuracy of strikeout percentage in general:
| STRIKEOUT RATE | RMSE |
| CHONE | .0309 |
| PECOTA | .0314 |
| ZIPS | .0341 |
| MARCEL | .0367 |
| OLIVER | .0401 |
OLIVER fell far behind, but primarily because it projected the average strikeout rate too low. Its correlation with actual strikeout rate beat MARCEL and ZIPS. Once again, adjusting the mean for OLIVER seems like it has potential for huge improvement.
Next, I separated hitters into those who strike out more
often and less often than 20%.
| STRIKEOUT RATE if | RMSE |
| CHONE | .0384 |
| PECOTA | .0397 |
| ZIPS | .0400 |
| MARCEL | .0480 |
| OLIVER | .0582 |

| STRIKEOUT RATE if | RMSE |
| OLIVER | .0257 |
| PECOTA | .0259 |
| CHONE | .0260 |
| MARCEL | .0289 |
| ZIPS | .0304 |
As I mentioned earlier, PECOTA is less accurate at
projecting the OPS of speedy hitters, but more accurate at projecting their
homeruns. As it turns out, it also
excels at projecting their strikeout rates.
| STRIKEOUT RATE for | RMSE |
| PECOTA | .0326 |
| CHONE | .0326 |
| ZIPS | .0337 |
| OLIVER | .0375 |
| MARCEL | .0379 |

| STRIKEOUT RATE for | RMSE |
| CHONE | .0303 |
| PECOTA | .0310 |
| ZIPS | .0342 |
| MARCEL | .0363 |
| OLIVER | .0410 |
PECOTA in fact does very slightly better than CHONE at
projecting strikeout rates for speedy players and worse than CHONE for medium
and slow speed players. Neither
strikeouts nor homeruns explains PECOTA’s troubles at projecting speedsters.
BATTING AVERAGE ON BALLS IN PLAY
This past offseason, many articles came out discussing BABIP
for hitters. I have worked on projecting
BABIP myself. I use batted ball data to
help my projections, but I do not believe that any of these systems do. As there is no batted ball data available for
players who played decades ago, ZIPS and PECOTA prefer to leave it
out. Sean Smith says that he uses batted
ball data for pitchers, but not for hitters when doing the CHONE
projections. I do not believe OLIVER or
MARCEL uses batted ball data either.
However, they do a fairly good job at approximating BABIP anyway. The actual standard deviation of BABIP for
players in this sample was .032. Since
we are dealing with samples for hitters with around 400 balls in play,
some of the variance in BABIP is due to randomness in samples of
size 400. However, after accounting for the variance in BABIP due to the binomial distribution, it does seem to imply
that BABIP skill has a standard deviation of around .022. The standard deviation of projected BABIPs for players
in this sample, by system, is as follows:
| SYSTEMS | St.Dev. |
| CHONE | .0177 |
| PECOTA | .0178 |
| OLIVER | .0198 |
| MARCEL | .0207 |
| ZIPS | .0231 |
ZIPS seems to have produced approximately the
right spread of BABIPs, but it does not seem to have matched them to the
right players.
| BABIP | RMSE |
| PECOTA | .0291 |
| CHONE | .0293 |
| OLIVER | .0299 |
| MARCEL | .0306 |
| ZIPS | .0311 |
It seems pretty clear that the systems that were more
accurate were those that applied more regression to the mean. I strongly believe that one of the areas where
projection has the most room to improve is BABIP, and I will continue to do my
own research on this.
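The skill-spread figure quoted earlier in this section (about .022) can be reproduced with a short calculation: subtract the variance expected from binomial luck over 400 balls in play from the observed variance, then take the square root. The .032 and 400 come from the text above; the league-average BABIP of .300 is my own round-number assumption, and the function name is just for illustration.

```python
import math


def skill_sd(observed_sd, mean_rate, trials):
    """Estimate the spread of true talent after removing binomial noise.

    observed variance = skill variance + mean_rate * (1 - mean_rate) / trials
    """
    binomial_variance = mean_rate * (1 - mean_rate) / trials
    skill_variance = observed_sd ** 2 - binomial_variance
    if skill_variance < 0:
        raise ValueError("observed spread is smaller than binomial noise alone")
    return math.sqrt(skill_variance)


# .032 observed SD and ~400 balls in play are from the article;
# the .300 average BABIP is an assumed round number.
estimate = skill_sd(observed_sd=0.032, mean_rate=0.300, trials=400)
```

With these inputs the luck component alone has a standard deviation of about .023, so roughly half of the observed variance is noise. That is one way to see why the more heavily regressed projections, whose spreads sit a little below the estimated skill spread, came out more accurate.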
Breaking down BABIP by speed, we get the following results:
| BABIP for hitters | RMSE |
| CHONE | .0315 |
| PECOTA | .0318 |
| MARCEL | .0318 |
| OLIVER | .0319 |
| ZIPS | .0326 |

| BABIP for hitters | RMSE |
| PECOTA | .0281 |
| CHONE | .0286 |
| OLIVER | .0293 |
| MARCEL | .0302 |
| ZIPS | .0305 |
PECOTA beats CHONE for slower players, but for the fastest
players, CHONE pulls ahead. This
reiterates the point I made before–PECOTA overestimates average for
speedsters. In fact, since it seems to
do quite well with homeruns and strikeouts for speedsters, the area in which it
really struggles is that it over-projects BABIP for speedsters. While it does seem to hit the population
average overall for speedier players, it seems to lump other fast players in as
well, who do not actually do well on balls in play. Jose Reyes, for
example, was projected by PECOTA to hit .330 on balls in play this year,
despite a career BABIP of .311 and a BABIP of .319 last year. Instead of regressing Reyes to the mean,
PECOTA moved him further away from the mean.
Next, I separated hitters into those with above average and
below average power. PECOTA was the
hands down winner on BABIP for power hitters, but less so for non power
hitters.
| BABIP if total | RMSE |
| PECOTA | .0266 |
| CHONE | .0285 |
| MARCEL | .0285 |
| OLIVER | .0291 |
| ZIPS | .0294 |

| BABIP if total | RMSE |
| CHONE | .0299 |
| OLIVER | .0305 |
| PECOTA | .0307 |
| MARCEL | .0320 |
| ZIPS | .0322 |
ZIPS seems to struggle to project BABIP for hitters of all
types. We have noted that ZIPS is pretty
good with hitters over 35. Is this true
for BABIP projection?
| BABIP if age>35 | RMSE |
| PECOTA | .0279 |
| MARCEL | .0285 |
| OLIVER | .0297 |
| CHONE | .0300 |
| ZIPS | .0305 |

| BABIP if age<=35 | RMSE |
| PECOTA | .0292 |
| CHONE | .0293 |
| OLIVER | .0300 |
| MARCEL | .0308 |
| ZIPS | .0311 |
Clearly, the area where ZIPS is excelling at projecting
older players is not BABIP, as it actually does the worst with that group as
well.
BATTING AVERAGE
Now that we have looked at the component statistics, we can
move on to discuss each of the slash stats.
The first of these is batting average.
CHONE does the best at projecting batting average, with the other
systems not too far behind.
| BATTING AVERAGE | RMSE |
| CHONE | .0248 |
| PECOTA | .0253 |
| OLIVER | .0253 |
| ZIPS | .0254 |
| MARCEL | .0259 |
Each of these systems beat MARCEL at projecting batting
average, but when I split hitters into those hitters who hit over .280 and hitters
who hit under .280, MARCEL actually does better with the high average hitters.
| BATTING AVERAGE for | RMSE |
| MARCEL | .0247 |
| PECOTA | .0257 |
| CHONE | .0273 |
| ZIPS | .0277 |
| OLIVER | .0280 |

| BATTING AVERAGE for | RMSE |
| CHONE | .0227 |
| OLIVER | .0231 |
| ZIPS | .0242 |
| PECOTA | .0250 |
| MARCEL | .0267 |
As I mentioned earlier, PECOTA seems to have trouble
projecting BABIP for speedsters. Using
the same subsets that I developed earlier, you can see this effect on overall
average, even though PECOTA was relatively better at projecting homerun and
strikeout rates for these guys.
| BATTING AVERAGE for | RMSE |
| OLIVER | .0237 |
| CHONE | .0243 |
| MARCEL | .0245 |
| PECOTA | .0253 |
| ZIPS | .0260 |

| BATTING AVERAGE for | RMSE |
| CHONE | .0249 |
| PECOTA | .0253 |
| ZIPS | .0257 |
| OLIVER | .0258 |
| MARCEL | .0263 |
Not only does PECOTA struggle with speedsters, but ZIPS does
as well. My best guess is that these
systems are comparing speedy players to speedy players from long ago, who
had more success legging out infield singles on poorer
quality fields against defenses that did not have the same access to scouting material on
opposing players.
ON-BASE PERCENTAGE & SLUGGING PERCENTAGE
As I noted earlier, OLIVER seems to systematically
underestimate walk rates across the board.
The result is clear in its effect on projecting OBP. The average OBP for hitters in this group was
.343. However, OLIVER projected those
hitters at .335 on average, while the other systems all fell between
.343 and .347.
Overall, CHONE was the best at projecting OBP, with PECOTA
close behind. Both ZIPS and OLIVER
failed to top MARCEL.
| ON-BASE PERCENTAGE | RMSE |
| CHONE | .0873 |
| PECOTA | .0888 |
| MARCEL | .0898 |
| ZIPS | .0903 |
| OLIVER | .0931 |
When it came to testing SLG, the average SLG for hitters in
this sample was .436. However, even
though CHONE, OLIVER, and ZIPS were all right around there, MARCEL and PECOTA
averaged .443 and .448, respectively, for hitters in this sample. As far as accuracy, here are the results:
| SLUGGING PERCENTAGE | RMSE |
| CHONE | .0516 |
| PECOTA | .0520 |
| OLIVER | .0521 |
| MARCEL | .0535 |
| ZIPS | .0538 |
CHONE and PECOTA were number one and two for both OBP and
SLG. ZIPS fell behind MARCEL at
both. OLIVER did beat MARCEL
considerably at projecting SLG.
STOLEN BASES, RUNS, AND RUNS BATTED IN
Many of you are probably wondering how all this knowledge
will help your fantasy teams, so I will also include tests for how well these
systems projected these statistics.
OLIVER does not project these statistics, so these results will not
include it.
| STOLEN BASES | RMSE |
| CHONE | 6.56 |
| PECOTA | 6.61 |
| ZIPS | 7.53 |
| MARCEL | 8.05 |
CHONE does best with PECOTA close behind. However, if we were to separate hitters into
groups with more than 10 steals and those with less than or equal to 10 steals,
we can see that the high variance of stolen bases among those players who steal
a lot of bases is driving this result.
CHONE actually falls behind the other systems when it comes to
projecting stolen bases for slower players.
| STOLEN BASES for | RMSE |
| CHONE | 10.89 |
| PECOTA | 11.16 |
| ZIPS | 13.47 |
| MARCEL | 14.82 |

| STOLEN BASES for | RMSE |
| MARCEL | 3.37 |
| ZIPS | 3.70 |
| PECOTA | 3.98 |
| CHONE | 4.10 |
Interestingly, the ranking is exactly the opposite for the slower
guys. Perhaps the systems that are better at
projecting stolen bases are those that simply project higher stolen base
totals? This group of players, who
qualified by being projected by all systems and having at least 300 PA, had an
average of 8.69 stolen bases.
| SYSTEMS | Average SB total |
| CHONE | 8.53 |
| PECOTA | 8.45 |
| MARCEL | 7.89 |
| ZIPS | 7.73 |
When it comes to projecting runs scored and runs batted in,
CHONE was definitely the best at projecting runs scored, but PECOTA was the
best at RBI.
| RUNS SCORED | RMSE |
| CHONE | 19.72 |
| ZIPS | 20.32 |
| PECOTA | 20.36 |
| MARCEL | 21.84 |

| RBI | RMSE |
| PECOTA | 19.79 |
| ZIPS | 19.84 |
| CHONE | 19.97 |
| MARCEL | 21.00 |
SUMMARY
There is clearly a lot of information here, and it may be
tough to know exactly what to take away from all of this. In fact, I have way more results than I even
listed here, and there are probably a million tests that I did not even think of that could provide insight into how these systems work. The main things to take away are:
–CHONE was the best at projecting most things.
–PECOTA was very close behind but had some systematic
biases, specifically for speedy players’ BABIPs, which ZIPS struggled with as
well.
–ZIPS is behind the other systems, except it does quite
well with projecting the three true outcomes for players over 35.
–CHONE does better with older players in general, since its
specialty is aging curves, but PECOTA does better at finding comparable players
for younger players for whom less data is available (unless they fall into the
speedster category).
–OLIVER clearly contends and even takes the lead at some
things–especially at projecting hitters with lower homerun totals and other
players significantly affected by park effects.
However, OLIVER under-projects walks and strikeouts systematically and over-projects homeruns systematically, and
could probably be improved by adjusting how those outcomes are computed.
–None of the systems is terribly good at projecting
BABIP. The other systems regress BABIP
to the mean far more than ZIPS does, but ZIPS does far worse at even projecting
BABIP and would probably improve if it simply did what those systems do instead. However, I strongly believe that
projecting BABIP using batted ball data will help projection for this reason.
Are any of these differences between systems statistically significant?
Also, using RMSE as the only measurement of the accuracy of projections is problematic. It may be the single best measure, but using RMSE alone assumes the aim of all these systems is to get the best RMSE, and that may not be the case. Very cautious projection systems will get better RMSEs than systems which try harder to project shifts in performance level, but the latter may be considered better systems if they fulfill their aims.
“Very cautious projection systems will get better RMSEs than systems which try harder to project shifts in performance level”
While that may be true (I don’t know), it didn’t make a big difference here. Marcel didn’t go out and win every category despite being the most conservative of all the systems. Unless I’m misinterpreting your use of “cautious.”
Greg, that’s a good question about statistical significance. I’m somewhat embarrassed to say that I can’t recall how to do statistical significance for the difference between two different root mean square error tests? Does anybody know the formula I should even use for that? I’m having trouble even deriving what it would be.
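One possible approach, offered as a sketch rather than a formula from the article, is a paired bootstrap: resample the players with replacement many times, recompute both systems' RMSE on each resample, and look at the spread of the difference. Everything below (function names, number of draws, the simulated errors) is an assumption made for illustration.

```python
import math
import random


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bootstrap_rmse_diff(errors_a, errors_b, draws=2000, seed=0):
    """Return (observed RMSE_a - RMSE_b, (low, high)) with a 95% percentile interval.

    errors_a and errors_b are projection misses for the same players in the
    same order, so each resample keeps the two systems paired.
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("need two non-empty error lists of the same length")
    rng = random.Random(seed)
    n = len(errors_a)
    diffs = []
    for _ in range(draws):
        picks = [rng.randrange(n) for _ in range(n)]
        diffs.append(
            rmse([errors_a[i] for i in picks]) - rmse([errors_b[i] for i in picks])
        )
    diffs.sort()
    low = diffs[int(0.025 * draws)]
    high = diffs[int(0.975 * draws) - 1]
    return rmse(errors_a) - rmse(errors_b), (low, high)


# Simulated misses for 200 hypothetical hitters, not real projection data.
sim = random.Random(1)
errors_a = [sim.gauss(0.0, 0.074) for _ in range(200)]
errors_b = [e + sim.gauss(0.0, 0.02) for e in errors_a]
observed, (low, high) = bootstrap_rmse_diff(errors_a, errors_b)
```

If the interval comfortably excludes zero, the gap between the two systems is probably not just noise from which players happened to be in the sample. If it straddles zero, the two systems cannot really be separated on this data.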
As far as RMSE, using correlations did not change the answers significantly, so I used RMSE. Over on Tom Tango’s blog, there is a discussion in the thread about this article and about what method to use, and the general consensus is RMSE. Some people think average absolute error might be the way to go, but they seem to think it’s better than correlations.
As far as my personal thoughts on which to use, I see your general point about the goal being to find shifts in performance level and pretty much effectively determine who is underrated or overrated. I see that more as a goal for fantasy baseball: you don’t want to know how good a guy is, you want to know if he’s better than other people think he is and whether you should draft him. For professional teams, the answer is a bit different. In that case, you’re trying to get a certain number of wins and approximate how many wins a player gets you, and that determines his worth. In that case, RMSE seems appropriate since it values correct valuations of players.