Testing the Projection Systems’ Strengths and Weaknesses


There are several prominent projection systems for predicting how players will do in the coming baseball season.  Depending on the season and the test used, each has some claim to superiority.  One system that does not claim superiority is Marcel the Monkey, developed by Tom Tango, which intentionally uses a simple method: a weighted average of historical performance, regression to the mean, and age factors.  Tango explains that it should be the baseline that any projection system worth looking at should be able to beat.
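
To make those three ingredients concrete, here is a minimal Python sketch of a Marcel-style projection for a single rate statistic.  It is not Tango’s code: the function name and every default value (5/4/3 season weights, 1,200 plate appearances of league-average regression, a peak age of 29 with small yearly adjustments) are assumptions chosen for illustration.

```python
def marcel_style_rate(seasons, league_rate, age,
                      weights=(5, 4, 3), regression_pa=1200,
                      peak_age=29, young_gain=0.006, old_loss=0.003):
    """Project a rate stat from up to three past seasons.

    seasons: list of (rate, plate_appearances), most recent season first.
    All default parameters are illustrative assumptions, not published values.
    """
    # Step 1: weighted average of past performance, recent seasons count more.
    weighted_events = 0.0
    weighted_pa = 0.0
    for (rate, pa), w in zip(seasons, weights):
        weighted_events += w * rate * pa
        weighted_pa += w * pa

    # Step 2: regress to the mean by mixing in league-average plate appearances.
    regressed = (weighted_events + regression_pa * league_rate) / (weighted_pa + regression_pa)

    # Step 3: simple age factor, a small gain before the peak and a decline after it.
    if age < peak_age:
        factor = 1 + young_gain * (peak_age - age)
    else:
        factor = 1 - old_loss * (age - peak_age)
    return regressed * factor
```

For example, `marcel_style_rate([(0.400, 600)] * 3, league_rate=0.330, age=29)` pulls a .400 on-base hitter part of the way back toward the league figure, and a player with no history at all is simply projected at the league rate.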

Each of these systems is built differently, and each is bound to have its strengths and weaknesses.  While many can claim that one is better than the other, it is very difficult to tell.  My suspicion is that since different systems have different methodologies, they will each excel at projecting different groups of hitters and different statistics.  In this article, I will test how each system does with many different subgroups, and we will see that there are certain areas in which each system outperforms the others.  PECOTA, for instance, struggles at projecting BABIP for speedy players.  ZIPS struggles at projecting BABIP overall.  OLIVER tends to underestimate walks and strikeouts, but overestimate homeruns.  CHONE does a bit better overall, but for most younger players, PECOTA appears to do a bit better.  I will explain each of these results in detail later on.

I gathered projections on 526 different players who got at least 300 plate appearances in either 2007 or 2008 and who were projected by all five projection systems I tested.  Many players were not projected by one system or another, and these were excluded from the sample.  Obviously, this eliminated a lot of useful information, since the main way that projection systems differ is in how they project young players, and many young players were not projected by MARCEL (many other players were not projected by some of the other systems either).  However, there is still useful information contained in these tests.
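
The sample restriction described above amounts to intersecting the player lists of all systems and then applying the plate appearance cutoff.  A minimal sketch follows; the dictionary layout, the player ids, and the function name are hypothetical, not the format of my actual files.

```python
def common_sample(projections, plate_appearances, min_pa=300):
    """Return the players projected by every system who reached min_pa.

    projections: {system_name: {player_id: projected_value}}  (hypothetical layout)
    plate_appearances: {player_id: actual plate appearances}
    """
    tables = list(projections.values())
    if not tables:
        return []
    # Keep only players that appear in every system's projections.
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    # Then drop players who did not play enough to qualify.
    return sorted(p for p in shared if plate_appearances.get(p, 0) >= min_pa)


projections = {
    "SYSTEM_A": {"p1": 0.800, "p2": 0.700, "p3": 0.750},
    "SYSTEM_B": {"p1": 0.790, "p3": 0.700},
}
actual_pa = {"p1": 500, "p2": 600, "p3": 250}
qualified = common_sample(projections, actual_pa)
```

Here `p2` is dropped because one system did not project him, and `p3` is dropped because he fell short of 300 plate appearances, which is exactly how young players and part-timers fall out of the tests.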



THE CONTENDERS

I tested five different projection systems, which I will briefly describe now.  PECOTA is Baseball Prospectus’ projection system, developed by Nate Silver.  It has been around for several years and is the most commonly cited system.  ZIPS is Baseball Think Factory’s projection system, developed by Dan Szymborski.  PECOTA and ZIPS have somewhat similar methodologies, as both sort through thousands of historical players to find comparable players and use them to enhance their projections.  One criticism that I leveled against PECOTA in my last article is that it seems to overrate batting averages for faster players.  We will see later on that this is mostly a BABIP projection issue.  I also tested CHONE, developed by Sean Smith.  His projections can be found at www.baseballprojection.com.  Sean does not use comparable players, but instead uses component-based aging curves.  OLIVER was developed by Brian Cartwright of Statistically Speaking (this site) and Fangraphs.com.  His area of expertise is park factors, and we will see how this comes into play later on, as his projections seem stronger for hitters whose performance is largely affected by park factors.  MARCEL, as I mentioned earlier, was developed by Tom Tango and is available here: http://www.tangotiger.net/marcel/.

ON BASE PERCENTAGE PLUS SLUGGING PERCENTAGE

The first thing that I will do is use a general measure of
performance, OPS, to get a sense of which systems will be most successful with
which hitters.  I will later delve into
different statistics, and see who succeeds at what.  For now, it’s best to get a general sense of
who does best at projecting which group of players.

I will use Root Mean Square Error (RMSE) as my test of these projections.  I considered using correlations, but since we are looking for the system that most accurately projects how players do, rather than the one that best matches the tails of the distribution (which correlation rewards), I think RMSE is the better method.  If anyone would like to see how correlations and other tests do, please let me know and I will run them.
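
For readers who want the two measures side by side, here is a small Python sketch of RMSE and of the ordinary Pearson correlation.  The sample numbers are made up for illustration; they show one practical difference between the tests, namely that a projection which is uniformly too high keeps a perfect correlation but is still penalized by RMSE.

```python
import math


def rmse(projected, actual):
    """Root mean square error between projections and results."""
    n = len(projected)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / n)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up OPS figures: every projection is exactly 50 points too high.
actual_ops = [0.700, 0.750, 0.800, 0.850]
shifted_ops = [a + 0.050 for a in actual_ops]

shifted_rmse = rmse(shifted_ops, actual_ops)    # the full 50-point miss shows up here
shifted_corr = pearson(shifted_ops, actual_ops)  # the ordering is perfect, so this is 1
```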

Overall, here is how the projection systems did at
projecting hitters’ OPS.

 

OPS overall  RMSE
CHONE        .0737
PECOTA       .0753
OLIVER       .0753
MARCEL       .0763
ZIPS         .0769

 

It is clear that CHONE has done the best in the past two years.  PECOTA and OLIVER are not too far behind, and certainly have beaten MARCEL.  ZIPS just barely failed to beat MARCEL, but as we will see later in the article, it does have clear strengths with certain groups of hitters.

In my last article, I pointed out that PECOTA tends to overestimate batting averages for
speedsters (as measured by whether those speedsters were in the top quarter of
their speed scores).  For that group of
hitters, you can see that PECOTA does the second worst of the projection
systems, falling considerably behind MARCEL at projecting these hitters.

 

OPS for top ¼ of speed scores  RMSE
CHONE                          .0257
MARCEL                         .0261
OLIVER                         .0264
PECOTA                         .0273
ZIPS                           .0279

 

PECOTA is just barely behind CHONE at projecting the slower three quarters of baseball players, however.  This is further evidence for my conclusion that PECOTA’s problem is specific to speedsters: it is likely to overestimate those fast players, rather than just miss by a larger margin.

 

OPS for bottom ¾ of speed scores  RMSE
CHONE                             .0282
PECOTA                            .0283
ZIPS                              .0287
MARCEL                            .0291
OLIVER                            .0304

 

After separating hitters by speed, I decided to see if
certain projection systems performed better or worse for hitters with different
levels of power.  For simplicity, I split
the group about in half.  The average
hitter in this sample of 526 players had about 15 homeruns last year.  Look at the results:

 

OPS if HR>=15  RMSE
MARCEL         .0722
PECOTA         .0738
CHONE          .0756
ZIPS           .0809
OLIVER         .0821

 

 

OPS if HR<15  RMSE
OLIVER        .0701
CHONE         .0724
ZIPS          .0739
PECOTA        .0762
MARCEL        .0791

 

For higher power hitters, MARCEL does the best by far, and
for lower power hitters, MARCEL does by far the worst.  OLIVER does the exact opposite–best for lower
power hitters and worst for high power hitters. 
As Brian Cartwright pointed out himself recently,
homerun park factors are larger for hitters with less
power.  As Brian’s specialty is park
factors, it makes sense that those hitters who are affected most by them would
be his strength. We will see later that OLIVER projects homeruns a little high across the board (and walks and strikeouts a little low), but perhaps OLIVER’s strength in park factors helped get the numbers right on these guys in general.
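
All of the subgroup tables in this article come from the same operation: split the sample on some player attribute and compute RMSE within each piece.  A minimal sketch of that step is below, using the 15-homerun split as the example; the tuple layout, the `hr` key, and the three sample rows are hypothetical.

```python
import math


def rmse_by_group(rows, group_of):
    """Compute RMSE separately for each subgroup.

    rows: iterable of (projected, actual, attributes) tuples.
    group_of: function mapping a player's attributes to a group label.
    """
    squared_error = {}
    counts = {}
    for projected, actual, attrs in rows:
        label = group_of(attrs)
        squared_error[label] = squared_error.get(label, 0.0) + (projected - actual) ** 2
        counts[label] = counts.get(label, 0) + 1
    return {label: math.sqrt(squared_error[label] / counts[label]) for label in squared_error}


def power_group(attrs):
    """Label a hitter by the 15-homerun cutoff used in the tables."""
    return "HR>=15" if attrs["hr"] >= 15 else "HR<15"


# Made-up rows: (projected OPS, actual OPS, attributes).
rows = [
    (0.800, 0.820, {"hr": 30}),
    (0.700, 0.690, {"hr": 5}),
    (0.750, 0.790, {"hr": 20}),
]
by_power = rmse_by_group(rows, power_group)
```

Swapping `power_group` for a function of age, speed score, or handedness gives the other splits without touching the RMSE code.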

 

As each of these projection systems needs a way to account
for aging, it seems pretty useful to look at their ability to project OPS for
players in various age groups.  For
simplicity, I started by separating hitters into those who were at least 30
years old and those who were under 30.

 

OPS if age>=30  RMSE
CHONE           .0711
OLIVER          .0723
MARCEL          .0725
ZIPS            .0726
PECOTA          .0734

 

OPS if age<30  RMSE
CHONE          .0754
PECOTA         .0764
OLIVER         .0772
MARCEL         .0786
ZIPS           .0795

 

CHONE does best with both groups actually, but the gap
between CHONE and PECOTA is much smaller for hitters under 30.  The large advantage for CHONE seems to be for
hitters over 30.  As the CHONE projection
system is based on component based aging curves, this makes sense.  PECOTA is based on discovering comparable
players.  This is most useful when there
is less data available on the hitter.  As
more and more data is available on the hitter himself, perhaps finding
comparable players is less relevant.  It
is also interesting that OLIVER does better than PECOTA for players 30 and
over.  This is probably because OLIVER
uses more than three years of historical data, while I do not believe PECOTA uses more than three years of data.

 

When I separate out those hitters under 30 who were in the
bottom three quarters of speed scores by PECOTA’s estimates, ZIPS actually did
the best and PECOTA did better than CHONE.

 

OPS if age<30 & bottom ¾ of speed scores  RMSE
ZIPS                                      .0747
PECOTA                                    .0762
CHONE                                     .0765
MARCEL                                    .0777
OLIVER                                    .0796

 

Next, I broke down the age groups even further into those
hitters 25 and under, hitters 26-30 years old, hitters 31-35 years old, and
hitters over 35.  Note that this only
includes players projected by all systems, so the younger group’s results are
mostly summaries of younger players who have major league experience before the
season begins.

 

OPS if age<=25  RMSE
CHONE           .0740
PECOTA          .0778
MARCEL          .0821
OLIVER          .0823
ZIPS            .0855

 

 

OPS if 25<age<=30  RMSE
OLIVER             .0743
PECOTA             .0757
ZIPS               .0762
CHONE              .0762
MARCEL             .0767

 

OPS if 30<age<=35  RMSE
CHONE              .0708
OLIVER             .0723
PECOTA             .0727
MARCEL             .0741
ZIPS               .0743

 

 

 

OPS if age>35  RMSE
ZIPS           .0666
MARCEL         .0667
CHONE          .0719
OLIVER         .0726
PECOTA         .0758

 

Interestingly, ZIPS performs poorly for most groups, but
seems to have a sizable advantage for hitters over 35.  PECOTA does pretty well for the other groups,
but terribly for hitters over 35.  This
is surprising as both systems use similar methodologies.  Perhaps there is something about ZIPS that
makes it better suited to project older players.

 

One thing that concerned me about the older group is that projecting older players has a lot to do with projecting attrition rates.  These numbers only report hitters who had enough plate appearances to qualify.  My concern was that ZIPS was merely over-projecting older players, and that those projections only entered the test when the older player over-performed expectations enough to qualify.  However, the opposite is true.  As seen in the table below, ZIPS had the lowest projection of any of the systems for this group of players.  As it turns out, ZIPS may have a superior system for identifying comparable players for older hitters.

 

Projection system  average OPS if age>35
OLIVER             .796
MARCEL             .794
PECOTA             .790
CHONE              .784
ZIPS               .779

 

After this, I tested hitters by handedness.  The results were interesting.

 

OPS for switch-hitters  RMSE
CHONE                   .0699
OLIVER                  .0716
MARCEL                  .0736
PECOTA                  .0767
ZIPS                    .0888

 

 

OPS for lefties  RMSE
ZIPS             .0736
PECOTA           .0744
OLIVER           .0749
CHONE            .0755
MARCEL           .0770

 

OPS for righties  RMSE
CHONE             .0737
ZIPS              .0753
PECOTA            .0754
OLIVER            .0765
MARCEL            .0766

 

What jumps out at me most is ZIPS.  For the 77 switch hitters in my sample, ZIPS was the worst at projecting these hitters by far.  However, ZIPS was the best at projecting lefties and the second best at projecting righties.  Perhaps ZIPS is not accounting for some aspect of switch hitting in its comparisons.  Looking at the average OPS projection by system for those 77 switch hitters, it is clear that ZIPS is the most pessimistic about them, and probably unrealistically so.  Note that these hitters had an OPS of .767 on average; ZIPS only projected them at .748 on average.  My gut feeling is that switch hitting is probably correlated with athleticism, and ZIPS may not be taking that into account.  It would be interesting to know whether ZIPS incorporates handedness in its projection process.  If not, this could be a fairly obvious area for improvement.

 

SYSTEMS  average OPS for switch-hitters
MARCEL   .776
PECOTA   .764
CHONE    .758
OLIVER   .754
ZIPS     .748
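
Using the averages in the table above and the .767 actual OPS quoted in the text, the signed gap for each system can be worked out directly.  The short Python sketch below does that arithmetic; the variable and function names are mine.  Keep in mind that a small average gap does not by itself imply a low RMSE, since individual misses in opposite directions cancel in an average.

```python
ACTUAL_SWITCH_OPS = 0.767  # average actual OPS of the 77 switch hitters, from the text

# Average projected OPS for the same hitters, copied from the table above.
projected_switch_ops = {
    "MARCEL": 0.776,
    "PECOTA": 0.764,
    "CHONE": 0.758,
    "OLIVER": 0.754,
    "ZIPS": 0.748,
}


def signed_gaps(projected_means, actual_mean):
    """Projected minus actual group average; negative means under-projection."""
    return {name: round(value - actual_mean, 3) for name, value in projected_means.items()}


gaps = signed_gaps(projected_switch_ops, ACTUAL_SWITCH_OPS)
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:7s} {gap:+.3f}")
```

ZIPS comes out 19 points under the group’s actual OPS, the largest gap of the five, while MARCEL is the only system that projected the group too high.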

 

Some players are borderline major leaguers who may be likely to end up in the minor leagues.  A projection system will appear better if it projects higher OPS numbers for these players: the predictions will look accurate when they are right, and will not show up in the tests at all when they are wrong.  This is not to say that the system is intentionally biased, but a system that gives too much credit to marginal players and regresses them too far to the mean will have this issue.  Consider the following grouping by above and below average OPS.  From this, it seems that OLIVER may be a system that projects players this way.  MARCEL seems to be unable to project poorer hitters very well at all, compared to the other systems.  CHONE also seems to do relatively better at projecting poor hitters.

 

ABOVE AVERAGE OPS  RMSE
MARCEL             .0689
PECOTA             .0702
CHONE              .0755
ZIPS               .0782
OLIVER             .0800

 

BELOW AVERAGE OPS  RMSE
OLIVER             .0705
CHONE              .0720
ZIPS               .0756
PECOTA             .0798
MARCEL             .0827

 

Now that we have gotten a general sense of what these
projection systems do for different types of player overall, let’s refine this
general signal of OPS into various parts. 
This will help us figure out who is actually good at what.

 

We’ll start with some of the more reliable things like the
three true outcomes, and then we’ll work our way to BABIP and the slash stats
(AVG/OBP/SLG).  After that we can focus
on stolen bases and some fantasy stats.

 

TOTAL HOMERUNS AND HOMERUNS PER AT-BAT

 

We’ll do a mixture of homeruns and homeruns per at bat
here.  Some people may find one more
useful than the other.

 

Looking at the general summaries, the average number of
homeruns predicted for this group (who had an average homerun total of 15.17) was:

 

SYSTEMS  Average HR total
CHONE    15.69
OLIVER   15.69
ZIPS     15.44
PECOTA   14.87
MARCEL   14.40

 

One might have expected hitters who were good enough to get enough plate appearances to have outperformed expectations on homeruns, but clearly these systems all were very close on average.  In other words, none systematically over- or underestimated homerun totals.

 

PECOTA does seem low, but that is actually due to not
projecting many at-bats.  Considering the
alternative specification, homeruns per at-bat, PECOTA did pretty well.  The average homeruns per at-bat in this group
was 3.18%.

 

SYSTEMS  Average HR%
OLIVER   4.06%
MARCEL   3.36%
PECOTA   3.36%
CHONE    3.30%
ZIPS     3.26%

 

OLIVER seemingly over-projected homerun percentage a lot,
while the other systems were pretty close to each other.  The natural follow up question is which
system is actually better at projecting homeruns.

 

TOTAL HOMERUNS  RMSE
CHONE           6.39
PECOTA          6.39
ZIPS            6.49
OLIVER          6.83
MARCEL          6.92

 

HR/AB   RMSE
PECOTA  .0108
CHONE   .0111
ZIPS    .0114
MARCEL  .0117
OLIVER  .0148

 

PECOTA, CHONE, and ZIPS all seem to do a very good job with homeruns, but none of them does dramatically better than MARCEL.  OLIVER seems to struggle with homerun rate, and it appears to be systematically overestimating it.  However, when I ran correlations between projected and actual homerun rate, OLIVER placed ahead of every system except for PECOTA.  What this means is that OLIVER knows what to expect the distribution of homeruns to be across the majors, but just pushes them all up across the board.
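
The pattern described above, a good correlation paired with a poor RMSE, is what a uniform shift produces.  Mean squared error splits exactly into the squared average miss (bias) plus the variance of the misses, so a system that is right about the ordering but too high across the board pays only for the first term.  Here is a minimal sketch with made-up homerun rates; the re-centering step uses the actual results, so it is an after-the-fact diagnostic rather than something a system could do before the season.

```python
def error_decomposition(projected, actual):
    """Return (bias, error variance, mean squared error); mse = bias**2 + variance."""
    n = len(projected)
    errors = [p - a for p, a in zip(projected, actual)]
    bias = sum(errors) / n
    variance = sum((e - bias) ** 2 for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return bias, variance, mse


def recentered(projected, actual):
    """Shift every projection by the same amount so the average miss is zero."""
    bias, _, _ = error_decomposition(projected, actual)
    return [p - bias for p in projected]


# Made-up homerun rates: every projection is somewhat too high.
actual_rate = [0.020, 0.030, 0.045, 0.035]
projected_rate = [0.031, 0.038, 0.052, 0.047]

bias, variance, mse = error_decomposition(projected_rate, actual_rate)
fixed_bias, fixed_variance, fixed_mse = error_decomposition(
    recentered(projected_rate, actual_rate), actual_rate)
```

After re-centering, the bias term disappears and only the variance term is left, which is why adjusting the mean looks like the cheapest possible improvement for a system in this position.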

 

I next split these groups into hitters with above average
and below average homerun totals.

 

TOTAL HOMERUNS IF HR>=15  RMSE
CHONE                     7.63
ZIPS                      7.95
PECOTA                    7.99
OLIVER                    8.17
MARCEL                    8.97

 

TOTAL HOMERUNS IF HR<15  RMSE
PECOTA                   4.95
MARCEL                   4.98
ZIPS                     5.21
CHONE                    5.34
OLIVER                   5.69

 

As it turns out, each system has a clear advantage over
MARCEL at projecting homeruns for power hitters, but for weaker hitters, MARCEL
actually does better than every system except for PECOTA.  Without knowing the details of how they do
projections, it is difficult to say why, but it is certainly a fact worth
noting.

 

I again used PECOTA speed scores to separate players, and
found something rather interesting projecting homeruns for speedy players.

 

TOTAL HR for top ¼ of speed scores  RMSE
PECOTA                              5.21
CHONE                               5.39
OLIVER                              5.51
ZIPS                                5.68
MARCEL                              6.09

 

TOTAL HR for bottom ¾ of speed scores  RMSE
CHONE                                  6.69
PECOTA                                 6.74
ZIPS                                   6.74
MARCEL                                 7.17
OLIVER                                 7.21

 

Remember that PECOTA performed more poorly at projecting OPS for speedy players?  This certainly is not because it struggles to project homeruns for these players.  In fact, PECOTA is markedly better than the other systems at projecting HR for speedy players.  We will see later that batting average is where PECOTA struggles with these players, as I explained in my previous article.

 

Remember that ZIPS was particularly good at projecting hitters over 35?  This is certainly reinforced by its accuracy in projecting their homeruns.

 

TOTAL HR if age>35  RMSE
ZIPS                5.89
CHONE               6.15
MARCEL              6.32
OLIVER              6.98
PECOTA              7.44

 

PECOTA seems to struggle mightily with this group, while ZIPS seems to excel.  Looking at their average projected homeruns for players in this sample, we see that while both of them project lower homerun totals than the other systems, PECOTA is way too low.  This group of hitters averaged 15.39 homeruns.

 

SYSTEMS  Average Total HR projected
OLIVER   16.58
MARCEL   16.24
CHONE    15.98
ZIPS     14.04
PECOTA   12.36

 

There certainly do seem to be some large differences in how these systems project homeruns, and they seem to have their strengths and weaknesses with different types of players.  Next, we will look at walk percentage.

 

WALK PERCENTAGE

 

Like homerun rates, walk rates also stabilize quickly.  Hitters vary wildly with respect to walk
rates as many walk very frequently and others rarely do.  The average walk rate for hitters in this
sample was 9.04%.  With the exception of
OLIVER, each of these systems was pretty accurate at projecting walk rates.

 

SYSTEMS  Average BB/PA projected
CHONE    8.98%
PECOTA   8.98%
ZIPS     8.89%
MARCEL   8.80%
OLIVER   8.08%

 

It seems that something is a bit off with OLIVER when it
comes to walks, as it seems to be systematically underestimating their
frequency.  As far as accuracy goes on
walk rate overall, here are the results.

 

WALK RATE  RMSE
CHONE      .0200
PECOTA     .0203
ZIPS       .0205
MARCEL     .0212
OLIVER     .0233

 

OLIVER clearly does struggle with walk rate, and it was last of the systems when it came to correlation with actual walk rate too.  CHONE does the best, ever so slightly.  Very different results emerged when I
separated hitters into those with walk rates above 10% and those with walk
rates below 10%.

 

WALK RATE if above 10%  RMSE
CHONE                   .0246
PECOTA                  .0253
ZIPS                    .0259
MARCEL                  .0277
OLIVER                  .0337

 

WALK RATE if below 10%  RMSE
OLIVER                  .0144
MARCEL                  .0164
ZIPS                    .0168
PECOTA                  .0168
CHONE                   .0168

 

OLIVER goes from being by far the worst to by far the best at projecting walk rates when we look only at hitters with low walk rates.  What it seems to struggle with is those hitters with higher walk rates, who naturally have higher variance in their walk rates.

 

STRIKEOUT RATE

 

The next statistic I tested was another reasonably reliable
one–strikeouts per at-bat.  OLIVER again
was rather low on projecting these while the others projected higher strikeout
rates on average, and closer to the actual average of 18.27%.

 

SYSTEMS  Average K/AB
ZIPS     18.33%
MARCEL   18.29%
ZIPS     18.17%
PECOTA   18.16%
OLIVER   16.43%

 

Testing accuracy of strikeout percentage in general:

 

STRIKEOUT RATE  RMSE
CHONE           .0309
PECOTA          .0314
ZIPS            .0341
MARCEL          .0367
OLIVER          .0401

 

OLIVER fell far behind, but primarily because it projected the average strikeout rate too low.  Its correlation with actual strikeout rate beat MARCEL and ZIPS.  Once again, adjusting the mean for OLIVER seems to have potential for huge improvement.

Next, I separated hitters into those who strike out more often and less often than 20% of the time.

 

STRIKEOUT RATE if K/AB>.20  RMSE
CHONE                       .0384
PECOTA                      .0397
ZIPS                        .0400
MARCEL                      .0480
OLIVER                      .0582

 

STRIKEOUT RATE if K/AB<.20  RMSE
OLIVER                      .0257
PECOTA                      .0259
CHONE                       .0260
MARCEL                      .0289
ZIPS                        .0304

 

As I mentioned earlier, PECOTA is less accurate at projecting the OPS of speedy hitters, but more accurate at projecting their homeruns.  As it turns out, it also excels at projecting their strikeout rates.

 

STRIKEOUT RATE for top ¼ of speed scores  RMSE
PECOTA                                    .0326
CHONE                                     .0326
ZIPS                                      .0337
OLIVER                                    .0375
MARCEL                                    .0379

 

STRIKEOUT RATE for bottom ¾ of speed scores  RMSE
CHONE                                        .0303
PECOTA                                       .0310
ZIPS                                         .0342
MARCEL                                       .0363
OLIVER                                       .0410

 

PECOTA in fact does very slightly better than CHONE at
projecting strikeout rates for speedy players and worse than CHONE for medium
and slow speed players.  Neither
strikeouts nor homeruns explains PECOTA’s troubles at projecting speedsters.

 

BATTING AVERAGE ON BALLS IN PLAY

 

This past offseason, many articles came out discussing BABIP for hitters.  I have worked on projecting BABIP myself.  I use batted ball data to help my projections, but I do not believe that any of these systems do.  As there is no batted ball data available for players who played decades ago, ZIPS and PECOTA prefer to leave it out.  Sean Smith says that he uses batted ball data for pitchers, but not for hitters, when doing the CHONE projections.  I do not believe OLIVER or MARCEL uses batted ball data either.  However, they do a fairly good job at approximating BABIP anyway.  The actual standard deviation of BABIP for players in this sample was .032.  Since we are dealing with hitters who have around 400 balls in play each, some of that variance is simply the randomness expected in samples of size 400.  After accounting for that binomial variance, BABIP skill appears to have a standard deviation of around .022.  The standard deviations of the projected BABIPs for players in this sample are as follows:

 

SYSTEMS  St.Dev.
CHONE    .0177
PECOTA   .0178
OLIVER   .0198
MARCEL   .0207
ZIPS     .0231

 

ZIPS projects approximately the right spread of BABIPs, but it does not seem to assign those BABIPs to the right players.
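
The .022 skill estimate quoted earlier comes from subtracting the binomial sampling variance from the observed variance.  The sketch below reproduces that arithmetic in Python.  The .032 spread and the 400 balls in play are from the text; the league BABIP of .300 is my assumption for the illustration, and the function name is hypothetical.

```python
import math


def skill_and_noise_sd(observed_sd, mean_rate, trials):
    """Split an observed spread of rates into skill and binomial noise.

    observed variance = skill variance + mean_rate * (1 - mean_rate) / trials
    """
    noise_var = mean_rate * (1 - mean_rate) / trials
    # Guard against a negative result when noise alone exceeds the observed spread.
    skill_var = max(observed_sd ** 2 - noise_var, 0.0)
    return math.sqrt(skill_var), math.sqrt(noise_var)


# .032 and 400 come from the article; .300 is an assumed league BABIP.
skill_sd, noise_sd = skill_and_noise_sd(observed_sd=0.032, mean_rate=0.300, trials=400)
```

With these inputs the noise term is about .023 and the skill term about .022, so roughly half of the observed variance in single-season BABIP is sampling noise.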

 

BABIP   RMSE
PECOTA  .0291
CHONE   .0293
OLIVER  .0299
MARCEL  .0306
ZIPS    .0311

 

It seems pretty clear that the systems that were more accurate were those that applied more regression to the mean.  I strongly believe that one of the areas in which projection has the most room to improve is BABIP, and I will continue to do my own research on this.
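
To show what regression to the mean looks like for a single hitter, here is a textbook shrinkage estimate in Python.  It is not the method of any of the five systems.  It weights the observed BABIP by the share of variance that is skill, using the .022 skill spread estimated earlier; the .300 league BABIP and the .340 example hitter are assumptions for the illustration.

```python
def regress_to_mean(observed, league_mean, trials, skill_sd):
    """Shrink an observed rate toward the league mean.

    The weight on the observation is skill variance over total variance,
    with binomial noise variance league_mean * (1 - league_mean) / trials.
    Returns (regressed estimate, weight placed on the observation).
    """
    noise_var = league_mean * (1 - league_mean) / trials
    weight = skill_sd ** 2 / (skill_sd ** 2 + noise_var)
    return league_mean + weight * (observed - league_mean), weight


# A hypothetical hitter with a .340 BABIP over 400 balls in play.
estimate, weight = regress_to_mean(observed=0.340, league_mean=0.300,
                                   trials=400, skill_sd=0.022)
```

With one season of about 400 balls in play, a bit less than half of the gap from the league average survives, so the .340 hitter is estimated at roughly .319.  More balls in play raise the weight on the observation, which is why multi-year samples need less regression.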

 

Breaking down BABIP by speed, we get the following results:

 

BABIP for hitters in top ¼ of speed scores  RMSE
CHONE                                       .0315
PECOTA                                      .0318
MARCEL                                      .0318
OLIVER                                      .0319
ZIPS                                        .0326

 

BABIP for hitters in bottom ¾ of speed scores  RMSE
PECOTA                                         .0281
CHONE                                          .0286
OLIVER                                         .0293
MARCEL                                         .0302
ZIPS                                           .0305

 

PECOTA beats CHONE for slower players, but for the fastest players, CHONE pulls ahead.  This reiterates the point I made before: PECOTA overestimates batting average for speedsters.  In fact, since it seems to do quite well with homeruns and strikeouts for speedsters, the area in which it really struggles is BABIP, which it over-projects for them.  While it does seem to hit the population average overall for speedier players, it seems to lump in other fast players who do not actually do well on balls in play.  Jose Reyes, for example, was projected by PECOTA to hit .330 on balls in play this year, despite a career BABIP of .311 and a BABIP of .319 last year.  Instead of regressing Reyes to the mean, PECOTA moved him further away from it.

 

Next, I separated hitters into those with above average and
below average power.  PECOTA was the
hands down winner on BABIP for power hitters, but less so for non power
hitters.

 

BABIP if total HR>15  RMSE
PECOTA                .0266
CHONE                 .0285
MARCEL                .0285
OLIVER                .0291
ZIPS                  .0294

 

BABIP if total HR<=15  RMSE
CHONE                  .0299
OLIVER                 .0305
PECOTA                 .0307
MARCEL                 .0320
ZIPS                   .0322

 

ZIPS seems to struggle to project BABIP for hitters of all
types.  We have noted that ZIPS is pretty
good with hitters over 35.  Is this true
for BABIP projection?

 

BABIP if age>35  RMSE
PECOTA           .0279
MARCEL           .0285
OLIVER           .0297
CHONE            .0300
ZIPS             .0305

 

BABIP if age<=35  RMSE
PECOTA            .0292
CHONE             .0293
OLIVER            .0300
MARCEL            .0308
ZIPS              .0311

 

Clearly, the area where ZIPS is excelling at projecting older players is not BABIP, as it actually does the worst with that group as well.

 

BATTING AVERAGE

 

Now that we have looked at the component statistics, we can
move on to discuss each of the slash stats. 
The first of these is batting average. 
CHONE does the best at projecting batting average, with the other
systems not too far behind.

 

BATTING AVERAGE  RMSE
CHONE            .0248
PECOTA           .0253
OLIVER           .0253
ZIPS             .0254
MARCEL           .0259

 

Each of these systems beat MARCEL at projecting batting
average, but when I split hitters into those hitters who hit over .280 and hitters
who hit under .280, MARCEL actually does better with the high average hitters.

 

BATTING AVERAGE for avg>.280  RMSE
MARCEL                        .0247
PECOTA                        .0257
CHONE                         .0273
ZIPS                          .0277
OLIVER                        .0280

 

 

BATTING AVERAGE for avg<.280  RMSE
CHONE                         .0227
OLIVER                        .0231
ZIPS                          .0242
PECOTA                        .0250
MARCEL                        .0267

 

As I mentioned earlier, PECOTA seems to have trouble
projecting BABIP for speedsters.  Using
the same subsets that I developed earlier, you can see this effect on overall
average, even though PECOTA was relatively better at projecting homerun and
strikeout rates for these guys.

 

BATTING AVERAGE for top ¼ of speed scores  RMSE
OLIVER                                     .0237
CHONE                                      .0243
MARCEL                                     .0245
PECOTA                                     .0253
ZIPS                                       .0260

 

BATTING AVERAGE for bottom ¾ of speed scores  RMSE
CHONE                                         .0249
PECOTA                                        .0253
ZIPS                                          .0257
OLIVER                                        .0258
MARCEL                                        .0263

 

Not only does PECOTA struggle with speedsters, but ZIPS does as well.  My best guess is that these systems are comparing speedy players to speedy players from long ago, who had more success legging out infield singles because they faced defenses that played on poorer quality fields and did not have the same access to scouting material on opposing hitters.

 

ON-BASE PERCENTAGE & SLUGGING PERCENTAGE

 

As I noted earlier, OLIVER seems to systematically underestimate walk rates across the board.  The effect on its OBP projections is clear.  The average OBP for hitters in this group was .343.  However, OLIVER projected those hitters at .335 on average, while the other systems all fell between .343 and .347.

 

Overall, CHONE was the best at projecting OBP, with PECOTA
close behind.  Both ZIPS and OLIVER
failed to top MARCEL.

 

ON-BASE PERCENTAGE  RMSE
CHONE               .0873
PECOTA              .0888
MARCEL              .0898
ZIPS                .0903
OLIVER              .0931

 

When it came to testing SLG, the average SLG for hitters in this sample was .436.  However, even though CHONE, OLIVER, and ZIPS were all right around there, MARCEL and PECOTA averaged .443 and .448, respectively, for hitters in this sample.  As far as accuracy goes, here are the results:

 

SLUGGING PERCENTAGE  RMSE
CHONE                .0516
PECOTA               .0520
OLIVER               .0521
MARCEL               .0535
ZIPS                 .0538

 

CHONE and PECOTA were number one and two for both OBP and
SLG.  ZIPS fell behind MARCEL at
both.  OLIVER did beat MARCEL
considerably at projecting SLG.

 

STOLEN BASES, RUNS, AND RUNS BATTED IN

 

Many of you are probably wondering how all this knowledge will help your fantasy teams, so I will also include tests for how well these systems projected these statistics.  OLIVER does not project these statistics, so these results will not include it.

 

STOLEN BASES  RMSE
CHONE         6.56
PECOTA        6.61
ZIPS          7.53
MARCEL        8.05

 

CHONE does best with PECOTA close behind.  However, if we were to separate hitters into
groups with more than 10 steals and those with less than or equal to 10 steals,
we can see that the high variance of stolen bases among those players who steal
a lot of bases is driving this result. 
CHONE actually falls behind the other systems when it comes to
projecting stolen bases for slower players.

 

STOLEN BASES for SB>10  RMSE
CHONE                   10.89
PECOTA                  11.16
ZIPS                    13.47
MARCEL                  14.82

 

STOLEN BASES for SB<=10  RMSE
MARCEL                   3.37
ZIPS                     3.70
PECOTA                   3.98
CHONE                    4.10

 

Interestingly, the ranking is exactly the opposite for the slower guys.  Perhaps the systems that are better at projecting stolen bases are those that simply project higher stolen base totals?  This group of players, who qualified by being projected by all systems and having at least 300 PA, had an average of 8.69 stolen bases.

 

SYSTEMS  Average SB total
CHONE    8.53
PECOTA   8.45
MARCEL   7.89
ZIPS     7.73

 

When it comes to projecting runs scored and runs batted in, CHONE was definitely the best at projecting runs scored, but PECOTA was the best at RBI.

 

RUNS SCORED  RMSE
CHONE        19.72
ZIPS         20.32
PECOTA       20.36
MARCEL       21.84

 

RBI     RMSE
PECOTA  19.79
ZIPS    19.84
CHONE   19.97
MARCEL  21.00

 

SUMMARY

 

There is clearly a lot of information here, and it may be tough to know exactly what to take away from all of this.  In fact, I have way more results than I even listed here, and there are probably a million tests that I did not even think of that could provide insight into how these systems work.  The main things to take away are:

 

–CHONE was the best at projecting most things.

–PECOTA was very close behind but had some systematic
biases, specifically for speedy players’ BABIPs, which ZIPS struggled with as
well.

–ZIPS is behind the other systems, except it does quite
well with projecting the three true outcomes for players over 35.

–CHONE does better with older players in general, since its
specialty is aging curves, but PECOTA does better at finding comparable players
for younger players for whom less data is available (unless they fall into the
speedster category).

–OLIVER clearly contends and even takes the lead at some
things–especially at projecting hitters with lower homerun totals and other
players significantly affected by park effects. 
However, OLIVER under-projects walks and strikeouts systematically and over-projects homeruns systematically, and
could probably be improved by adjusting how those outcomes are computed.

–None of the systems is terribly good at projecting BABIP.  The other systems regress BABIP to the mean far more than ZIPS does, and ZIPS does worse at projecting BABIP; it would probably improve if it simply did what those systems do instead.  However, since none of the systems does well here, I strongly believe that using batted ball data to project BABIP will help projection.


3 Responses to Testing the Projection Systems’ Strengths and Weaknesses

  1. Greg Andrew says:

    Are any of these differences between systems statistically significant?
    Also, using RMSE as the only measurement of the accuracy of projections is problematic. It may be the single best measure, but using RMSE alone assumes the aim of all these systems is to get the best RMSE, and that may not be the case. Very cautious projection systems will get better RMSEs than systems which try harder to project shifts in performance level, but the latter may be considered better systems if they fulfill their aims.

  2. Dan Novick says:

    “Very cautious projection systems will get better RMSEs than systems which try harder to project shifts in performance level”
    While that may be true (I don’t know), it didn’t make a big difference here. Marcel didn’t go out and win every category despite being the most conservative of all the systems. Unless I’m misinterpreting your use of “cautious.”

  3. Matt Swartz says:

    Greg, that’s a good question about statistical significance. I’m somewhat embarrassed to say that I can’t recall how to do statistical significance for the difference between two different root mean square error tests? Does anybody know the formula I should even use for that? I’m having trouble even deriving what it would be.
    As far as RMSE, using correlations did not change the answers significantly, so I used RMSE. Over on Tom Tango’s blog, there is a discussion in the thread about this article and about what method to use, and the general consensus is RMSE. Some people think average absolute error might be the way to go, but they seem to think it’s better than correlations.
    As far as my personal thoughts on which to use, I see your general point about the goal being to find shifts in performance level and pretty much effectively determine who is underrated or overrated. I see that more as a goal for fantasy baseball– you don’t know want to know how good a guy is, you want to know if he’s better than other people think he is and should you draft him. For professional teams, the answer is a bit different. In that case, you’re trying to get a certain number of wins and approximate how many wins a player gets you, and that determines his worth. In that case, RMSE seems appropriate since it values correct valuations of players.
