Does PECOTA overestimate the batting averages for fast players?

There are a number of projection systems out there for
predicting player performance.  All of
them are pretty good.  They all make
claims of superiority from time to time, but the clear consensus is that there
is no consensus.  In some ways, PECOTA
could be considered the best, but CHONE, ZiPS, Marcel, and many others have
their strengths. As I was looking through the projections for this year, I also
wondered what the systems’ weaknesses were. One thing that I noticed was how
high some of PECOTA’s projected batting averages were for speedy players. This
year, PECOTA projects batting averages for Jose Reyes, Jimmy Rollins, and
Hanley Ramirez that are more than ten points higher than those of ZiPS and CHONE.

 

I decided to look at this in a more scientific way. I went through the PECOTA system’s projections
for 2006-2008 for the 832 players who managed 300 PA during those years. I calculated how far the players’ batting
averages exceeded their PECOTA projections. I wanted to compare this to their Speed Score as listed by Baseball
Prospectus in each of their projections. I figured that if I simply ran this
regression without a control for PECOTA overestimating a player’s speed, there
would be a bias (players whose speed PECOTA overestimated would have averages below their PECOTA projection). So I added a
control: the difference between their actual stolen base total and PECOTA’s stolen
base estimate. This should allow me to isolate whether PECOTA overestimates batting averages for speedsters,
controlling for whether it accurately estimates the players’ speeds. Here are the results:

 

Source     SS         df    MS              Obs         832
Model      0.02476    4     0.00619         F(4,827)    9.53
Residual   0.537231   827   0.00065         Prob>F      0
Total      0.56199    831   0.000676        R-sq        0.0441
                                            Adj R-sq    0.0394
                                            RMSE        0.02549

avg-PECavg   Coef.      Std. Err.   t       P>|t|   95%Cimin   95%Cimax
sb-PECsb     0.000597   0.000131    4.57    0       0.00034    0.000854
pspdtop4th   -0.00364   0.002009    -1.81   0.07    -0.00758   0.000302
yr06         0.005326   0.002177    2.45    0.015   0.001053   0.0096
yr07         -0.0028    0.002154    -1.3    0.195   -0.00702   0.001433
_cons        0.002151   0.001604    1.34    0.18    -0.001     0.005299
 

(avg-PECavg): batting average minus PECOTA’s projected batting average

(sb-PECsb): stolen bases minus PECOTA’s projected stolen bases

(pspdtop4th): indicator function equal to 1 if the speed score
was in the top quarter of speed scores in that year (speed scores are measured
on a different scale for each year)

(yr06, yr07): indicator functions equal to 1 if the year was
2006 or 2007, respectively, to control for the measurement bias by year.
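
To make the variable definitions concrete, here is a minimal sketch of how the regressors and the fit could be built with NumPy. It is not the code behind the table: the function names, the 75th-percentile cutoff for "top quarter", and the use of the sample standard deviation are all assumptions made for illustration, and the real PECOTA inputs are not reproduced.

```python
import numpy as np

def speed_regressors(speed, year):
    """Return (top-quarter indicator, z-score), each computed within a year.

    Assumption: "top quarter" means strictly above that year's 75th
    percentile, and the z-score uses the sample standard deviation.
    """
    speed = np.asarray(speed, dtype=float)
    year = np.asarray(year)
    top4th = np.zeros(len(speed))
    z = np.zeros(len(speed))
    for y in np.unique(year):
        mask = year == y
        s = speed[mask]
        top4th[mask] = (s > np.percentile(s, 75)).astype(float)
        z[mask] = (s - s.mean()) / s.std(ddof=1)
    return top4th, z

def fit_ols(avg_err, sb_err, speed_term, year):
    """Ordinary least squares of (avg-PECavg) on the four regressors plus a constant."""
    year = np.asarray(year)
    X = np.column_stack([
        np.asarray(sb_err, dtype=float),      # sb-PECsb
        np.asarray(speed_term, dtype=float),  # pspdtop4th or pspdz
        (year == 2006).astype(float),         # yr06
        (year == 2007).astype(float),         # yr07
        np.ones(len(year)),                   # _cons
    ])
    coef, *_ = np.linalg.lstsq(X, np.asarray(avg_err, dtype=float), rcond=None)
    return dict(zip(["sb-PECsb", "speed", "yr06", "yr07", "_cons"], coef))
```

Computing the speed terms year by year matters because the speed scores are on a different scale each year; pooling the years before taking the cutoff would mix those scales.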

 

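The coefficient on pspdtop4th is negative (about -0.0036, with p = 0.07). This is weakly
statistically significant, and indicates that PECOTA does in fact overrate speedsters: players in
the top quarter of speed scores hit roughly 4 points below their projections, other things equal.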
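
As a sanity check on how the summary statistics in the first regression table fit together, the short sketch below recomputes them from the reported sums of squares and the pspdtop4th coefficient. The variable names are mine, and the 1.96 multiplier is the large-sample normal approximation rather than the exact t critical value for 827 degrees of freedom, so the interval endpoints only match the table approximately.

```python
import math

# Values copied from the first regression table.
n_obs, n_regressors = 832, 4
ss_model, ss_resid = 0.02476, 0.537231
df_model = n_regressors
df_resid = n_obs - n_regressors - 1   # 827

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_stat = ms_model / ms_resid
r_sq = ss_model / (ss_model + ss_resid)
adj_r_sq = 1 - (1 - r_sq) * (n_obs - 1) / df_resid
rmse = math.sqrt(ms_resid)

# Coefficient and standard error for pspdtop4th.
coef, se = -0.00364, 0.002009
t_stat = coef / se
ci_low = coef - 1.96 * se    # normal approximation (assumption)
ci_high = coef + 1.96 * se

print(round(f_stat, 2), round(r_sq, 4), round(adj_r_sq, 4), round(rmse, 5))
print(round(t_stat, 2), round(ci_low, 5), round(ci_high, 6))
```

The upper end of the interval sits just above zero, which is why the result reads as only weakly significant.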

 

I did specifically pick the regression that looked best to
show above, but for the sake of completeness, here is the same regression with
“pspdz”, the number of standard deviations a player’s speed score was above the
mean, as the regressor in place of the top-quarter indicator. The result is less
significant, apparently because PECOTA does not do a better job of projecting
slow players than players with average speed.

 

Source     SS         df    MS              Obs         832
Model      0.024347   4     0.006087        F(4,827)    9.36
Residual   0.537643   827   0.00065         Prob>F      0
Total      0.56199    831   0.000676        R-sq        0.0433
                                            Adj R-sq    0.0387
                                            RMSE        0.0255

avg-PECavg   Coef.      Std. Err.   t       P>|t|   95%Cimin   95%Cimax
sb-PECsb     0.000599   0.000131    4.57    0       0.000342   0.000856
pspdz        -0.00145   0.000891    -1.63   0.104   -0.0032    0.000299
yr06         0.005156   0.002177    2.37    0.018   0.000883   0.009428
yr07         -0.00279   0.002155    -1.29   0.196   -0.00702   0.001443
_cons        0.001231   0.001521    0.81    0.418   -0.00175   0.004217
 

Here, “pspdz” is not quite significant (p = 0.104), but is not far off. Since “pspdz” (the number
of standard deviations the speed score is above the mean for that year) does not
have the same distribution in each year, it is likely an imperfect measurement,
and perhaps this is why.

 

Clearly, model specification is an issue, but I am reluctant to
distribute my data since PECOTA projections are proprietary (and I assume historical ones are as well). For the sake of
transparency, however, I will run regressions with alternative models on the PECOTA data
for anyone who requests them in a comment or by email.

 

Moving on to 2009, I decided to compare how the top 26
base-stealers as projected by PECOTA (speed score is not listed for 2009 PECOTA
projections) looked compared to the CHONE, ZiPS, and Marcel projections. I dropped the players who did not have any
significant amount of major league experience.
Then I did the same thing for the top 26 home run hitters as projected by
PECOTA, again comparing them to the CHONE, ZiPS, and Marcel projections. Sure enough, PECOTA projected the batting
averages for the speedy players higher than CHONE, ZiPS, and Marcel did, but not for
the home run hitters.

 

I would paste in the table here, but again, since PECOTA’s projections are proprietary, I will only summarize the results.

 

For the 26 speedsters, PECOTA was the highest of the four
systems for 14 of them.  It was the
second highest for 2 of them, third highest for 2 of them, and the lowest for 8
of them.  For the 26 sluggers, PECOTA was
the highest for 7, tied for the highest for 4 of them, second highest for 1 of
them, third highest for 5 of them, and the lowest for 9 of them.  It estimated a batting average ten points
higher than the average of CHONE, ZiPS, and Marcel for 8 speedsters, but for
only 5 sluggers (2 of whom were Beltran and Hanley Ramirez, also speedsters).
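
The tallies above come down to two small calculations per player: where PECOTA ranks among the four systems, and how many points it sits above the average of the other three. Here is a minimal sketch of both. The function names are hypothetical and the numbers in the usage lines are made up for illustration, since the real projections are proprietary.

```python
def pecota_rank(pecota, others):
    """Rank of PECOTA's projected average among all systems (1 = highest).

    A tie shares the better rank, so a tie for the top spot still returns 1.
    """
    return 1 + sum(1 for value in others if value > pecota)

def gap_in_points(pecota, others):
    """PECOTA's projection minus the mean of the other systems, in points of batting average."""
    return round(1000 * (pecota - sum(others) / len(others)), 1)

# Hypothetical player: PECOTA .295 versus .284, .281 and .286 elsewhere.
rank = pecota_rank(0.295, [0.284, 0.281, 0.286])
gap = gap_in_points(0.295, [0.284, 0.281, 0.286])
print(rank, gap)
```

Counting how often the rank is 1, and how often the gap reaches ten points, separately for the speedsters and the sluggers gives the kind of summary reported here.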

 

The 8 speedsters for whom PECOTA was that far above the other systems were: Jose
Reyes, Jimmy Rollins, Hanley Ramirez, Michael Bourn, Carlos Gomez, Brandon
Phillips, Rickie Weeks, and Nate McLouth.
It was also pretty high on Willy Taveras, Shane Victorino, Juan Pierre,
and Corey Hart.

 

I would be cautious about trusting PECOTA on these
guys. It does seem that PECOTA overestimates these hitters by a bit.
By the regression estimate, it looks like fast players may get an
exaggerated batting average boost of about 4 points. I would guess that each of the projection
systems has its weaknesses on certain players. If it were possible to determine which types
of hitters are better projected by which systems, I think that would be
extremely useful to know.


5 Responses to Does PECOTA overestimate the batting averages for fast players?

  1. JR Ewing says:

    Very interesting analysis and probably something to keep in mind when using PECOTA projections.
    However, being only 4 points higher is probably well within expected precision of projections. For instance a .300 hitter that has 550 at bats and gets 3 extra hits over the entire season will hit .305 (168 hits instead of 165).

  2. Matt Swartz says:

    Thanks, and that is a good point. The thing is that it’s 4 points higher across the board, and that is weakly statistically significant. I guess it’s equivalent to saying Marcel will project the .300 hitter to hit about .300 with a confidence interval centered around that, and PECOTA will project the .300 hitter to hit about .305 with a confidence interval centered around that, so even though the confidence intervals overlap, one is better.
    Also, I would guess that it’s probably not 4 points for all speedy players. Maybe it’s 10 points for 40% of the speedy players, and I just haven’t played around with the data enough to figure out who.
    You’re right that 4 points doesn’t sound like much though on its own.

  3. Pizza Cutter says:

    Aren’t we falling into the old government spending aphorism that a billion here and a billion there and soon you’re talking about real money? The issue here isn’t precision (all projections have to deal with imprecision), it’s bias in the measure (or the projection, I suppose). Good work uncovering it.

  4. Jeff K. says:

    If my posts over on Primer aren’t clear on the point, I wholeheartedly concur with PC’s last sentence. My kvetching is well-intentioned.

  5. Matt Swartz says:

    Jeff, I appreciate the criticism. I’m working on a follow-up article, and it helps me do it. I know I’m being hard on PECOTA, because it is a very good system, but I do think that systematic biases are a huge deal. I’m in the process of checking them all though.
