# Looking for park effects that make sense

Baseball.  A game where the parks themselves are characters in the grand drama of the game.  (Did I really just write that?)  Anyone who’s ever seen a box score from a game played at Coors Field, taken in a day game at Wrigley Field when the wind is blowing out, or seen a picture of Fenway Park can understand that.  If there’s something that always complicates life for a Sabermetrician, it’s the problem of park effects.

The very first park effects naively looked at just a one-year period and adjusted players’ statistics (all of them) according to one number.  These one-number park effects were mostly interested in how the park affected run-scoring, which after all is the point of the game.  But something always struck me about these park effects.  They always seemed rather inconsistent, even when they moved to taking three years’ worth of data.  I can understand that there are plenty of things that can affect run scoring in a park.  The summer might be hot.  The wind might have been blowing out a lot.  But I think there’s something else going on here.  In 2000, the Metrodome’s 3-year park factors were 104/105 (hitter/pitcher).  In 2008, they were 93/93.  In the space of 8 years, the Metrodome went from a hitter’s park to a pitcher’s park.  The key word in that sentence is “dome.”  Something’s wrong.

Recently, those who look at park factors have been turning toward a more component-based approach.  How does the park affect double rates?  What about strikeouts?  That’s better.  In fact, some of StatSpeak’s own Brian Cartwright’s work on whether we should adjust park effects for different types of hitters is downright awesome.

As someone with a soft spot for looking at the reliability of different statistics, I have a slightly different question to ask.  We know that while players change from year to year, ballparks are structures made of concrete and steel and remain fixed to a specific geographic location.  Park effects, in theory at least, should be very, very consistent.  It bugs me that there seems to be so much variation.  What to do?

I went about creating my own park effects using (what else?) Retrosheet data from 1993-1999 and 2003-2008 (the 2000-2002 data are a little lacking in the batted-ball-type department).  For the moment, I’m looking at two stats, one that should be heavily influenced by park (HR/FB… no I didn’t code for left field vs. right field…), and one that shouldn’t (K/PA), at least in theory.  I calculated each player’s (both hitter and pitcher) HR/FB and K/PA rate when playing on the road.  (More on this in a minute.)  The reason for road stats is simple.  On the road, a player plays in a bunch of different parks and in doing so, over the aggregate, we can say that he played in a league-average park.  Because of the unbalanced schedule and inter-league play, it’s harder to make that argument nowadays, but it’s a decent stab at estimating the true talent level without that estimate being overly influenced by his home park.

The other objection to using road stats is that players are much better when at home than when on the road.  I suppose that there is some adjustment that I might (and eventually will) make to correct for this.  Right now, I’m not so much interested in the magnitude of the park effects, but that they are reliable.  The corrections that I might apply likely wouldn’t change the reliability estimates.  (The objection that I have, and that’s inherent in park effects, is that we’re taking already unreliable stats and using only half as much data as we have… but thankfully, stadiums see several thousand plate appearances over the course of a year.  I can inflate some of those concerns away via volume.)

I used the road stats to generate the expected probability of the outcome of interest, using the odds ratio method.  A quick review for those who aren’t familiar: if I want to find out the likelihood of a strikeout happening in this PA, I can model that by converting the probabilities to odds ratios (OR = p / (1 - p)) and using the formula:

Exp OR = pitcher OR * batter OR / league OR.
(you can then turn the Exp OR into an expected probability rather easily.)
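That last conversion step, and the formula itself, can be sketched in a few lines of code.  The strikeout rates below are made-up numbers for illustration, not anything from the actual dataset:

```python
def expected_prob(p_batter, p_pitcher, p_league):
    """Expected event probability for one matchup, via the odds ratio method."""
    def odds(p):
        return p / (1.0 - p)

    exp_or = odds(p_batter) * odds(p_pitcher) / odds(p_league)
    # turn the expected odds ratio back into a probability
    return exp_or / (1.0 + exp_or)

# a high-K batter (25%) facing a high-K pitcher (22%) in an 18% league
# should come out above both individual rates
p = expected_prob(0.25, 0.22, 0.18)
```

One nice sanity check on the method: if the batter, pitcher, and league all share the same rate, the expected probability is just that rate back again.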

Then, I can see, over a certain number of plate appearances, how often the event actually happened vs. how often it should have happened.  That ratio of the two is the park effect.  (Methodological note: I only included PAs which included a pitcher who faced at least 250 batters in the year in question against a batter who had 250 or more PA in that season.)
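The bookkeeping for that ratio looks something like this (again with toy numbers I made up, where each PA carries a 0/1 outcome and the expected probability from the odds-ratio step):

```python
def park_effect(outcomes, expected_probs):
    """Park effect as (events that actually happened) / (events expected),
    summed over all qualifying PAs in the park."""
    actual = sum(outcomes)          # outcomes: 1 if the event occurred, else 0
    expected = sum(expected_probs)  # per-PA expected probabilities
    return actual / expected

# toy example: the event happened twice in 5 PAs, but only ~1.6 were expected,
# so this hypothetical park inflates the event
pf = park_effect([1, 0, 1, 0, 0], [0.30, 0.25, 0.40, 0.35, 0.30])
```

A park effect above 1.0 means the event happened more often than the matchups alone would predict; below 1.0, less often.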

Now, let’s use split-half reliability to see whether those park effects are stable.  I took a sample of 1000 PAs within a park, and split them by evens and odds in sequence, so as to make two samples of 500 PAs in each park.  Eventually, if I took a sample of a billion PA in the park, and then a second sample of a billion PA, they’d probably match up almost perfectly and have a correlation near 1.0 (perfect correlation).  But what about at 500 PA?  Is that enough to get them to correlate that well?  What about 1000?  2000?  5000?
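A minimal sketch of the split, assuming the PAs are in order of occurrence.  The correlation would then be run across parks, pairing each park’s even-half park factor with its odd-half park factor:

```python
import statistics


def split_half(pa_events):
    """Split one park's PA sequence into even- and odd-indexed halves,
    in order of occurrence."""
    return pa_events[0::2], pa_events[1::2]


def pearson_r(xs, ys):
    """Pearson correlation between two lists (e.g., the even-half and
    odd-half park factors across all parks in the sample)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Splitting by alternating PAs, rather than first half vs. second half, guards against anything that drifts over the course of a season (weather, roster turnover) contaminating the comparison.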

And what is “good enough?”  When I deal with player stats, I look for a correlation of .70, which is pretty standard in psychology.  Baseball players, and all people, change from year to year, so a correlation of .70 or above means that I’ve explained more than half the variance (.70 * .70 = .49 R-squared).  Not bad for something that we know is changing.  But ballparks don’t mature or have bad days or go through divorces or pull hamstrings.  So, I made the completely arbitrary decision to look for a correlation of .90 (at least 80% of the variance accounted for, since .90 * .90 = .81).  That’s going to make for a tighter standard error of the estimate around the park factor.

I first looked at HR/FB, and got the following split-half reliabilities at each number of fly balls (note: not PAs, but FBs).

500 FB: .703
1000 FB: .711
2000 FB: .864
3000 FB: .876
4000 FB: .864

Past that, the sample became unsupportable.  The weird dip between 3000 and 4000 is due to the number of stadiums in my sample shrinking.  There are fewer stadiums that have hosted two samples of 4000 fly balls (8000 fly balls total, with the caveat that each must come from a confrontation between a batter and a pitcher with 250 PA each) than have hosted two samples of 3000 fly balls.  The average stadium sees about 1800 fly balls per year, so 4000 FB is roughly 2.5 years.  If we extrapolate out a bit, three years’ worth of data is probably about right to get to .90.
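One standard way to make that extrapolation precise (my addition here, not a step the analysis above actually takes) is the Spearman-Brown prophecy formula, which projects what a reliability would be if the sample were some multiple longer:

```python
def spearman_brown(r, length_factor):
    """Projected reliability if the sample were length_factor times longer."""
    return (length_factor * r) / (1.0 + (length_factor - 1.0) * r)


def length_needed(r, target):
    """How many times longer the sample must be to reach the target
    reliability (the prophecy formula solved for the length factor)."""
    return target * (1.0 - r) / (r * (1.0 - target))
```

Plugging in the .876 observed at 3000 fly balls, for example, suggests the sample would need to be only modestly larger to clear .90, which squares with the eyeball extrapolation to about three years of data.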

To make sure that these park effects were passing the smell test, I looked at the park factors to see which stadiums seemed to make fly balls leave the yard.  SkyDome (or whatever they’re calling it now), The Ballpark at Arlington (the rare occurrence where a corporate sponsorship actually made a ballpark’s name less stupid), Coors, Dodger Stadium, and the Kingdome topped the list.  The low end was anchored by Shea, PetCo (OK, who let PetCo sponsor a baseball stadium?), AT&T Park, old Tiger Stadium, and old County Stadium in Milwaukee.  Not an unreasonable list.

Now, on to strikeouts per PA.

1000 PA: .608
2000 PA: .785
4000 PA: .757
6000 PA: .813
8000 PA: .845
10000 PA: .847
12000 PA: .869
14000 PA: .905
16000 PA: .898
18000 PA: .935

Interesting.  The average park sees a little more than 6000 PAs per year, so again, we’re looking at about 2.5 years’ worth of data as the point when strikeout numbers are stable enough.  What surprised me is that the effect for strikeouts actually stabilizes more quickly than the effect for HR/FB.  That’s not to say that the effect for strikeouts is bigger, just that it’s more robust.

Are there stats that don’t have much in the way of park effects?  I looked at hit-by-pitch rates (per PA), and the highest reliability I could get after 18000 PA was .220.  (Actually, at one point it hit .355, which is nothing worth noting.)  So, not everything is affected by park.

What have we learned?  I’ve really only looked at two stats, but it looks like three years is about right for those component-based park factors to stabilize.  The other thing to think about is whether we need to regress park effects to the mean based on sample size, the same way that we regress player performances to the mean.
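A minimal sketch of what that regression might look like, with the split-half reliability doing duty as the shrinkage weight.  The weighting scheme here is an assumption on my part, not something derived in the post:

```python
def regressed_park_factor(observed_pf, reliability, league_mean=1.0):
    """Shrink an observed park factor toward the league mean (1.0) in
    proportion to how unreliable the sample it came from is."""
    return reliability * observed_pf + (1.0 - reliability) * league_mean
```

So a park factor of 1.10 built on one year of data (where the reliability is well under .90) would get pulled noticeably toward 1.00, while the same 1.10 built on three years of data would stay nearly intact.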