To Add or to Multiply, Part I of many.
April 18, 2007
In sabermetric analysis, there are several circumstances where we need to adjust a stat by some type of factor. Examples include minor league equivalencies (MLEs), park factors, and age adjustments. Say Conan hits 70 home runs in Coors Field. How many would he have hit in a normal park? How many would he have hit in a tough home run park, like Seattle? A top prospect hits .375 in the Pacific Coast League. What’s that worth if he’s called up to the majors?
The first time I saw MLEs and park factors was in the Baseball Abstracts of Bill James. He used multiplicative factors: for example, a hitter loses 18% of his value in jumping from AAA to the majors, or Wrigley Field inflates run scoring by 12%. As more data became available, we could go a lot further than looking at an end result such as runs. What’s the factor for batting average? For home runs, doubles, and strikeouts?
It wasn’t until somewhat recently that I read of using additive factors. Additive factors were used in the projections published in The Hardball Times Season Preview, yet so far I haven’t seen a good explanation of why I should use them after so many years of using multiplicative factors. At the same time, just because Bill James did it this way, and so has everyone else who followed in his footsteps, does not mean that it has to be done that way.
What is the right way to adjust? First, let’s stop and think about what each choice implies. Let’s say a ballpark is especially good for hitting doubles, inflating doubles by 20%. For an average hitter, this could also be expressed as +5 doubles for every 500 balls in play (BIP). If an average player hits 25 doubles with 500 BIP, then in this park he’ll hit 30. It doesn’t matter, for the average player, whether you add or multiply.
Now take a high-doubles hitter who normally hits 40. If you multiply, he’s at 48; if you add, he’s at 45. By multiplying, the hitters who are already good at something benefit most from a friendly park. By adding, all players gain the same benefit.
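To make the arithmetic concrete, here is a minimal Python sketch of the two adjustments, using only the doubles numbers from the example above. The function names are my own, chosen for illustration, and the sketch assumes every hitter has the same 500 balls in play.

```python
def adjust_multiplicative(doubles, factor):
    """Scale a hitter's doubles by a park factor (1.20 means +20%)."""
    return doubles * factor


def adjust_additive(doubles, bonus_per_500_bip, balls_in_play=500):
    """Add a fixed number of doubles per 500 balls in play."""
    return doubles + bonus_per_500_bip * balls_in_play / 500


# The average hitter (25 doubles) and the high-doubles hitter (40 doubles).
for label, doubles in [("average hitter", 25), ("high-doubles hitter", 40)]:
    mult = adjust_multiplicative(doubles, 1.20)
    add = adjust_additive(doubles, 5)
    print(f"{label}: multiply -> {mult:.0f}, add -> {add:.0f}")
```

The two methods agree for the average hitter by construction, since the +5 was derived from 20% of 25. They only disagree for hitters away from the average, which is exactly where the choice matters.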
What is the correct method? I really don’t know. My answer is to use whatever best models reality. I will start by looking at pitcher strikeouts and park factors. At this point, I can’t skip ahead and tell you to add or multiply, because I haven’t done the work yet, but it’s possible that both will have their uses depending on the situation.
The first look is at the 2006 pitching staffs of the A’s and Mariners. The A’s had a one-year strikeout factor of 0.91 according to the 2007 Bill James Handbook. The Mariners, on the other hand, had a factor of 1.11. Looking at only Mariner pitchers, and using a matched at-bat method instead of simply dividing team home strikeout rate by road strikeout rate, I get an even larger adjustment, 1.185. Next, I split the Mariners into two groups, the high-strikeout pitchers and the low-strikeout pitchers. The low-strikeout pitchers struck out 26% more batters at home than they did on the road. The high-strikeout pitchers only struck out 14% more. The low-strikeout pitchers added 3.1 strikeouts per 100 at-bats (AB), and the high-strikeout pitchers added an almost identical 3.3 strikeouts per 100 AB. Score one for additive factors!
I repeated the same exercise for the A’s, who played in a park where strikeouts were scarcer in 2006. In total, A’s pitchers struck out 8% fewer hitters at home than on the road. Now things get a little tricky. The percentage of strikeouts lost was 3.2% for the high-strikeout pitchers and 16.7% for the low-strikeout pitchers. Per 100 AB, that’s 0.7 strikeouts lost for the high-K guys and 2.5 for the low-K guys. The result is not as clear-cut as it was for the Mariners pitchers, but a multiplicative factor would punish the high-K guys more, when it is the low-strikeout pitchers who are affected most. An additive factor, while not fitting perfectly, is a better fit here as well.
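One rough way to put numbers on "better fit" is sketched below in Python, using only the figures quoted above. Several things here are my assumptions rather than part of the original analysis: the road strikeout rates are backed out from the quoted percentage and per-100 changes (so they carry rounding error), the multiplicative model uses the team-wide factor (1.185 for the Mariners, 0.92 for the A’s), and the additive model uses the unweighted average of the two groups’ changes, since the group sizes aren’t given.

```python
# Figures quoted in the post. Each group is
# (fractional change in K rate at home, change in K per 100 AB at home).
TEAMS = {
    "Mariners": {"park_factor": 1.185, "low_k": (0.26, 3.1), "high_k": (0.14, 3.3)},
    "A's": {"park_factor": 0.92, "low_k": (-0.167, -2.5), "high_k": (-0.032, -0.7)},
}


def implied_road_rate(pct_change, per100_change):
    """Back out the road K per 100 AB implied by the two quoted figures."""
    return per100_change / pct_change


def mean_abs_errors(team):
    """Return (multiplicative error, additive error) in K per 100 AB."""
    groups = [team["low_k"], team["high_k"]]
    observed = [per100 for _, per100 in groups]

    # Multiplicative model: each group's road rate is scaled by the team factor.
    mult_pred = [implied_road_rate(pct, per100) * (team["park_factor"] - 1)
                 for pct, per100 in groups]

    # Additive model: every group gets the same shift. Assumption: the shift is
    # the unweighted mean of the two groups, because group sizes are unknown.
    shift = sum(observed) / len(observed)
    add_pred = [shift] * len(groups)

    mult_err = sum(abs(p - o) for p, o in zip(mult_pred, observed)) / len(groups)
    add_err = sum(abs(p - o) for p, o in zip(add_pred, observed)) / len(groups)
    return mult_err, add_err


for name, team in TEAMS.items():
    mult_err, add_err = mean_abs_errors(team)
    print(f"{name}: multiplicative error {mult_err:.2f}, additive error {add_err:.2f}")
```

Under these assumptions the additive model has the smaller error for both teams, by a wide margin for the Mariners and a narrower one for the A’s. Keep in mind that the additive shift here is fitted to the same two groups it is scored on, which gives it a small built-in advantage over the team-wide multiplicative factor.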
It’s only two teams and two years, and we’ll have to look at more data before this is settled, but for pitcher strikeout park factors, additive factors look like the way to go.