Power scores (or at least my attempt)
May 19, 2008
A while ago, I took on the task of building a better speed score. Bill James had come up with his formula some time ago, and for what it’s worth, I found that his formula (warning: PDF) was pretty good. My formula was much more stable over time… but it was also a lot harder to calculate. So, while we can get a single number to describe speed, something that has a bearing on several different events in a game (stolen bases, triples, staying out of a double play), I’ve never seen an attempt made at a “power score.” I figured I might as well give it a shot.
First, we identify events in a game that might involve “power.” For example, power would clearly be involved in home run hitting, but what else might be involved? I made a list of stats and rates that might just be involved.
- Home runs per fly ball. It’s hard to hit a home run on a ground ball. But, a player with power would likely have the power to put a fly ball over the fence.
- Fly balls fielded by outfielders rather than infielders (whether caught on the fly or not). It seems sensible that even if a fly ball doesn’t leave the park, it is still the mark of a more powerful hitter that he would hit more balls far from him (the outfield) than close to him (the infield). (Formula: OF fly balls / total fly balls and popups)
- Doubles and triples per ball in play. A grounder or line drive could end up going for a double or triple, but certainly, they are more likely on a fly ball that hits the wall. In either case, the ball was probably hit pretty hard.
- Line drive rate. Part of hitting for power is making good solid contact, and line drives are the sign of good solid contact.
- Ground balls fielded by outfielders rather than infielders. A ground ball that goes through the infield is more likely to have been hit harder (or perhaps just placed better?) than one fielded by an infielder. As with fly balls, it doesn’t matter whether it went for a hit (the shortstop got to it, but couldn’t make the throw), just where it was fielded.
- ISO. This is supposed to be a measure of “isolated power.” The formula is SLG-AVG. Let’s see what happens.
- BABIP. Not often that we get to talk about BABIP from the batter’s perspective. But again, balls in play that go for hits mean that fielders had a hard time getting to them. What’s one way to give a fielder a hard time getting to the ball? Hit it really fast or really hard.
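For anyone who wants to compute the seven inputs above from a season line, here is a minimal sketch. The dictionary keys and the sample numbers are made up for illustration. The denominators are assumptions too: balls in play uses the common AB - SO - HR + SF definition, line drive rate is taken over all batted balls, and fly_balls is assumed to include popups and home runs. The post does not spell out which definitions were used.

```python
def ratio(numerator, denominator):
    """Plain division that returns None instead of raising on a zero denominator."""
    if denominator == 0:
        return None
    return numerator / denominator


def power_inputs(s):
    """Compute the seven candidate power indicators from one season's counts."""
    singles = s["h"] - s["doubles"] - s["triples"] - s["hr"]
    total_bases = singles + 2 * s["doubles"] + 3 * s["triples"] + 4 * s["hr"]
    # Assumption: the common BABIP denominator, AB - SO - HR + SF.
    balls_in_play = s["ab"] - s["so"] - s["hr"] + s["sf"]
    # Assumption: line drive rate is measured against all batted balls.
    batted_balls = s["fly_balls"] + s["ground_balls"] + s["line_drives"]
    return {
        "hr_per_fb": ratio(s["hr"], s["fly_balls"]),
        "of_fly_share": ratio(s["of_flies"], s["of_flies"] + s["if_flies"]),
        "xbh_per_bip": ratio(s["doubles"] + s["triples"], balls_in_play),
        "ld_rate": ratio(s["line_drives"], batted_balls),
        "of_gb_share": ratio(s["of_grounders"], s["ground_balls"]),
        # SLG - AVG simplifies to (total bases - hits) / AB.
        "iso": ratio(total_bases - s["h"], s["ab"]),
        "babip": ratio(s["h"] - s["hr"], balls_in_play),
    }


# Hypothetical season line, not a real player.
example = {
    "ab": 500, "h": 150, "doubles": 30, "triples": 5, "hr": 25,
    "so": 100, "sf": 5,
    "fly_balls": 160,      # all fly balls and popups, home runs included
    "of_flies": 110,       # flies fielded by outfielders
    "if_flies": 25,        # flies and popups fielded by infielders
    "ground_balls": 170,
    "of_grounders": 45,    # grounders that reached the outfield
    "line_drives": 75,
}
inputs = power_inputs(example)
for name, value in inputs.items():
    print(f"{name}: {value:.3f}")
```

All of these except ISO are proportions between 0 and 1, which matters for the transformation step that follows.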
Like my methodology for calculating speed scores, I was dealing with a lot of probability numbers (with the exception of ISO). Probabilities are notoriously not normally distributed, so I applied a normality transformation by taking the natural log of the odds, ln(p / (1 − p)). I restricted myself to players who had at least 100 PA in the season in question, and I had a database stretching from 2000-2007. I converted all of the log-odds values to Z-scores based on the distribution present in the year in question (to get everything into the same basic range of scale). I then subjected these Z-scores to an exploratory factor analysis, with a Varimax rotation, to see which of these variables hung together. I kept factors with an eigenvalue over 1.00. If you have no idea what I just said, just trust me on this one.
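The first two steps (the log transform and the within-year Z-scores) can be sketched in a few lines. This is only an illustration, not the actual code behind the numbers. It takes the transform to be ln(p / (1 − p)), which is an assumption about the exact calculation. The clamping of rates at 0 or 1, the use of the population standard deviation, and the row layout are all my own choices. The factor analysis itself is not shown.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def log_odds(p, eps=1e-6):
    """Natural log of the odds p / (1 - p).

    Rates of exactly 0 or 1 are clamped by eps so the log stays finite
    (an assumption; the post does not say how such cases were handled).
    """
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def zscores_by_year(rows, stat):
    """Return {(player, year): z} with `stat` standardised within each year."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(log_odds(row[stat]))
    # Mean and (population) standard deviation of the log odds, per season.
    params = {year: (mean(v), pstdev(v)) for year, v in by_year.items()}
    out = {}
    for row in rows:
        mu, sigma = params[row["year"]]
        value = log_odds(row[stat])
        out[(row["player"], row["year"])] = (value - mu) / sigma if sigma else 0.0
    return out


# Hypothetical HR/FB rates for three made-up players over two seasons.
rows = [
    {"player": "a", "year": 2006, "hr_per_fb": 0.10},
    {"player": "b", "year": 2006, "hr_per_fb": 0.20},
    {"player": "c", "year": 2006, "hr_per_fb": 0.15},
    {"player": "a", "year": 2007, "hr_per_fb": 0.12},
    {"player": "b", "year": 2007, "hr_per_fb": 0.18},
    {"player": "c", "year": 2007, "hr_per_fb": 0.15},
]
z = zscores_by_year(rows, "hr_per_fb")
```

Standardising within each season means a Z-score of 0 is always "league average for that year," so seasons with different run environments can sit in the same analysis.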
The results were a little bit surprising. I got two factors (gory detail: together they picked up 59.7% of the variance present). So far, so good. HR/FB, doubles and triples per ball in play (XBH), and ISO hung together, as might be expected. Outfield flies were also part of this factor, although that variable loaded negatively. So, we would expect someone who hits a lot of home runs and doubles to send a smaller share of his fly balls to the outfield (or, more to the point, to hit more infield flies). The other factor that emerged was a combination of BABIP, ground balls that go through to the OF, and line drive rate. (gory detail: There was very little in the way of cross-loading factors.)
So, home runs generally are accompanied by other extra base hits, and that generally pushes the ISO up (not a huge surprise that those would all hang together). But, something that speaks of a power hitter is actually (comparatively) a lot of infield pop ups. We already know that power hitters are given to striking out and that they hit a lot of foul balls, but it also looks like they have a propensity to hit infield pop ups. Seems that trying to hit big fly balls has plenty of risks. Swinging really hard is bought at a price of lowered plate coverage, but also it looks like the ability to control the bat angle goes. Get the horizontal angle wrong, hit it foul. Get the vertical (bat impact angle) wrong, hit a harmless popup. Get it all right, fireworks.
To check whether my two new scales were consistent over time, I looked at their intraclass correlation over four years (2004-2007). The first factor (call it “big fly power”) had an ICC of .740, indicating excellent consistency over time. The second factor (call it “solid contact power”) had an ICC of only .380, which really isn’t all that good (not horrible, but not great). Since two of the components are getting the ball through the infield on a ground ball and getting a hit when the ball’s in play (we might, thus, call this one “hitting for average”), there’s something to be said for the fact that this skill is only moderately consistent from year to year. What might stand in the way of those two skills? The defense. Trying to live off of hanging back and making solid contact has its own risks. If you hit it where they ain’t, bully for you. If the defense can cover a lot of ground, you’re going to have some issues. There’s no defending a fly ball that either hits the wall or goes over it.
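For readers who have not met the intraclass correlation before, here is a small sketch of one common version, the one-way random-effects ICC. The post does not say which ICC variant was used, so treat the choice of formula as an assumption. Each row is one player and each column one season's score.

```python
from statistics import mean


def icc_oneway(table):
    """One-way random-effects ICC for a players-by-seasons table of scores.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB is the mean square
    between players and MSW the mean square within players.
    """
    n = len(table)
    k = len(table[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need at least 2 players, each with the same 2+ seasons")
    grand = mean([x for row in table for x in row])
    row_means = [mean(row) for row in table]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(table, row_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("scores have no variance at all")
    return (ms_between - ms_within) / denominator


# Made-up scores: three players who never change over four seasons.
perfectly_stable = [[1.0] * 4, [0.0] * 4, [-1.0] * 4]
print(icc_oneway(perfectly_stable))
```

When every player posts the same score each year, all of the variance is between players and the ICC is 1. The more a player's own seasons bounce around relative to the spread between players, the closer the ICC falls toward 0.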
Now, why go to all this trouble (and what the heck is exploratory factor analysis?) to figure out power numbers? Can’t one use ISO or HR/FB or something like that on their own? Sure, ISO and HR/FB correlate well with the “big fly” factor (.931 and .872, respectively, meaning that each tracks the factor very closely). But they are not as consistent over the years for batters (ICCs of .648 and .675, still quite good, but not as good as the total factor). Here’s the beauty of exploratory factor analysis and scale construction. Put a few things together that are correlated anyway, and the random swings to the extreme in one can be balanced out by the others, making for a more stable whole. If you want a good number that will stay more consistent over the years, use my power number. If you want a quick and dirty number that’s really easy to get, go with ISO.
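The "more stable whole" idea is easy to see in a toy simulation. This sketch uses invented data, not the real player database. It gives each player a fixed true skill, measures it with several equally noisy indicators in two seasons, and compares the year-to-year correlation of one indicator against that of the plain average. A plain average stands in for the factor score here, and a Pearson correlation stands in for the ICC. Both are simplifications.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_players=2000, n_indicators=3, seed=0):
    """Return (single-indicator, composite) year-to-year correlations."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_players)]

    def season():
        # Every indicator is the player's true skill plus independent noise.
        return [[s + rng.gauss(0, 1) for _ in range(n_indicators)] for s in skill]

    year1, year2 = season(), season()
    single = pearson([p[0] for p in year1], [p[0] for p in year2])
    composite = pearson([mean(p) for p in year1], [mean(p) for p in year2])
    return single, composite


single, composite = simulate()
print(f"one indicator: {single:.2f}, average of three: {composite:.2f}")
```

With skill and noise of equal variance, theory puts the single-indicator correlation near .50 and the three-indicator average near .75, because averaging cuts the noise variance to a third while leaving the skill signal untouched.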
While I was in the neighborhood though, I ran a correlation matrix and found that ISO and BABIP are actually uncorrelated with one another. Looks like the old scouting adage about “hit for power” and “hit for average” being two separate tools is accurate. A hitter can have both or neither (well, if you’re in MLB, you have at least one), but one doesn’t tell you much about the other.
For those interested, I’ve posted the 2007 list here, sorted as always by Retrosheet ID.