The measure of a man, part III
March 9, 2009
For those who have been following this series, my goal has been to develop a small number of orthogonal (non-correlated) measures that will adequately encapsulate a batter’s offensive talents. In part one, I explained how I found four such factors, and how they are derived through a logical flowchart and a factor analytic approach. In part two, I showed that the four factors are actually useful in predicting player typologies. In part three, let’s look at whether they are stable and how they change over time.
My hope was that the factors that I created would be stable over time. I took the original ten factors (strike zone sensitivity, response bias, contact rate, LD/FB/GB percentages, 0 and 1 strike fouls per PA, 2 strike fouls per 2 strike PA, speed score, and power score) from 2003-2008 and put them in a factor analysis. The same four factors shook out with basically the same loadings.
Getting consistency measures should be easy enough. I did my usual AR(1) intraclass correlation over four years’ worth of data (2005-2008). Things were going great (Ichiro-Howard: .77, contact: .79, risk: .79) until I got to solid contact (.40?). I specifically built these out of things that showed good reliability. How did .40 happen? The two variables that load heavily on “solid contact” are LD rate (ICC = .31) and power score (ICC = .55). I had previously found them to be more reliable than that. There are some well-documented problems with how Retrosheet classifies line drives, which may be playing a part here, but I’m not sure what happened.
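For readers who want to see what an ICC is doing under the hood, here is a minimal sketch. It is not the AR(1) version used above; it is the plain one-way ICC, which assumes every player has the same number of seasons and ignores the year-to-year ordering. The function name `icc_oneway` and the toy numbers are made up for illustration only.

```python
def icc_oneway(table):
    """One-way random-effects ICC for a balanced table.

    table: list of rows, one row per player, one value per season.
    Simplification: this is NOT the AR(1) ICC from the post; it treats
    a player's seasons as exchangeable.
    """
    n = len(table)        # number of players
    k = len(table[0])     # seasons per player (assumed equal for everyone)
    player_means = [sum(row) / k for row in table]
    grand_mean = sum(player_means) / n

    # Between-player mean square: how far players sit from one another.
    msb = k * sum((m - grand_mean) ** 2 for m in player_means) / (n - 1)
    # Within-player mean square: how much a player bounces year to year.
    msw = sum(
        (x - m) ** 2 for row, m in zip(table, player_means) for x in row
    ) / (n * (k - 1))

    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical factor scores for four players over four seasons.
toy_scores = [
    [0.9, 1.1, 0.7, 1.0],
    [-0.3, 0.1, -0.2, 0.0],
    [-1.2, -0.8, -1.0, -1.1],
    [0.2, 0.5, 0.1, 0.4],
]
print(round(icc_oneway(toy_scores), 3))
```

The closer the result is to 1, the more of the spread in the scores comes from differences between players rather than from a single player’s year-to-year wobble.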
Now, .40 isn’t a horrible ICC, but it isn’t a great one either. Let’s assume that it’s a true finding. It means that making good contact over the course of a year is more noise than signal. When you think of it, we are dealing with trying to hit a small ball traveling at 85-90 mph (sometimes more) with a stick that’s only a couple inches in circumference, and at that, the stick itself is traveling at a high rate of speed. I suppose that the angle at which the ball comes off the bat and where it goes is bound to have some element of chance in it.
Onward to the aging patterns. Actually, they proceed in much the way that you might imagine. For example, younger players score higher on risk (swing more, have more foul balls for strike 1 and 2, and make less contact). Older players are more likely to be on the slow flyball hitter end of the Ichiro-Howard spectrum and to be better contact hitters. Solid contact vibrated all over the place with no pattern. I looked at them using a simple mean graph by age at first (like the one below for contact), but that’s a flawed method.
Any time you do aging studies, there are a bunch of confounding variables to consider. A player who is still playing at 36 (and surely collecting free agent level dollars) is a different sort of player than the player who is not playing at 36. He’s probably a pretty good player to begin with. How to model this growth curve while at the same time controlling for the fact that our survivors at 36 probably had some pretty good skills to begin with? Through a process called mixed-linear modeling (MLM). Actually, intra-class correlation is one part of MLM. When I do ICC, I’m finding out how much of the variance is accounted for by the player’s own growth curve. That is the control mechanism for within-subject effects in MLM.
If you control for the within-subjects effects (i.e., the batter himself), you can set up a regression where what you get is the average effect of being a certain age (gory details: enter age as a factor, set to fixed effect, set intercept to random effect). In theory, I could do this with any stat. What it does is give the average effect of being X years old (I used April 1st… roughly Opening Day… age for this one) on that particular stat.
Let’s take a look at the coefficients for each age (I only went from 23 to 39). Remember that all four factors have a mean of zero and a standard deviation of 1.
You have to read those numbers relative to one another. The average effect of being 27 years old is to be .1688 points (remember mean = 0, SD = 1) below whatever your underlying skill is, which we’ve already controlled for. The effect of going from 27 to 28 (Happy Birthday!) is to go from -.1688 to -.0742, or roughly .09 points. That’s an average effect after controlling for talent. Say that again to yourselves. It’s an average effect of age after controlling for talent. So we can more accurately describe the aging process on these (or any) skills using this method. It’s a nice little way around the selection bias problem, although the problem is that it’s a regression-based method and just shoots through the middle. It’s possible that different types of players age in different ways and we would have no way to know that using this method.
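To make the selection-bias point concrete, here is a rough sketch on simulated data. Everything in it is an assumption for illustration: the players, the talent distribution, the retirement rule, and the true aging slope of .05 per year are all invented. It also swaps in a simpler technique than the mixed model described above: instead of a random intercept, it gives each player his own fixed intercept and alternates between estimating player levels and age effects until they settle down.

```python
import random
from collections import defaultdict


def simulate(n_players=600, seed=0):
    """Fake careers: better players last longer (the survivor problem)."""
    rng = random.Random(seed)
    rows = []
    for pid in range(n_players):
        talent = rng.gauss(0.0, 1.0)
        # Hypothetical retirement rule: more talent, later last season.
        last_age = min(36, max(26, 30 + round(2 * talent)))
        for age in range(24, last_age + 1):
            # True age effect is 0.05 per year past 24, plus noise.
            score = talent + 0.05 * (age - 24) + rng.gauss(0.0, 0.2)
            rows.append((pid, age, score))
    return rows


def naive_age_means(rows):
    """The flawed method: average score of whoever is playing at each age."""
    total, count = defaultdict(float), defaultdict(int)
    for _, age, score in rows:
        total[age] += score
        count[age] += 1
    return {age: total[age] / count[age] for age in sorted(total)}


def age_effects(rows, ref_age=24, n_iter=100):
    """Age effects after controlling for each player's own level.

    Alternates between (1) age effects given player levels and
    (2) player levels given age effects. Results are reported
    relative to ref_age, since only differences are meaningful.
    """
    player_level = defaultdict(float)
    age_eff = {}
    for _ in range(n_iter):
        total, count = defaultdict(float), defaultdict(int)
        for pid, age, score in rows:
            total[age] += score - player_level[pid]
            count[age] += 1
        age_eff = {a: total[a] / count[a] for a in total}

        total, count = defaultdict(float), defaultdict(int)
        for pid, age, score in rows:
            total[pid] += score - age_eff[age]
            count[pid] += 1
        player_level = {p: total[p] / count[p] for p in total}
    return {a: age_eff[a] - age_eff[ref_age] for a in sorted(age_eff)}


rows = simulate()
naive = naive_age_means(rows)
est = age_effects(rows)
print("true change, 24 to 32:      ", 0.05 * (32 - 24))
print("naive change, 24 to 32:     ", round(naive[32] - naive[24], 3))
print("controlled change, 24 to 32:", round(est[32], 3))
```

In this made-up world, the naive mean graph badly overstates the gain from 24 to 32 because only the talented players are still around at 32, while the version that controls for the player lands close to the true value. As with the coefficients above, the estimates only mean something relative to one another.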
The thing about three of the four skills (Ichiro-Howard, contact, and risk) is that the aging curves proceed in a fairly linear fashion. As players age, they become more fly-ball based hitters, who make better contact, and swing less. Solid contact bobbles all over the place, and seems to be a little bit more random than the others. In part II, we saw that there were some benefits to being a flyball hitter (more HR) and some to being a groundball hitter (more singles). Contact was a double-edged sword. If you make contact, you probably won’t strike out as much, but you purchase that at the cost of some power. Risk-taking is always a game best played in moderation. Maybe the reason that players “peak” around their mid-to-late 20s is that this is when the skills reach the point where they are best balanced.
The aging profile for contact (see above) shows that the “age” effects are at their smallest absolute value (so it’s really the player’s true talent shining through) between 28 and 31. For Ichiro-Howard, it’s 27-29. The age effects on risk are pretty low to begin with (except for really young and really “old” players), but they are at their lowest from 27-29. Presumably, major leaguers are scouted, groomed, and brought up because some scout believed that they have good skills. The late 20s are when players have those skills the least tainted by age.
So we have four orthogonal factors which are well constructed and show pretty good evidence of reliability and construct validity. Next time, we’ll discuss how we can take these factors and produce similarity scores that actually make sense.