The measure of a man, Part II
February 25, 2009
In part one, I created a four-factor structure to describe a player’s offensive abilities (rather than his performance). So… do these four factors tell us anything interesting about a player’s performance? After all, it’s nice to have little mathematical abstractions, but, you may ask, what’s the practical value? The four factors were the Ichiro (grounders/speed) to Ryan Howard (flyballs/power) continuum, contact skills, risk-taking, and solid contact. Do they predict anything useful?
To answer that question, I looked at it from a few different angles. First, I calculated the factor scores for everyone who had more than 100 PA in 2008. Then, I calculated some basic performance measures (1b%, xbh%, hr%, k%, bb%, hr/fb, obp, slg, ops). Then, I started with some basic correlations to see what was related to what.
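The rate stats and correlations described above are straightforward to compute. The sketch below is a minimal Python version, assuming each rate is taken per plate appearance except HR/FB (home runs per fly ball); the post doesn't spell out its denominators, and the `Batter` record and its field names are hypothetical. OBP, SLG, and OPS are left out to keep it short.

```python
import math
from dataclasses import dataclass


@dataclass
class Batter:
    """Hypothetical season line; field names are illustrative only."""
    name: str
    pa: int        # plate appearances
    singles: int
    xbh: int       # extra-base hits
    hr: int        # home runs
    k: int         # strikeouts
    bb: int        # walks
    fb: int        # fly balls


def rates(b: Batter) -> dict:
    """Per-PA rates plus HR/FB (the denominators are an assumption)."""
    return {
        "1b%": b.singles / b.pa,
        "xbh%": b.xbh / b.pa,
        "hr%": b.hr / b.pa,
        "k%": b.k / b.pa,
        "bb%": b.bb / b.pa,
        "hr/fb": b.hr / b.fb if b.fb else 0.0,
    }


def qualified(batters, min_pa=100):
    """Keep hitters with MORE than min_pa plate appearances."""
    return [b for b in batters if b.pa > min_pa]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Correlating a factor score with an outcome is then something like `pearson_r(ih_scores, [rates(b)["1b%"] for b in hitters])`, where `ih_scores` and `hitters` are hypothetical lists kept in the same player order.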
Ichiro-Howard (higher numbers mean more Ichiro than Howard) correlates rather well with singles rate (r = .495) and HR rate (r = -.636). Makes sense that slappy hitters hit singles and power hitters hit home runs. Power hitters were also more likely to hit their flyballs out of the park (r = -.485) and to have a higher SLG (r = -.507), although the effect for OBP was not as pronounced (r = -.203).
Contact skills are, no shock, correlated with not striking out (r = -.822!!!), although they didn’t correlate well with things like OBP (r = .208) or SLG (r = -.034). It’s not that contact hitters are better or worse at getting on base, just at not striking out. However, they are more likely to be singles hitters (r = .516).
Risk taking wasn’t correlated with much of anything. The biggest correlation I got was with walk rate (r = -.332, which was the highest correlation found for walk rate). More on risk a little later.
Solid contact, however, was a pretty good measure of xbh% (r = .389; that may not seem like much, but it was the best correlate by far!), and it was an even better predictor of OBP (r = .514) and SLG (r = .477).
There was one other variable I created that, on its own, was not correlated with any of the outcome measures. I squared the values for the Ichiro-Howard continuum, which turns it into a measure of extremism in approach. A strict slap-and-run guy like Ichiro, with a score of 2.07 (scale mean = 0, SD = 1), is an extreme case, so when we square his number, it will be rather big. Joe Crede (-2.02) is extreme in the other direction, and because squaring drops the minus sign, his squared number is big too. A-Rod, on the other hand, is actually a very well-balanced player between the two ends of the spectrum (.024), so his squared number will be rather small.
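Squaring is simple arithmetic, but a quick check with the three scores quoted above shows why it works as an extremism measure: Ichiro and Crede land at about 4.28 and 4.08, while A-Rod lands near zero (about 0.0006). A minimal Python sketch, where `extremism` is just a hypothetical helper name:

```python
# Ichiro-Howard scores quoted in the text (mean 0, SD 1; higher = more Ichiro).
ih_scores = {"Ichiro": 2.07, "Joe Crede": -2.02, "A-Rod": 0.024}


def extremism(ih: float) -> float:
    """Square the score: direction is discarded, so both ends come out large."""
    return ih ** 2


for name, ih in ih_scores.items():
    print(f"{name:10s} IH = {ih:+.3f}   IH^2 = {extremism(ih):.4f}")
```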
Now, the thing about baseball skills is that they build off one another. If you can hit the ball a mile but can’t make contact, you might as well not have the power. We need moderator analyses to see whether the interaction of two skills predicts anything interesting. A quick math review: to test for a moderator, set up a linear regression with three predictors: the two variables that you think moderate one another, plus the product of those two variables. If the interaction (product) term is significant, you have a moderator. Then it’s a matter of figuring out what moderates what, and how. There are different types of moderator effects, but what a moderator means is that one variable changes (in some way) the effect that the other one has.
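The moderator test described above can be written as y = b0 + b1*X + b2*Z + b3*(X*Z) + error, where b3 is the interaction (moderator) effect. Below is a minimal sketch with NumPy, assuming ordinary least squares and a plain t-statistic for b3 (|t| > 1.96 is roughly p < .05 in large samples); the function name and the simulated data are illustrative, not the original analysis. Centering predictors before multiplying them is the usual advice; factor scores on a mean-0 scale can be multiplied as-is.

```python
import numpy as np


def moderation_test(x, z, y):
    """OLS of y on x, z and x*z; return the coefficients and the t-statistic of x*z."""
    x, z, y = (np.asarray(a, dtype=float) for a in (x, z, y))
    X = np.column_stack([np.ones_like(x), x, z, x * z])  # intercept, main effects, product
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the OLS coefficients
    t_interaction = beta[3] / np.sqrt(cov[3, 3])
    return beta, t_interaction


# Simulated standardized scores with a built-in interaction (illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
z = rng.standard_normal(500)
y = 0.5 * x + 0.2 * z + 0.3 * x * z + rng.normal(0, 0.5, 500)
beta, t = moderation_test(x, z, y)
print("coefficients:", np.round(beta, 2), "interaction t:", round(float(t), 1))
```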
Contact skills proved to be a very common moderator in these analyses, particularly of the Ichiro-Howard continuum, although the effects weren’t very neat. For example, for players toward the Ichiro end of the continuum, raising their contact skills doesn’t move their walk rate very much. But for guys on the Howard end, a jump in contact skills means a lower walk rate. The effect is more pronounced for the Ichiro-squared numbers. The best walk numbers belong to those who are balanced in their approach (not too close to either Ichiro or Howard) but don’t make a lot of contact. The worst belong to the extreme guys who make a lot of contact.
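Once an interaction is significant, one common way to work out "what moderates what and how" is to probe simple slopes: with y = b0 + b1*X + b2*Z + b3*X*Z, the slope of y on X at a given value of Z is b1 + b3*Z. The sketch below evaluates that slope at -1, 0 and +1 SD of the Ichiro-Howard score. The two coefficients are hypothetical numbers chosen only to mimic the walk-rate pattern described above; they are not estimates from the data.

```python
def simple_slope(b_focal: float, b_interaction: float, moderator: float) -> float:
    """Slope of y on the focal predictor X at a fixed moderator value Z.

    From y = b0 + b1*X + b2*Z + b3*X*Z, the slope on X is b1 + b3*Z.
    """
    return b_focal + b_interaction * moderator


# HYPOTHETICAL coefficients for walk rate regressed on contact (X), moderated by
# Ichiro-Howard (Z); picked to mimic the pattern in the text, not estimated.
B_CONTACT = -0.010
B_CONTACT_X_IH = 0.008

for label, ih in [("Howard end (-1 SD)", -1.0), ("middle (0)", 0.0), ("Ichiro end (+1 SD)", 1.0)]:
    slope = simple_slope(B_CONTACT, B_CONTACT_X_IH, ih)
    print(f"{label:20s} change in walk rate per SD of contact: {slope:+.3f}")
```

With these made-up numbers, the slope is steepest (most negative) at the Howard end and nearly flat at the Ichiro end, which is the shape of the effect described above.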
Contact skills also moderate the effects of the Ichiro-Howard continuum on extra-base hits and HR/FB, depending on what sort of hitter you are. If you’re a Howard, an increase in contact percentage will drive down your HR/FB rate but drive up your extra-base hit rate. There are some guys who are built to hit doubles on their fly balls, not HR, and because they have high levels of contact skills, they won’t strike out as much in general. Consider this list: D. Wright, Morneau, Ibanez, McLouth, Pujols, Millar, Kinsler, McCann, Carlos Lee, and Lowell. These players are in the top 25% of the league both on the Howard end of the I-H continuum and in contact skills (they are the ten with the most plate appearances out of the 20 players who met both criteria). Outside of Pujols, who is just amazing, they all have reputations as good, but not great, power hitters who are good for 25 HR over a season… and 40 doubles. And they all have strikeout rates of 15% or below. Not a bad profile to have.
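The selection behind that list can be sketched as a filter plus a sort. This is a minimal Python version, assuming "top 25% on the Howard end" means an Ichiro-Howard score at or below the 25th percentile (since higher scores mean more Ichiro) and "top 25% in contact" means a contact score at or above the 75th percentile. The `Player` record and `howard_contact_group` are hypothetical names, and the cutoffs use the default method of Python's `statistics.quantiles`; other percentile conventions can shift them slightly.

```python
import statistics
from typing import NamedTuple


class Player(NamedTuple):
    name: str
    pa: int         # plate appearances
    ih: float       # Ichiro-Howard score (higher = more Ichiro)
    contact: float  # contact-skills score


def howard_contact_group(players, top_n=10):
    """Players in the Howard-end quartile AND the top contact quartile, most PA first."""
    ih_q1 = statistics.quantiles([p.ih for p in players], n=4)[0]            # 25th percentile
    contact_q3 = statistics.quantiles([p.contact for p in players], n=4)[2]  # 75th percentile
    group = [p for p in players if p.ih <= ih_q1 and p.contact >= contact_q3]
    return sorted(group, key=lambda p: p.pa, reverse=True)[:top_n]
```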
One other interesting effect of contact skills: if you’re in the middle of the Ichiro-Howard continuum, your OBP rises slightly as contact skills rise, although it’s pretty high to begin with (around .330). But if you’re at one of the extreme ends (either a major GB hitter or a major FB hitter) and you have limited contact skills, your OBP is likely to be south of .320. If you have good contact skills, it jumps up past .340 on average. So you can be a groundball hitter; you just have to be able to make contact and be happy with a lot of singles. On the flip side, SLG shows a nearly opposite pattern. Guys with high contact skills are generally not going to be huge SLG guys. However, guys who are in the middle of the Ichiro-Howard continuum and have low contact skills (apparently trading contact for power) see a huge jump in their SLG. Guys who are extreme (in either the GB or FB direction) don’t get that bump from sacrificing contact for power. So if you want to be an OBP hitter, be someone who is extreme in his approach with good contact skills. If you want SLG, be someone who is middle-of-the-road in his approach with bad contact skills. If you want both, be Albert Pujols.
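The "extremism" moderation above can be fit by adding the squared Ichiro-Howard score and its product with contact to the regression. A minimal NumPy sketch, assuming OLS; keeping the lower-order terms (IH and IH x contact) alongside the squared ones is the usual convention for polynomial interactions, and the function names are hypothetical. Predictions at IH = 0 versus IH = 2 (roughly A-Rod versus Ichiro; Crede's -2 gives the same squared value) and at low versus high contact give the four kinds of hitters the paragraph compares.

```python
import numpy as np


def extremism_design(ih, contact):
    """Columns: intercept, IH, IH^2, contact, IH x contact, IH^2 x contact."""
    ih = np.asarray(ih, dtype=float)
    c = np.asarray(contact, dtype=float)
    ih2 = ih ** 2
    return np.column_stack([np.ones_like(ih), ih, ih2, c, ih * c, ih2 * c])


def fit_extremism_model(ih, contact, y):
    """OLS coefficients for the design above."""
    X = extremism_design(ih, contact)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


def predict(beta, ih, contact):
    """Model predictions at the given IH and contact values."""
    return extremism_design(ih, contact) @ beta
```

After fitting, `predict(beta, [0, 0, 2, 2], [-1, 1, -1, 1])` returns, in order: middle approach with low contact, middle with high contact, extreme with low contact, and extreme with high contact.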
Then there are the other two variables, risk and solid contact, which seem to moderate one another on a couple of occasions as well. Players who take fewer risks generally strike out less than those who take more, but low-risk players who also make solid contact when they swing strike out even less, by a couple of percentage points. Guys who wait back and don’t swing a lot, but make good contact when they do, are less likely to strike out. It works for extra-base hits too. If you don’t have much solid-contact power, you won’t have many XBHs. If you do, you’ll have more XBHs if you’re a low risk-taker rather than a high risk-taker. There’s something to be said for waiting back for your pitch. The guys who swing all over the place are probably more fun to watch, but the patient guys are the ones who will get the benefit from their skills.
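A quick descriptive way to eyeball a risk x solid contact interaction is to split both factors at the median and average the outcome (say, strikeout rate) in each of the four cells. This is a rough check rather than the regression test itself, and `cell_means` and its inputs are hypothetical; ties at the median are assigned to the low group.

```python
from collections import defaultdict
from statistics import mean, median


def cell_means(risk, solid, outcome):
    """Mean outcome in each cell of a median split on risk and solid contact.

    Values equal to the median are assigned to the 'low' group.
    """
    r_med, s_med = median(risk), median(solid)
    cells = defaultdict(list)
    for r, s, y in zip(risk, solid, outcome):
        key = ("high risk" if r > r_med else "low risk",
               "high solid" if s > s_med else "low solid")
        cells[key].append(y)
    return {key: mean(values) for key, values in cells.items()}
```

Called as `cell_means(risk_scores, solid_scores, k_rates)`, the pattern described above would show up as the lowest average strikeout rate in the `("low risk", "high solid")` cell.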
So it looks like these skills do interact with one another to suggest some useful player typologies. Can they combine to actually predict outcomes? If I know a player’s skill set, how well can I use it to predict his actual performance? I took all four factors and all six two-way interaction terms (ten predictors in all) and regressed each of the following outcomes on them using stepwise regression (just to keep things clean) to see what fell out. Below, I’ve listed the dependent variable, the significant predictors in the final model (in the order they entered), and the final R-squared for the model.
- K rate: contact, risk, Ichiro-Howard; .747
- BB rate: risk, Ichiro-Howard, solid contact, contact, contact x risk; .209
- HR/FB: Ichiro-Howard, contact, solid contact, risk, IH x risk, IH x contact; .497
- HR rate: Ichiro-Howard, contact, risk, solid contact, IH x risk, IH x contact; .580
- 1B rate: contact, Ichiro-Howard, solid contact; .565
- XBH rate: solid contact, Ichiro-Howard, IH x contact, IH x risk; .248
- OBP: solid contact, risk, contact skills, Ichiro-Howard, IH x contact, contact x risk; .413
- SLG: Ichiro-Howard, solid contact, risk; .499
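The post doesn't say which stepwise settings were used, so the following is only a simplified sketch of the idea: forward selection only (no removal step), adding at each step the candidate with the largest partial F and stopping when no candidate reaches an F-to-enter of 3.84 (about p < .05 for large samples). The column names and function names are hypothetical labels for the four factors and their six pairwise products.

```python
import itertools

import numpy as np

FACTORS = ["ih", "contact", "risk", "solid"]  # hypothetical column names


def with_interactions(data):
    """Four factor columns plus all six pairwise products (ten predictors)."""
    cols = {f: np.asarray(data[f], dtype=float) for f in FACTORS}
    for a, b in itertools.combinations(FACTORS, 2):
        cols[f"{a} x {b}"] = cols[a] * cols[b]
    return cols


def residual_ss(columns, y):
    """Residual sum of squares from OLS of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y)] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def forward_stepwise(predictors, y, f_to_enter=3.84):
    """Return the predictors in order of entry and the final R-squared."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    tss = float(((y - y.mean()) ** 2).sum())
    chosen, current_rss = [], tss
    while True:
        best = None
        df_resid = n - len(chosen) - 2  # residual df after adding one more predictor
        for name, col in predictors.items():
            if name in chosen:
                continue
            new_rss = residual_ss([predictors[c] for c in chosen] + [col], y)
            # Partial F for adding this single predictor to the current model.
            f = (current_rss - new_rss) / (new_rss / df_resid) if new_rss > 0 else float("inf")
            if best is None or f > best[1]:
                best = (name, f, new_rss)
        if best is None or best[1] < f_to_enter:
            break
        chosen.append(best[0])
        current_rss = best[2]
    return chosen, 1 - current_rss / tss
```

A call would look like `order, r2 = forward_stepwise(with_interactions(data), k_rate)`, where `data` is a hypothetical dict of factor-score arrays and `k_rate` the matching outcome array. Full stepwise procedures also re-test entered predictors for removal at each step, which this sketch skips.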
For several of these outcomes, I’m picking up around half or more of the variance with my four factors (and their interactions). Remember, that’s R-squared, so even that lousy little .209 corresponds to a multiple correlation of .46. Extra-base hits and walks I’m at a loss to really explain for now. It seems that sometimes the ball just finds a hole in the outfield. However, walk rate is a pretty consistent stat from year to year and in a split-half framework. Perhaps the ability to draw walks is its own animal, not related to anything presented here?
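The R-squared-to-correlation conversion is just a square root: for a least-squares model with an intercept, the square root of R-squared is the multiple correlation R between the model’s predictions and the actual outcomes. Applied to the final R-squared values listed above (so .209 becomes about .46, and the strikeout model’s .747 becomes about .86):

```python
import math

# Final R-squared values from the stepwise models listed above.
r_squared = {"K rate": .747, "BB rate": .209, "HR/FB": .497, "HR rate": .580,
             "1B rate": .565, "XBH rate": .248, "OBP": .413, "SLG": .499}


def multiple_r(r2: float) -> float:
    """Multiple correlation R is the square root of R-squared."""
    return math.sqrt(r2)


for outcome, r2 in r_squared.items():
    print(f"{outcome:8s} R^2 = {r2:.3f}   R = {multiple_r(r2):.2f}")
```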
In Part III, we’ll look at how consistent these metrics are, and how players age with respect to each of them.