Managers make their players better (or is it the other way around?)
June 9, 2009
Suppose that a team were constructed that had a lineup featuring the best player at every position (Pujols, Utley, A-Rod*, etc.). The pitching staff consisted of Roy Halladay, Tim Lincecum, and Pedro Martinez, just arrived from 1999 in a magic time machine. Mariano Rivera was available for the ninth inning. Now, suppose that you (yes, you!) were hired to sit among these men as the manager. Over a 162-game season, your team would end up something like 135-27. You’d win a World Series. And you’d still be an amateur hack.
Or maybe not. Maybe on the side, you tinker a bit with Pedro’s 1999 delivery and teach him to throw filthy side-armed sliders, under-hand rise-balls, and maybe help him out a little with that lagging fastball velocity. You also teach Pujols how to hit the ball farther (he apparently needed help with that). While you’re in the neighborhood, you teach Frank Thomas, your team’s designated pinch runner, a few tricks and he subsequently steals 100 bases and tries out for the US 4x100m Olympic relay team. And makes it. You even show Darrin Erstad (we need some HEART on this team) how to get a few more yards on his punts. (I miss FJM.) The problem is that no one really believes that you had anything to do with the World Series win. After all, you just inherited a bunch of superstars and went along for the ride.
The manager is always the first one to be blamed when the team is having a bad spell. If you’re angry about your team’s performance and need to vent, “Fire the manager!” makes for a rather nice refrain, even if that manager is working miracles with what he has on hand… it’s just that he doesn’t have much on hand. On the flip side, a manager who skippered a World Series-winning side can always say that he is a “proven winner,” even if in reality he had nothing to do with it or the team succeeded in spite of him. The problem is that in evaluating the manager’s performance, there’s a giant confound. Managers can only use the players that are available to them. You can’t make a silk purse out of a sow’s ear. Just ask Manny Acta.
A manager, in my estimation, has three jobs. He is responsible for game strategy decisions (should we hit and run here?), he is the head coach in charge of improving the players’ performance, and he’s the chief spokesman for the team both internally (keeping peace in the clubhouse) and externally (dealing with the media). Most people looking to rate managers have looked at the question from the first perspective. I’ve even seen a few attempts to develop rating systems. The third really doesn’t lend itself to measurement. But what about the second job of a manager? Major leaguers are gifted athletes, but there’s always something that they can learn. Otherwise, why have coaches? True, the manager doesn’t actually do all the coaching/teaching, but he is the man who is at the top and who hires the guys that do the teaching. How can we tell if a manager is a good teacher/coach?
Why not look at managers the same way that we look at classroom teachers? If you have kids, they probably took a state achievement test a few months ago. (If you don’t, you might remember taking the dreaded “Iowa Test” or the CATs at some point when you were in school.) These tests are rather controversial, as is everything in education policy. Most of the controversy has been around whether teachers are “teaching to the test” rather than encouraging creativity and innovation. But, there’s another hidden aspect. The No Child Left Behind Act called for such annual evaluations as a way to test whether schools were “failing.” Then, there is the controversy of merit pay for teachers. Teachers who have low-scoring kids must be poor teachers, right? They should be paid less than those whose kids score better?
So, how to tell, in the parlance of former President G-Dubs, “Is our children learning?” And while we’re at it, which teachers and schools are doing good work? Statistically speaking, that answer calls for a rather high-level statistical tool, hierarchical linear modeling (HLM). (Note: technically, what I’m going to do is a related procedure, MLM or mixed linear modeling, but it’s the same basic idea.) Here’s the intuitive introduction to HLM. Imagine what factors influence a kid’s score on a test like the CATs.
- First off, the kid may be smart or… what’s the polite term?… still developing proficiency (individual level variance).
- We can also track individual variance over time. If a kid gets a 75 out of 100 one year, we don’t expect a 30 the next year, assuming that the test has good reliability. We can correct regression models for that. It’s called an AR(1) covariance matrix, which produces an intra-class correlation. If you’ve read my stuff before, you’ve seen me use it extensively. (And you’re taking a shot in the StatSpeak drinking game.)
- But then there are some teachers who are really good and some that are… just plain awful. (I suppose I could name a few names from my own educational past…) If we see that all thirty kids in a class are all achieving at a high level and the kids in the other classes in the same school aren’t, then the common factor there is the teacher (teacher level variance).
- In the same way that students are contained within classrooms, classrooms are contained within schools. And schools within districts. And districts… You can see why this technique is called “hierarchical.”
- Now, the math is a little more complex, but it comes down to this: suppose that kid A is assigned to teacher B in school C. What will his score likely be on “the big test” at the end of the year? His result will be a combination of the effect of his own smarts + his teacher’s ability + any school variables that factor in.
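The intuition above can be sketched with a toy simulation (all of the variance numbers here are made up for illustration, not estimated from anything):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hierarchy: 3 schools, 4 teachers per school, 30 kids per class.
# A kid's score is the sum of a school effect, a teacher effect, and the
# kid's own ability, plus test-day noise.
n_schools, teachers_per_school, kids_per_class = 3, 4, 30

school_fx = rng.normal(0, 5, n_schools)                          # school-level variance
teacher_fx = rng.normal(0, 8, (n_schools, teachers_per_school))  # teacher-level variance
kid_fx = rng.normal(0, 10, (n_schools, teachers_per_school, kids_per_class))

# Each kid's test score: grand mean of 70 plus the stacked effects plus noise.
scores = (70
          + school_fx[:, None, None]
          + teacher_fx[:, :, None]
          + kid_fx
          + rng.normal(0, 3, kid_fx.shape))

# A whole class scoring high while the sibling classes in the same school
# don't is the signature of a teacher effect:
class_means = scores.mean(axis=2)  # one mean per (school, teacher) cell
print(class_means.round(1))
```

HLM/MLM runs this logic in reverse: given only the scores and the nesting structure, it partitions the variance back into those levels.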
The parallels to baseball are pretty obvious. The players are the students. The manager is the teacher. And every game, there is a test. For a batter, the test is “please get a hit.” For a pitcher, “please get this guy out.” Now, are there specific managers who seem to have “students” who get better, even after we control for the fact that some of them come to the team really good and some really… developing?
I decided to look at pitchers. I took twenty years’ worth of data (1989-2008) and isolated all pitchers who had faced 100 batters or more that season, and had played for one team (and one team only) that had one manager (and one manager only). This is to prevent cross-contamination. I also coded for each pitcher’s home stadium, age, and year-league context. All of those will have effects, and we need to include them in the model. There were exactly 100 managers who had managed at least one full season with a team.
(Gory details: I created a mixed linear model with an AR(1) covariance matrix. I use SPSS, but all you SAS junkies will recognize this as PROC MIXED. I used a strictly fixed-effect model with manager, age, year-league, and stadium entered into the model. I ran three different regressions, one each for K rate, BB rate, and HR rate. I saved the manager effect coefficients and normalized them so that they had a mean of zero. This allows me to say things like “.75% above average.”)
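For readers who don’t speak SPSS or SAS, here’s a bare-bones sketch of the estimation-and-normalization step in NumPy. The data are invented, there are only three hypothetical managers, and the AR(1) error structure is omitted for brevity, so this is a stand-in for the real run, not a reproduction of it:

```python
import numpy as np

# Toy data: 6 pitcher-seasons, 3 hypothetical managers, BB rate as the outcome.
managers = np.array([0, 0, 1, 1, 2, 2])
age = np.array([24, 28, 31, 26, 29, 33], dtype=float)
bb_rate = np.array([0.085, 0.090, 0.070, 0.075, 0.095, 0.100])

# Design matrix: intercept, centered age, and dummy columns for managers 1-2
# (manager 0 is the reference level, standard dummy coding for fixed effects).
X = np.column_stack([
    np.ones(6),
    age - age.mean(),
    (managers == 1).astype(float),
    (managers == 2).astype(float),
])
beta, *_ = np.linalg.lstsq(X, bb_rate, rcond=None)

# Recover all three manager effects (the reference manager's raw effect is 0),
# then normalize them to mean zero so each reads as "above/below average."
raw_fx = np.array([0.0, beta[2], beta[3]])
manager_fx = raw_fx - raw_fx.mean()
print(manager_fx)
```

The normalization at the end is what licenses statements like “.75% above average”: after centering, a manager’s coefficient is his distance from the average manager, not from an arbitrary reference manager.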
Now, as you might expect, the effects on these rates were fairly modest, usually on the order of a manager improving his players by something like 1-2% on walks. A starter pitching a full season might face 800 or 900 batters, and he’d net 8 or 9 more walks (plus or minus the inevitable error of estimation) with these types of effects. It’s something that might not be visible to the naked eye. But with a big enough data set, these things tend to shake themselves out.
I also didn’t choose my stats randomly. K, BB, and HR are the ingredients in putting together FIP. Combine a manager’s X% improvement on each rate with the knowledge that the average team faces about 6250 batters per year over about 1445 innings (at least, that’s what happened in 2008). So, if I know that a manager improves his pitchers (in general) by about 1% above average in K rate, then I know the team will likely have 62 more strikeouts, give or take, than if the average manager were in charge. It’s just a matter of plugging in some numbers and I can distill those manager effects into what they represent for a team’s FIP.
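The plugging-in step looks like this, using FIP’s standard 13/3/-2 weights on HR/BB/K (the league constant drops out when we take a difference, and the sign conventions here are my own bookkeeping, not anything from the original model output):

```python
# Convert a manager's estimated effects on K/BB/HR rates into a change in
# team FIP. FIP = (13*HR + 3*BB - 2*K) / IP, plus a league constant that
# cancels when comparing two managers on the same staff.
TEAM_BF = 6250    # batters faced per team-year (2008 figure from the text)
TEAM_IP = 1445.0  # innings pitched per team-year (2008 figure from the text)

def delta_fip(k_change, bb_change, hr_change):
    """Change in team FIP given manager effects expressed as changes in
    per-batter rates (e.g. +0.01 = pitchers strike out 1% more batters)."""
    extra_k = k_change * TEAM_BF
    extra_bb = bb_change * TEAM_BF
    extra_hr = hr_change * TEAM_BF
    return (13 * extra_hr + 3 * extra_bb - 2 * extra_k) / TEAM_IP

# A manager worth +1% in K rate alone is about 62 extra strikeouts,
# which shaves a bit under 0.09 off the team's FIP:
print(round(delta_fip(0.01, 0.0, 0.0), 3))
```

Run all three rates through this and you get the single delta-FIP number per manager used in the lists below.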
Before I give you “the list”, yes, I know that I should probably split this up by pitching coach instead of manager. If I can find a good list of pitching coaches over the years, I will.
The five best managers (1989-2008)
(manager, delta FIP)
- Buck Martinez, -0.90
- Ned Yost, -0.79
- Bobby Cox, -0.62
- Larry Rothschild, -0.60
- Davey Lopes, -0.55
* These numbers should be read as “Buck Martinez, given the same staff as an ‘average’ manager, would have a predicted FIP 0.90 lower than the average manager. It might be that if Buck were given a bunch of retreads and AAA pitchers, his team would have a 6.50 FIP, but the average manager would be predicted to have a 7.40 FIP.”
The five worst managers
(manager, delta FIP)
- John Russell, +0.73
- Alan Trammell, +0.64
- Dave Trembley, +0.61
- John McNamara, +0.56
- Cecil Cooper, +0.53
The smell test seems satisfied with the names on the list (Bobby Cox is a good pitching manager… makes sense). I’m a little leery of the magnitude of those deltas, though. A change in FIP of 0.63 or thereabouts is worth 100 runs over the course of a season. Did Alan Trammell cost his Tiger teams that much? It seems a little extreme. It might be that MLM is not as good at pulling apart the data as we had hoped. Trammell managed three seasons in the majors, all with the Tigers (2003-2005), and he had some awful pitching on those teams. Many of those pitchers never really got a chance to go to another team (and another manager) or hang around long enough for the Tigers to hire Jim Leyland. Since the majority of the players managed by Trammell are guys who were a) awful and b) only ever managed by Trammell, the model may be over-blaming him. The model has to put the blame somewhere.
But for some of the more established managers who have managed a bunch of pitchers who have been both in their care and with others, I trust those estimates of the manager effects. The only real way to isolate the effect of a manager is to see what a pitcher does under another manager. This just provides a systematic way to make those comparisons.
For the curious: the complete data file.