# Managers make their players better (or is it the other way around?)

June 9, 2009

Suppose that a team were constructed that had a lineup featuring the best player at every position (Pujols, Utley, A-Rod*, etc.) The pitching staff consisted of Roy Halladay, Tim Lincecum, and Pedro Martinez, just arrived from 1999 in a magic time machine. Mariano Rivera was available for the ninth inning. Now, suppose that you (yes, you!) were hired to sit among these men as the manager. Over a 162-game season, your team would end up something like 135-27. You’d win a World Series. And you’d still be an amateur hack.

Or maybe not. Maybe on the side, you tinker a bit with Pedro’s 1999 delivery and teach him to throw filthy side-armed sliders, under-hand rise-balls, and maybe help him out a little with that lagging fastball velocity. You also teach Pujols how to hit the ball farther (he apparently needed help with that). While you’re in the neighborhood, you teach Frank Thomas, your team’s designated pinch runner, a few tricks and he subsequently steals 100 bases and tries out for the US 4x100m Olympic relay team. And makes it. You even show Darrin Erstad (we need some HEART on this team) how to get a few more yards on his punts. (I miss FJM.) The problem is that no one really believes that you had anything to do with the World Series win. After all, you just inherited a bunch of superstars and went along for the ride.

The manager is always the first one to be blamed when the team is having a bad spell. If you’re angry about your team’s performance and need to vent, “Fire the manager!” makes for a rather nice refrain, even if that manager is working miracles with what he has on hand… it’s just he doesn’t have much on hand. On the flip side, a manager who skippered a World Series winning side can always say that he is a “proven winner” even if in reality he had nothing to do with it or the team succeeded *in spite of him*. The problem is that in evaluating the manager’s performance, there’s a giant confound. Managers can only use the players that are available to them. You can’t make a silk purse out of a sow’s ear. Just ask Manny Acta.

A manager, in my estimation, has three jobs. He is responsible for game strategy decisions (should we hit and run here?), he is the head coach in charge of improving the players’ performance, and he’s the chief spokesman for the team both internally (keeping peace in the clubhouse) and externally (dealing with the media). Most people looking to rate managers have looked at the question from the first perspective. I’ve even seen a few attempts to develop rating systems. The third really doesn’t lend itself to measurement. But what about the second job of a manager? Major leaguers are gifted athletes, but there’s always something that they can learn. Otherwise, why have coaches? True, the manager doesn’t actually do all the coaching/teaching, but he is the man who is at the top and who hires the guys that do the teaching. How can we tell if a manager is a good teacher/coach?

Why not look at managers the same way that we look at classroom teachers? If you have kids, they probably took a state achievement test a few months ago. (If you don’t, you might remember taking the dreaded “Iowa Test” or the CATs at some point when you were in school.) These tests are rather controversial, as is *everything* in education policy. Most of the controversy has been around whether teachers are “teaching to the test” rather than encouraging creativity and innovation. But, there’s another hidden aspect. The No Child Left Behind Act called for such annual evaluations as a way to test whether schools were “failing.” Then, there is the controversy of merit pay for teachers. Teachers who have low-scoring kids must be poor teachers, right? They should be paid less than those whose kids score better?

So, how to tell, in the parlance of former President G-Dubs, “Is our children learning?” And while we’re at it, which teachers and schools are doing good work? Statistically speaking, that answer calls for a rather high-level tool, hierarchical linear modeling (HLM). (Note: technically, what I’m going to do is a related procedure, MLM or mixed linear modeling, but it’s the same basic idea.) Here’s the intuitive introduction to HLM. Imagine what factors influence a kid’s score on a test like the CATs.

- First off, the kid may be smart or… what’s the polite term?… still developing proficiency (individual level variance).
- We can also track individual variance over time. If a kid gets a 75 out of 100 one year, we don’t expect a 30 the next year, assuming that the test has good reliability. We can correct regression models for that. It’s called an AR(1) covariance matrix, which produces an intra-class correlation. If you’ve read my stuff before, you’ve seen me use it extensively. (And you’re taking a shot in the StatSpeak drinking game.)
- But then there are some teachers who are really good and some that are… just plain awful. (I suppose I could name a few names from my own educational past…) If we see that all thirty kids in a class are all achieving at a high level and the kids in the other classes in the same school aren’t, then the common factor there is the teacher (teacher level variance).
- In the same way that students are contained within classrooms, classrooms are contained within schools. And schools within districts. And districts… You can see why this technique is called “hierarchical.”
- Now, the math is a little more complex, but it comes down to this: suppose that kid A is assigned to teacher B in school C. What will his score likely be on “the big test” at the end of the year? His result will be a combination of the effect of his own smarts + his teacher’s ability + any school variables that factor in.
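To make the "kid + teacher + school" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the effect sizes, the names, and the shortcut of comparing classroom averages within a school (a real HLM/MLM fit estimates all the levels at once and handles unbalanced data).

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical setup: 2 schools, 2 teachers per school, 30 kids per teacher.
school_effect = {"C1": 2.0, "C2": -2.0}
teacher_effect = {"B1": 5.0, "B2": -5.0, "B3": 3.0, "B4": -3.0}
teacher_school = {"B1": "C1", "B2": "C1", "B3": "C2", "B4": "C2"}

def simulate_score(teacher, baseline=70.0):
    """Score = baseline + kid's own ability + teacher effect + school effect."""
    kid_ability = random.gauss(0, 10)  # individual-level variance
    school = teacher_school[teacher]
    return baseline + kid_ability + teacher_effect[teacher] + school_effect[school]

scores = {t: [simulate_score(t) for _ in range(30)] for t in teacher_effect}

# Within each school, compare each classroom mean to the school mean.
# Both classrooms share the school effect, so it cancels out of the difference.
estimated = {}
for school in school_effect:
    teachers = [t for t in teacher_school if teacher_school[t] == school]
    school_mean = mean(s for t in teachers for s in scores[t])
    for t in teachers:
        estimated[t] = mean(scores[t]) - school_mean

for t, est in estimated.items():
    print(t, "estimated:", round(est, 2), "true:", teacher_effect[t])
```

With only thirty kids per class the estimates are noisy, which is the same reason small manager effects only show up in a large data set.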

The parallels to baseball are pretty obvious. The players are the students. The manager is the teacher. And every game, there is a test. For a batter, the test is “please get a hit.” For a pitcher, “please get this guy out.” Now, are there specific managers who seem to have “students” who get better, even after we control for the fact that some of them come to the team really good and some really… developing?

I decided to look at pitchers. I took twenty years’ worth of data (1989-2008) and isolated all pitcher-seasons in which the pitcher had faced 100 batters or more and had played for one team (and one team only) that had one manager (and one manager only). This is to prevent cross-contamination. I also coded for his home stadium, age, and year-league context. All of those will have effects and we need to include them in the model. There were exactly 100 managers who had managed at least one full season with a team.
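The selection rule above could be sketched in Python roughly as follows. The record layout, field names, and sample rows are hypothetical, not the actual data file; the sketch only shows the filter logic (at least 100 batters faced, exactly one team, exactly one manager in that season).

```python
from collections import defaultdict

# Hypothetical stint records: one row per pitcher, season, team, and manager.
stints = [
    {"pitcher": "A", "year": 2008, "team": "T1", "manager": "M1", "batters_faced": 650},
    {"pitcher": "B", "year": 2008, "team": "T1", "manager": "M1", "batters_faced": 60},
    {"pitcher": "C", "year": 2008, "team": "T1", "manager": "M1", "batters_faced": 200},
    {"pitcher": "C", "year": 2008, "team": "T2", "manager": "M2", "batters_faced": 150},
    {"pitcher": "D", "year": 2008, "team": "T3", "manager": "M3", "batters_faced": 120},
    {"pitcher": "D", "year": 2008, "team": "T3", "manager": "M4", "batters_faced": 90},
]

def eligible_pitcher_seasons(stints, min_batters=100):
    """Keep pitcher-seasons with enough batters faced, one team, and one manager."""
    by_season = defaultdict(list)
    for row in stints:
        by_season[(row["pitcher"], row["year"])].append(row)

    kept = []
    for (pitcher, year), rows in by_season.items():
        teams = {r["team"] for r in rows}
        managers = {r["manager"] for r in rows}
        total_bf = sum(r["batters_faced"] for r in rows)
        if total_bf >= min_batters and len(teams) == 1 and len(managers) == 1:
            kept.append({
                "pitcher": pitcher,
                "year": year,
                "team": teams.pop(),
                "manager": managers.pop(),
                "batters_faced": total_bf,
            })
    return kept

kept = eligible_pitcher_seasons(stints)
print([(k["pitcher"], k["manager"]) for k in kept])
```

In this toy data, pitcher B is dropped for facing too few batters, C for changing teams, and D for pitching under two managers.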

(Gory details: I created a mixed linear model with an AR(1) covariance matrix. I use SPSS, but all you SAS junkies will recognize this as PROC MIXED. I used a strictly fixed-effect model with manager, age, year-league, and stadium entered into the model. I ran three different regressions, once for K rate, BB rate, and HR rate. I saved the manager effect coefficients and normalized them so that they had a mean of zero. This allows me to say things like “.75% above average.”)
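The normalization step at the end of that note is simple enough to sketch in Python. The coefficient values and manager labels below are made up; the only point is that subtracting the mean re-expresses each manager's effect relative to the average manager.

```python
from statistics import mean

# Hypothetical raw manager coefficients for K rate (as proportions of batters faced).
raw_coefficients = {"Manager A": 0.021, "Manager B": 0.009, "Manager C": 0.012}

# Subtract the mean so the effects are centered on zero ("average manager" = 0).
average = mean(raw_coefficients.values())
centered = {m: c - average for m, c in raw_coefficients.items()}

for manager, effect in centered.items():
    direction = "above" if effect >= 0 else "below"
    print(f"{manager}: {abs(effect) * 100:.2f}% {direction} average")
```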

Now, as you might expect, the effects on these rates were fairly modest, usually on the order of a manager improving his players by something like 1-2% on walks. A starter pitching a full season might face 800 or 900 batters, so a 1% effect works out to a swing of 8 or 9 walks (plus or minus the inevitable error of estimation). It’s something that might not be visible to the naked eye. But with a big enough data set, these things tend to shake themselves out.

I also didn’t choose my stats randomly. K, BB, and HR are the ingredients in putting together FIP. Start with the knowledge that a manager generally improves his pitchers’ rates on each by X%, and add the knowledge that the average team faces 6250 batters per year over about 1445 innings (at least that’s what happened in 2008). So, if I know that a manager improves his pitchers (in general) by about 1% above average in K rate, then I know the team will likely have 62 more strikeouts, give or take, than if the average manager were in charge. It’s just a matter of plugging in some numbers and I can distill those manager effects into what they represent for a team’s FIP.
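Here is a rough Python sketch of that plugging-in step. It assumes the standard FIP weights (13 per HR, 3 per BB, minus 2 per K, all divided by innings, plus a league constant that cancels out when taking a difference) and ignores hit batters; the post does not spell out the exact formula used, so treat this as one plausible version. The function names are mine.

```python
TEAM_BATTERS_FACED = 6250  # 2008 team average, from the text
TEAM_INNINGS = 1445        # 2008 team average, from the text

def extra_events(rate_delta, batters_faced):
    """Extra events implied by a rate change, e.g. 0.01 = one percentage point."""
    return rate_delta * batters_faced

def delta_fip(k_delta, bb_delta, hr_delta,
              batters_faced=TEAM_BATTERS_FACED, innings=TEAM_INNINGS):
    """Change in team FIP implied by changes in K, BB, and HR rates.

    Assumes standard FIP weights; the league constant drops out of a difference.
    """
    extra_k = extra_events(k_delta, batters_faced)
    extra_bb = extra_events(bb_delta, batters_faced)
    extra_hr = extra_events(hr_delta, batters_faced)
    return (13 * extra_hr + 3 * extra_bb - 2 * extra_k) / innings

# A manager worth +1% in K rate, with walks and homers at average:
print(extra_events(0.01, TEAM_BATTERS_FACED))  # extra strikeouts for the team
print(delta_fip(k_delta=0.01, bb_delta=0.0, hr_delta=0.0))

# The same arithmetic for a single starter facing 850 batters, 1% on walks:
print(extra_events(0.01, 850))
```

Under these assumptions, a 1% bump in strikeout rate alone moves team FIP by a bit less than a tenth of a run, so the larger deltas in a list like this have to come from all three rates moving together, with home runs weighted most heavily.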

Before I give you “the list”, yes, I know that I should probably split this up by pitching coach instead of manager. If I can find a good list of pitching coaches over the years, I will.

The five best managers (1989-2008)

(manager, delta FIP)

- Buck Martinez, -0.90
- Ned Yost, -0.79
- Bobby Cox, -0.62
- Larry Rothschild, -0.60
- Davey Lopes, -0.55

*- these numbers should be read as “Buck Martinez, given the same staff as an ‘average’ manager, would have a predicted FIP 0.90 lower than the average manager. It might be that if Buck were given a bunch of retreads and AAA pitchers, his team would have a 6.50 FIP, but the average manager would be predicted to have a 7.40 FIP.”

The five worst managers

(manager, delta FIP)

- John Russell, +0.73
- Alan Trammell, +0.64
- Dave Trembley, +0.61
- John McNamara, +0.56
- Cecil Cooper, +0.53

The smell test seems satisfied with the names on the list (Bobby Cox is a good pitching manager… makes sense). I’m a little leery of the magnitude of those deltas though. A change in FIP of 0.63 or thereabouts is worth 100 runs over the course of a season. Did Alan Trammell cost his Tiger teams that much? It seems a little extreme. It might be that MLM is not as good at pulling apart the data as we had hoped. Trammell managed three seasons in the majors, all with the Tigers (2003-2005) and he had some awful pitching on those teams. Many of those pitchers never really got a chance to go to another team (and another manager) or hang around long enough for the Tigers to hire Jim Leyland. Since the majority of the players managed by Trammell are guys who were a) awful and b) only ever managed by Trammell, the model may be over-blaming him. The model has to put the blame somewhere.
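The "0.63 of FIP is about 100 runs" figure can be checked with a couple of lines of Python, assuming FIP is read on the ERA scale (runs per nine innings) and using the 1445-inning team season quoted earlier. The function name is mine.

```python
TEAM_INNINGS = 1445  # 2008 team average, from the text

def fip_delta_to_runs(delta_fip, innings=TEAM_INNINGS):
    """Convert a FIP difference (runs per 9 innings) into runs over a season."""
    return delta_fip * innings / 9

print(round(fip_delta_to_runs(0.63), 1))  # the rule-of-thumb case
print(round(fip_delta_to_runs(0.64), 1))  # Trammell's listed delta
```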

But for some of the more established managers who have managed a bunch of pitchers who have been both in their care and with others, I trust those estimates of the manager effects. The only real way to see what is the effect of a manager is to see what a pitcher does with another manager. This just provides a systematic way to make those comparisons.

For the curious: the complete data file.

Your ‘schooling’ example is my exact mixed effects HLM question that we had for my stat methods final project a couple years ago :)

The 1-2% effect (at least that you see for walks) is in the range of what I’ve seen from other studies. The “Frontier Analysis” studies that I’ve seen seem to do a decent job at managerial efficiency. However, it seems really difficult to apply this ‘efficiency’ directly to winning. One of the latest studies using that approach was a paper by Smart, Wolfe and Winfree in Journal of Sport Management (2008). They extend a paper by Smart and Wolfe that used fairly basic analyses that had a lot of confounds that you speak of. They end the paper attempting Frontier Analysis (applied to RBV managing…it’s more manager-ish than most) and find very little variance explained by managerial efficiency (something in the range of 0-2% in all their analyses).

I’m not convinced that this means managers have no effect on the game, or that MLB managing isn’t a somewhat rare skill. But I think it could mean that, at the MLB level, there are enough equally talented managers to offset one another’s strategy.

I was curious if HLM would be able to do a similar type of analysis. It’s interesting to see Cox up there for the pitching. He had the one guy that a lot of people consider to be the best pitching coach ever in Mazzone (then again, so did the Orioles :-().

Maybe we can get MLB to do an experiment and rotate their managers with one another for a 3-4 year period at 50 game clips and make everyone spend the same payroll so schedules/teams are all the same. Think they’d go for that?

On another serious note, thanks for continuing to do things a little different from other sites.

I’m feeling buzzed.

We also have to remember that these aren’t “true talent” deltas. Trammell might have made Tigers pitchers 100 runs worse (still doubtful) in that period, but over time that .64 difference in FIP would regress and he’d look a little less clueless.

THT recently did a similar study on this, but for hitters (in their latest annual, I believe), and found that hitters hit better for some new managers than for others. I recall that Bochy was credited with 1 WAR per season based on how much better his hitters were with him than before.

I’m surprised that the Giants did not show up in this study. FanGraphs studies found that Giants pitchers are somehow able to keep their HR rate lower than the mean everyone is supposed to regress to, crediting that improvement to their pitching coach, Righetti. I would have thought that they would have shown up somewhere here.

FYI, I was unable to access the file linked at the bottom, else I would have quoted where SF was, maybe the new Google Drive took that away.

Lastly, I was looking into Bochy’s record in one-run games and found (if I did my elementary stats right, and that is a big question mark) that the null hypothesis that he is a .500 manager in one-run games appears to be false given his career record so far, meaning that he is truly above .500. And I found that he only just cleared the bar, meaning basically all the other managers would not, as I found that there were no NL managers even close to his record during his managerial tenure. Obviously, I’m using elementary stats here, so I thought I would bring this up and see if you might tackle it in a much more sophisticated and accurate fashion.
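For anyone who wants to try that kind of check, here is a minimal Python sketch of an exact two-sided binomial test against a .500 record. The 60-40 record below is purely hypothetical (it is not Bochy's actual one-run record), and treating one-run games as independent coin flips is itself an assumption.

```python
from math import comb

def binomial_two_sided_p(wins, games, p=0.5):
    """Exact two-sided p-value for `wins` in `games` under win probability `p`.

    Sums the probability of every outcome that is no more likely than the
    observed one (the usual exact binomial test definition).
    """
    probs = [comb(games, k) * p**k * (1 - p) ** (games - k) for k in range(games + 1)]
    observed = probs[wins]
    # Small relative tolerance so ties are not lost to floating-point noise.
    return sum(q for q in probs if q <= observed * (1 + 1e-9))

# Hypothetical record: 60 wins in 100 one-run games.
p_value = binomial_two_sided_p(60, 100)
print(round(p_value, 4))
```

One caution on reading the result: testing every manager and then picking out the best record will turn up a "significant" one fairly often by chance alone, so a single low p-value is weaker evidence than it looks.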

Thanks for your thoughts! As far as why the Giants are not here, it might help to point out this article was written in 2009 :)