WPA Analysis: Nationals lose on missed opportunities

Last night’s Braves-Nationals game was marked by missed opportunities. Staked to an early 1-0 lead, the Nationals couldn’t get that big hit to open up the game.
In the 8th inning, the Braves struck. Martin Prado picked up his first Major League hit with a triple off of Gary Majewski to open the inning. Three batters later, Wilson Betemit, filling in for the injured Edgar Renteria, lofted a 1-2 pitch over the right-centerfield wall to give the Braves a 3-1 lead. The Nationals would put runners on base in the 8th and 9th, but the Braves held on for the win.
As part of a new weekly feature, I wanted to take a look at the outcome of this game through a Win Probability chart. (For a primer on WPA, check out The One About Win Probability at The Hardball Times.) Here’s the chart for last night’s game with a few key moments highlighted:

Heading into the bottom of the first with the game tied, the Nationals had a .545 Win Probability. After back-to-back doubles by Nick Johnson and Jose Guillen, the number had risen to .618. It would climb steadily over the course of the next six innings, but each time the Nationals threatened to score and failed, their win probability would drop back a little.
In the fourth inning, it looked as though the Nats were about to break the game open. They had the bases loaded and one man out. After Royce Clayton’s intentional walk to load the bases, the Nats’ win probability stood at a robust .790. But Brian Schneider hit into a double play, and that number fell back to .658.
Again in the fifth, the Nats loaded the bases, but this time with two outs. After Jose Guillen was intentionally walked, the team’s win probability reached .745. But a Ryan Zimmerman strikeout dropped that back to .682.
In the top of the 8th, disaster struck, and the game’s WPA chart certainly shows this sea change. Martin Prado, playing in place of Marcus Giles, reached third on a lead-off triple. The Nats’ win probability went from .771 to .541. An out raised Washington’s WP back to .633, but a walk lowered it to .588.
Then came the biggest blow of the game: Wilson Betemit’s three-run home run. The Nats went from sitting pretty to dead in the water. The Braves were up 3-1 with one out in the top of the 8th. The Nats’ win probability had gone from a game-high of .790 at the start of the bottom of the 7th to .123 with one out in the top of the 8th. While Washington put runners on in the 8th and 9th, by win probability the game was no longer close.
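For readers who want to see the size of each swing, here is a minimal sketch that subtracts the before and after win probabilities quoted above. The play labels are my own shorthand, and the numbers are only the ones given in this post, not a full play-by-play log.

```python
# Washington's win probability before and after each play, as quoted in the post.
# The labels are informal shorthand, not official scoring.
plays = [
    ("Schneider double play (4th)", 0.790, 0.658),
    ("Zimmerman strikeout (5th)", 0.745, 0.682),
    ("Prado triple (8th)", 0.771, 0.541),
    ("Betemit home run (8th)", 0.588, 0.123),
]


def swing(before, after):
    """Change in Washington's win probability caused by one play."""
    return round(after - before, 3)


swings = {label: swing(before, after) for label, before, after in plays}

for label, delta in swings.items():
    print(f"{label}: {delta:+.3f}")
```

Seen this way, the home run alone cost the Nats more win probability than the two bases-loaded failures and the triple combined.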
From Sunday’s game’s WPA, we can see just how much those missed opportunities cost the Nationals. Had they scored another run or two in their bases-loaded situations, they could have put the game away. Instead, the Braves, through timely pitching, were able to stay in the game until they produced their own big blow with six outs remaining for their opponents.


The Math Behind DIPS

In 2001, Voros McCracken published perhaps the most important, most influential piece of baseball research ever conducted. After separating pitchers’ lines into defense independent statistics (strikeouts, walks, home runs, hit by pitch) and defense dependent statistics (balls in play), he found that while pitchers have much control over their defense independent statistics (though to varying degrees), they seem to have little or no control over their defense dependent statistics. Specifically, what McCracken found was that pitchers’ defense independent lines remained stable from year to year, while their defense dependent line in one year seemingly told us almost nothing about what they would do in the next year.
Since then, various baseball researchers have refined his methods and both proven and refuted his DIPS (Defensive Independent Pitching Statistics) theory. At this point, we know quite a bit about pitchers’ control over whether or not a ball in play becomes a hit. We know that pitchers have quite a bit of control over whether a ball in play becomes a fly ball or a ground ball. We also know that fly balls become outs more often, but that they also become extra base hits more often, and those two basically cancel out. We know that pitchers have some control over Batting Average on Balls in Play (BABIP), but that one season of information on BABIP doesn’t really tell us much. And, we know a whole lot more, but that’s not what this post is about.
What this is about is the math behind DIPS. What interests me (and I hope it interests you as well!) is why DIPS works the way it works. I want to know why a pitcher’s BABIP in one year does not seem to have any impact on his BABIP in the next year, especially given that we know that over long periods of time, pitchers do have a discernable impact on BABIP. Let’s look at this from a few different angles:
The first angle is correlation, the method McCracken used to first arrive at, and then prove, his DIPS theory. Correlation tells us how well two variables track each other. If they have no relationship, the correlation will be 0. If the two variables track each other perfectly, the correlation will be 1. If they track each other perfectly, but in opposite directions (i.e., a +1 change in one means a –1 change in the other), the correlation will be –1. Correlations are bounded between –1 and 1, and generally, a correlation of .7 or better is considered great, .5 or better is good, and anything lower is questionable.
If we take all pitchers with at least 500 BIP in both 2004 and 2005 (58 in all), we find that the correlation between BABIP in 2004 and BABIP in 2005 is only .110. Is that significant? In short, no. The P-value is .409, meaning that if there were truly no relationship between the two variables, we would still see a correlation at least this strong about 41% of the time (generally, statisticians use P = .05 as the threshold of significance). Even if we accept the correlation as significant, what it tells us is that one year’s worth of BABIP information tells us almost nothing.
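If you want to check the significance yourself, here is a rough sketch. It converts the correlation into a t-statistic and then, as a simplifying assumption of mine, reads the P-value off the normal curve instead of the exact Student's t distribution. With 58 pitchers the two are close, but this is an approximation, not necessarily the calculation used to get the .409 above.

```python
from math import sqrt
from statistics import NormalDist


def correlation_p_value(r, n):
    """Approximate two-sided P-value for a correlation r measured on n pairs.

    Computes t = r * sqrt(n - 2) / sqrt(1 - r^2), then treats t as a standard
    normal score. ASSUMPTION: the normal curve stands in for Student's t.
    """
    t = r * sqrt(n - 2) / sqrt(1 - r * r)
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


t_stat, p_value = correlation_p_value(0.110, 58)
print(f"t = {t_stat:.3f}, p = {p_value:.2f}")
```

The result lands at roughly .41, nowhere near the .05 threshold.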
That’s because of a concept known as regression to the mean. For example, in his first 12 games of the season, Chris Shelton hit 8 home runs. But no one expects him to hit two home runs every three games the rest of the year. Why? Well, intuitively we know that no one can hit home runs at that kind of pace, and moreover, since most people probably expected him to hit 20-25 home runs all year, we certainly would not expect him to keep this up. If we know that Shelton is expected to hit 20-25 home runs every 150 games or so, we know that those 8 home runs in 12 games are pretty damn fluky. In reality, we still expect him to hit a home run every 6-7.5 games. So, in 2006, we’d still expect Shelton to hit somewhere between 30-35 home runs, even accounting for the fact that he’s probably a bit better of a home run hitter than we thought he would be given his hot start.
Mathematically, regression to the mean is determined by correlation. The formula is simple: Regression to the mean = (1 – r), where “r” stands for correlation. In other words, the prediction puts a weight of (1 – r) on the league average and a weight of r on the pitcher’s own number. Take, for example, our sample of pitchers. The average BABIP among all pitchers in 2004 and 2005 was .283. Carlos Zambrano had a BABIP of .266 in 2004. Thus, his predicted BABIP in 2005 would be (1 – .110)*.283 + .110*.266 = .281. Though Zambrano allowed 10 fewer hits on BIP than the average pitcher in 2004, we would still expect his BABIP in 2005 to be only two points (.002) lower than average.
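The Zambrano calculation can be sketched in a few lines. The function name is just something I made up for illustration; the inputs are the numbers from the paragraph above.

```python
def regress_to_mean(observed, league_mean, r):
    """Shrink an observed rate toward the league mean.

    The league mean gets a weight of (1 - r); the observed value gets r.
    """
    return (1 - r) * league_mean + r * observed


# Zambrano's 2004 BABIP, the two-year league average, and the year-to-year correlation.
predicted = regress_to_mean(observed=0.266, league_mean=0.283, r=0.110)
print(f"{predicted:.3f}")  # 0.281
```

Plug in r = 1 and the prediction is just last year's number; plug in r = 0 and everyone is predicted to be league average.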
Let’s look at it another way. The standard deviation of BABIP in 2004 was .016. Standard Deviation (SD) is a measure of spread: in a roughly bell-shaped distribution, 68% of all players will be within one SD of the mean, 95% will be within two SD of the mean, and virtually all will be within three. What that means is that in 2004, we’d expect almost every pitcher in our sample to be within .048 of average, or .283 +/- .048, which is from .235 to .331. In fact, BABIP in our 2004 sample ranged from .240 to .321, so that’s good. However, because we’re regressing 89% of the way to the mean, our predicted BABIP in 2005 would only have a Standard Deviation of less than .002! We would expect almost everyone to be within about five points (.005) of average. The spread from best to worst would be less than seven hits!
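Here is a sketch of that shrinkage arithmetic. One number is an assumption on my part: the paragraph does not say how many balls in play turn the BABIP spread into hits, so I use 625 BIP per pitcher-season, which is in the right neighborhood for pitchers with 500 or more BIP.

```python
r = 0.110          # year-to-year BABIP correlation from the post
sd_2004 = 0.016    # observed standard deviation of BABIP in 2004
bip = 625          # ASSUMPTION: balls in play per pitcher-season

# Regressed predictions keep only a fraction r of the original spread.
sd_predicted = r * sd_2004

# "Almost everyone" means within three standard deviations of average.
three_sd = 3 * sd_predicted

# Best-to-worst spread runs from +3 SD down to -3 SD, converted to hits.
spread_in_hits = 2 * three_sd * bip

print(f"SD of predictions: {sd_predicted:.4f}")
print(f"Three SD: {three_sd:.4f}")
print(f"Best-to-worst spread: {spread_in_hits:.1f} hits")
```

Under that assumption the spread comes out between six and seven hits, which matches the figure above.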
In reality, though, the spread from best to worst is more like 60 hits. Obviously, some of that is due to luck, but even over large samples, the difference between best and worst is much more than seven hits. In fact, based on research by Erik Allen and Arvin Hsu, the “true” spread from best to worst is about 35 hits. What this tells us is that a sample of just one season is not nearly enough to tell us much about a pitcher’s ability to prevent hits on balls in play. That’s why DIPS works: With one year of information, BABIP is practically (or maybe totally) worthless.
One great thing about baseball is that a lot of events on the field are binomial. A binomial is any event where there are only two possible outcomes, a success and a failure. On balls in play, there are only two possible outcomes: It’s either a hit or an out (okay, there are errors as well, but for our purposes, those count as outs). What’s great about a binomial is that we have a formula for determining its random variance, that is, how great a spread (otherwise known as a Standard Deviation; see, earlier concepts prove important) we would expect just based on luck of the draw. For example, if we flip a coin 100 times, we would expect 50 heads and 50 tails, but about 32% of the time we would get 45 or fewer heads or 45 or fewer tails, about 5% of the time we would get 40 or fewer of one or the other, and almost never would we get fewer than 35 of either.
The formula for random variance in a binomial is simple:
SQRT(Prob(Success)*Prob(Failure)*Number of Trials)
In our coin flip example, that would be SQRT(.5*.5*100) = 5. Okay now, let’s look at BABIP. There were 179 pitchers who had at least 500 BIP in 2004 or 2005. Their average BABIP was .285 (the league average over those two years was .283, remarkably close!), with a standard deviation of .019, and an average of 625 BIP. How much of that would be due to random variance? Well, let’s do the math: SQRT(.285*.715*625) = 11.285 hits. 11.285/625 = .018.
That is, random variance (.018) accounts for pretty much the whole spread (.019) in one year’s worth of BABIP! Of course there will be virtually no year-to-year correlation when there’s so much noise. This is why DIPS works given one-year samples: The noise in BABIP is so powerful that it overpowers any true ability.
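The binomial formula is easy to try out for yourself. This is a minimal sketch using only the numbers quoted above; the function and variable names are mine.

```python
from math import sqrt


def binomial_sd(p_success, trials):
    """Standard deviation of the number of successes in a binomial."""
    return sqrt(p_success * (1 - p_success) * trials)


# Coin flips: 100 trials at a 50% success rate.
coin_sd = binomial_sd(0.5, 100)

# Balls in play: 625 BIP at a .285 BABIP.
hits_sd = binomial_sd(0.285, 625)
babip_sd = hits_sd / 625  # convert from hits back to BABIP

print(f"Coin flips: {coin_sd:.1f}")
print(f"Hits: {hits_sd:.3f}, BABIP: {babip_sd:.3f}")
```

Set the .018 this produces next to the observed .019 and you can see how little room is left for actual skill in a single season.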
Let’s look, for example, at a group of pitchers with a large sample size: At least 5,000 career BIP. There are 312 such pitchers who started their career no earlier than 1946 (from Bob Lemon to Bartolo Colon), and they averaged 7,530 career BIP. Their average BABIP is .277. Doing the math, the expected Standard Deviation among these pitchers will be about .005 points of BABIP. The actual? .012.
In fact, if we square those both to find the variances (this is mathematically necessary; we can’t just subtract standard deviation from standard deviation), and then subtract the random variance from the actual, and then take the square root of that to find the “true” standard deviation, we get .011, remarkably close to Allen and Hsu’s conclusion of .009. The rest can probably be written off as the impact of fielding, which we did not control for but Allen and Hsu did.
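Here is a sketch of that variance subtraction for the career sample. I compute the random standard deviation directly as a rate, SQRT(p*(1-p)/n), which is the same as taking the binomial formula in hits and dividing by the number of BIP. The function name is my own.

```python
from math import sqrt


def true_sd(observed_sd, random_sd):
    """Back out the 'true' spread by subtracting variances, not SDs."""
    return sqrt(observed_sd ** 2 - random_sd ** 2)


# Career sample from the post: .277 average BABIP over 7,530 BIP on average.
random_sd = sqrt(0.277 * (1 - 0.277) / 7530)

# Observed spread among those pitchers was .012.
career_true_sd = true_sd(0.012, random_sd)

print(f"Random SD: {random_sd:.3f}")
print(f"True SD: {career_true_sd:.3f}")
```

Notice the order of operations: the subtraction happens on the squared values, and only then do we take the square root.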
So again, over large samples, BABIP is quite meaningful. Over small samples, it don’t mean a thing.
Another way of looking at this issue is by looking at distributions. What we’re interested in here is if groups of pitchers that do well in BABIP one year do well the next, and vice-versa. There’s a mathematical way of doing this called a chi-square test. Here’s how it works: I divided up our sample of 58 pitchers with at least 500 BIP in both 2004 and 2005 into five groups based on their BABIP in 2004. Each group had 11 pitchers but the middle one, which had 13. I then looked at the number of hits on balls in play each group allowed in 2005, versus how many would be expected if everyone had the same exact skill at preventing hits on balls in play, that is, if there were no difference in BABIP “skill” between major league pitchers, which was basically Voros’ original postulate.
Let’s look at the table:

             Group 1     Group 2     Group 3     Group 4     Group 5
Observed     2090        2005        2489        2003        2154
Expected     2083.6122   2010.2043   2550.0176   1993.4573   2103.7087
(O-E)^2/E    0.0196      0.0135      1.46        0.0457      1.2023

What this tells us, basically, is that the players that were best in 2004 at preventing hits on balls in play were actually a bit worse than average at doing so in 2005. Meanwhile, those that were about average in 2004 (Group 3) were the best in our sample in 2005.
The chi-square value is found by subtracting the expected value from the observed, squaring the difference, and then dividing that by the expected value, which is what I’ve done in the third row, and then adding all those numbers together. Our chi-square value is 2.74, which with 4 degrees of freedom (which are determined by subtracting 1 from the number of categories) is nowhere near significant. Specifically, our P-value turns out to be .60.
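To reproduce the test, here is a short sketch using the observed and expected hits from the table. For the P-value I use a closed-form shortcut that only works when the degrees of freedom are even (they are 4 here); that shortcut is my choice for keeping the example dependency-free, and a stats package would normally do this step for you.

```python
from math import exp, factorial

observed = [2090, 2005, 2489, 2003, 2154]
expected = [2083.6122, 2010.2043, 2550.0176, 1993.4573, 2103.7087]


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E across all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square statistic.

    Closed form that is valid only for positive, even degrees of freedom.
    """
    if df <= 0 or df % 2:
        raise ValueError("this shortcut needs a positive, even df")
    half = statistic / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


stat = chi_square(observed, expected)
p = chi_square_p_value(stat, df=len(observed) - 1)

print(f"chi-square = {stat:.2f}, p = {p:.2f}")
```

Almost all of the 2.74 comes from Groups 3 and 5; the other three groups landed within a handful of hits of their expected totals.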
Even if we group the players, which increases our sample size, and therefore decreases the noise-to-signal ratio, we still find no evidence that one year of BABIP information is at all useful. This is now the third way we have found support for Voros’ hypothesis. This is why DIPS theory works: Because, as the chi-square test shows, with one year’s worth of BABIP, it’s better to simply replace a pitcher’s actual BABIP with the league average.
Final Thoughts
This post is not about whether or not DIPS is right or wrong. I’m certainly not saying that pitchers have no control over the result of Balls in Play, because, well, they do. If you want my thoughts on where they do and do not have control, I recommend you buy The Hardball Times Annual 2006, and read my article with JC Bradbury.
What this post is about is why DIPS theory seems to work. It’s about why Voros McCracken made the findings that he did. The reasons for why something is the way it is are as important, and sometimes more important, than simply knowing what is. They give us explanations for observed phenomena, which allows us to refine and draw further conclusions. For example, when Voros said that pitchers have little or no control over whether or not a BIP becomes a hit, he was wrong. What he should have said is that there is so much luck involved in whether or not a BIP becomes a hit that a one-year sample is practically meaningless.
That’s why no test will find any meaning in a one-year sample. In reality, what he was looking at was not Defensive Independent Pitching (DIPS), but rather Luck Independent Pitching (LIPS). It’s not that pitchers had no impact on whether Balls in Play became hits or not; it’s that luck had such a large effect, that any pitcher control would simply be drowned out.

Yankees offense a lesson in feast or famine

Two weeks into the season, the Yankees sit at 6-6, a .500 record that includes 9 games against potential October rivals. It’s hard to complain about the Yanks since they’ve scored 80 runs this season and are on pace to blow past the 1000-run mark, but the team has played an uneven game.
Last night, I appeared on 360 The Pitch’s Outsider Radio (listen here) to talk about the Yankees. During my discussion with host Brandon Rosage, I noted that the Yankees offense had been an exhibit in feast or famine so far this season. Little did I realize just how true that observation was.
I knew that, after Sunday’s 9-3 thrashing of the Twins, the Yankees had become the first team ever to score 9 runs or more in each of their first six wins. What I didn’t know at the time was just how bad the Yankees offense had been in the six games that they lost.
In their six victories, the Yankees have been blowing away the opposition. They’ve scored 64 runs in 6 games, and they have been utterly bludgeoning their opponents’ starting pitchers. In those six games, they’ve amassed 224 at bats, 37 walks and five hit batters. Just for emphasis, that is more than 10 runs and six walks per game. As a team, the Yankees are hitting .375 with a ridiculous .447 on-base percentage in games they win.
Those numbers are obscene. When they win, the Yankees are playing beyond the scope of any team in the history of the game. If they could find a level of consistency that’s been missing from the first 12 games of the season, only a handful of pitchers in the American League would be able to stop them. And as the Yanks have shown in beating Barry Zito and Bartolo Colon earlier this year and nearly handing Johan Santana a loss on Saturday, the pitchers that one would expect to beat them have not done so thus far.
But as part of their Jekyll and Hyde dance with .500 this season, the Yankees when they lose don’t come close to their victorious counterparts. Before Saturday night’s heartbreaker, the Yanks hadn’t scored more than four runs in games they lost. They are 0-3 in one-run games (a point to which we will return later) and have scored just one run twice this season.
On the whole, in the six games they’ve lost, the Yanks are hitting just .217 while getting on base at just a .276 clip. They’ve walked just 17 times and managed just 16 runs in their losses, fewer than three per game.
These differences are extreme. The Yankees are getting on base at a rate nearly 175 points higher in games they win than in games they lose. They are well below average offensively in games they lose while they are off the charts in games they win.
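The split is easy to tally up. This is just a back-of-the-envelope sketch using the totals quoted in this post; the 16 runs in losses is simply the 80 total runs minus the 64 scored in wins, and the pace assumes a standard 162-game season.

```python
runs_total, games = 80, 12
runs_in_wins, wins = 64, 6

# Whatever wasn't scored in the wins was scored in the losses.
runs_in_losses = runs_total - runs_in_wins
losses = games - wins

runs_per_win = runs_in_wins / wins
runs_per_loss = runs_in_losses / losses

# Full-season pace at the current overall scoring rate.
pace = runs_total / games * 162

# On-base percentage in wins versus losses.
obp_gap = round(0.447 - 0.276, 3)

print(f"Runs per game in wins: {runs_per_win:.1f}")
print(f"Runs per game in losses: {runs_per_loss:.1f}")
print(f"162-game pace: {pace:.0f} runs")
print(f"OBP gap: {obp_gap:.3f}")
```

That is a difference of eight runs per game between the Yankees who win and the Yankees who lose.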
In other words, when the Yankees lose, they lose badly. When they win, they annihilate. So what exactly is going on here?
Having watched nearly every game this season, I have a theory. It seems to me that late in close games when the Yanks are behind, the batters head to the plate looking to tie the game with one swing. I’ve noticed a lot of fly balls late in the game. This may be simply a factor of the Yankees being a team of fly-ball hitting power hitters. This may be a sign that the Yanks’ hitters are pressing. While the rest of this post relies on numbers, that’s just my intuition.
How then can the Yankees solve this problem? I would suggest a simple lineup fix to solve some of the Yankees’ problems. Joe Torre should flip Gary Sheffield and Jason Giambi. While many pixels have been illuminated on the Internet this year showing that lineup construction doesn’t matter in the grand scheme of the game, in this case, the Yankees would do well to have Giambi hitting third and Sheffield fifth for a few reasons.
First, Giambi is getting on base at a much more prolific pace than Sheffield so far this year. Sheffield, with a contract extension and who knows what else hanging over his head, isn’t off to a horrible start, but he’s not off to a Sheffieldian start. He’s slugging .538 with 3 home runs and 11 RBI this year. But his OBP is at .309, just 21 points higher than his .288 average. With a career OBP 100 points higher than his career batting average, Sheffield is exhibiting a noted lack of patience at the plate in the early going this season.
Giambi, on the other hand, has been on base nearly always it seems. He is hitting .344/.543/.781. He won AL Player of the Week last week after hitting 4 home runs and getting on base 70 percent of the time. With Jeter and Damon both getting on base nearly 45 percent of the time, putting Giambi third would give Alex Rodriguez a ridiculous number of RBI chances and many with no out. The Yanks would have the added bonus though of having their number one OBP guy and arguably their biggest power threat batting in front of Alex Rodriguez and Gary Sheffield. Not to insult Hideki Matsui and Jorge Posada, but Giambi would see even more pitches to hit in the three hole than he does in the five hole.
So with their next 15 games against American League East competitors, it’s time for the Yankees to show their true mettle. If they can avoid the offensive slumps that come in between spurts of ridiculous offense, these next 15 games could begin to show some separation among AL East teams. And if Joe Torre is willing to get a little creative with his lineup card, the Yanks just might blow by team offensive records this year.