Third base coaches, get your windmill arm ready

A request came through on the SABR Statistical Analysis distribution list for some information about how often runners attempt to take “extra” bases on base hits. That is, how often do runners attempt to go from first to third on a single, first to home on a double, or second to home on a single, and how often do they succeed? I took a look at the last seven years (2000-2006) and found a few interesting patterns in the data.
I broke down each of the potential events by their occurrences with either two outs or fewer than two outs. Not surprisingly, runners tried for “extra” bases more often when two men were out, as they were able to run on contact. And, by and large, they were successful. In all cases, success rates were above 90%, no matter which “extra” base was attempted and no matter how many were out. There was a curious hiccup, though, that caught my eye. With two outs and a runner attempting to score, either from first on a double or from second on a single, rates of attempting to take the extra base go up, but success rates go down by a percentage point or two. Third-base coaches take a few more chances with two outs, but is that the right thing to do?
The question of whether or not teams have optimized their “regular” stolen base attempts is one that’s been studied. The general agreement is that a manager should expect about a 70% success rate for a steal attempt to make sense and, wouldn’t you know it, teams generally have about a 70% success rate in stolen bases. The standard practice is to use the run expectancy matrix. Let’s assume that there are no outs and a runner on first, and the manager is considering whether or not to send him. Right now, his team has a run expectancy (using 2006 figures) of .927 runs. If he breaks for second, he might be safe, in which case the team will have a runner on second and no one out, for a run expectancy of 1.154, or he might be out (no runners, 1 out), for a run expectancy of .298.
Let’s find the percentage of times that this runner must be safe (we’ll call it p, with 1-p representing the percentage of times he will be caught stealing) for this strategy to break even, that is, the point where the runs expected in the current situation (runner on 1st, no outs) are equal to the weighted run expectancies of the possible outcomes (either SB or CS). The algebraic formula is:
Current run expectancy = p * RE of situation after being safe + (1-p) * RE of situation after being caught stealing
Plugging in the numbers:
.927 = (p) *  1.154 + (1 – p) * .298
.927 = 1.154(p) – .298(p) + .298
.629 = .856(p)
(p) = 73.5%
With no outs, the break-even point is 73.5%. So a manager should have 73.5% confidence that his runner will steal safely in this situation for the steal sign to make sense.
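
For anyone who wants to check the arithmetic or try other base-out states, here is a minimal Python sketch of the same break-even algebra. The function name `break_even` is just an illustrative choice; the only inputs are the 2006 run expectancy figures quoted above.

```python
def break_even(re_now, re_safe, re_out):
    """Success rate p that solves re_now = p * re_safe + (1 - p) * re_out."""
    return (re_now - re_out) / (re_safe - re_out)


# Runner on first, no outs, using the 2006 run expectancies from the text.
p = break_even(re_now=0.927, re_safe=1.154, re_out=0.298)
print(f"Break-even success rate: {p:.1%}")  # Break-even success rate: 73.5%
```

Rearranging the formula this way makes it clear that the break-even point is just the ratio of what you risk (current RE minus RE after the out) to the full spread between the good and bad outcomes.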
But, what’s the break-even point for trying to “steal” home from first on a double? We can calculate this one with the same logic. We assume that any runners who had been on second or third have already scored, and the third-base coach is now faced with the decision of a certainty of second and third vs. attempting to push the runner, with either a run scoring (and a runner still at second) or the runner being thrown out (and a runner still at second, unless the runner was nailed at home for the third out). Let’s take a look at the data for no one out. With no one out and runners at 2nd and 3rd, a team can expect to score 1.965 runs. If the runner is safe, there’s a run in on the play, plus a runner at second with no one out, for a total run expectancy of 2.154 runs (1.154 for the runner on second and one for the runner that scores on the play). If the runner is out, there’s a runner on second with one out, for a run expectancy of .736 runs. If you plug all of those numbers in, the break-even point is 86.7%. With one out, the break-even point is 79.4%. With two outs, the break-even point actually drops to 43.1%!
That covers first to home on a double, but what about other “stolen” base situations? For second to home on a single (assuming no other runners), with zero, one, and two outs, the break-even points are 91.7%, 70.3%, and 39.8%. The message here is that given the chance of “stealing” a run, it’s better to take it, especially with two outs. In order for a runner to score from third with two outs, the next batter will have to do something other than make an out. Even the best players manage that a little bit north of 40% of the time.
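
The no-out case above can be checked the same way. This is a small Python sketch, not anything from the original analysis: the three run expectancies are the ones quoted in the text, while the trial success rates in the loop (80%, 93%) are assumptions picked purely for illustration.

```python
def expected_runs_if_sent(p_safe, re_safe, re_out):
    """Run expectancy of sending the runner, given his chance of being safe."""
    return p_safe * re_safe + (1 - p_safe) * re_out


RE_HOLD = 1.965  # runners on 2nd and 3rd, no outs
RE_SAFE = 2.154  # run scores, runner on 2nd, no outs
RE_OUT = 0.736   # runner on 2nd, one out

break_even = (RE_HOLD - RE_OUT) / (RE_SAFE - RE_OUT)
print(f"Break-even: {break_even:.1%}")  # Break-even: 86.7%

# Hypothetical success rates, to see which side of the line they fall on.
for p_safe in (0.80, 0.93):
    sent = expected_runs_if_sent(p_safe, RE_SAFE, RE_OUT)
    verdict = "send" if sent > RE_HOLD else "hold"
    print(f"p = {p_safe:.0%}: {sent:.3f} runs if sent vs {RE_HOLD} if held -> {verdict}")
```

Any success rate above the break-even line means sending the runner adds expected runs; anything below it means the coach should hold him.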
For first to third on a single, the break-even values are 91.2% with no one out, 76.9% with one, and surprisingly, back to 91.6% with two outs.  Score one for conventional wisdom.  You really don’t want to make the first or third out of an inning at third base.
But, for the last seven years, success rates have been in the low-to-mid 90s.  It looks like third base coaches aren’t really optimizing their use of the old windmill arm.  Why not?  Well, if a third base coach sends the runner and he makes it, the runner gets the credit (or the fielder made a bad throw or the catcher didn’t block the plate or it was just “expected” and nobody notices.)  If he gets nailed at the plate, the third base coach gets blamed, probably even more so if it’s the last out in an inning (or worse, the game).  It looks like third base coaches are concerned more about their own backsides than the well-being of their teams!


Does swinging at the pitch really protect a base-stealer?

Another one of those things I tend to overhear on the radio: When a runner is trying to steal, the batter should swing, as this will disrupt the catcher and give the runner a better chance of stealing. Here’s to ye olde baseball conventional wisdom, the starting point of many a Sabermetric writing. Fair enough. Let’s see if this one stands up to the evidence. First, I isolated all instances of a stolen base or caught stealing, either of 2nd or 3rd in 2006, absent pickoffs. (As always, thanks to Retrosheet for the fact that I have a gig… and that I haven’t yet finished my dissertation.)

I looked at what the batter did on the pitch that came to the plate while the runner attempted to steal and classified it either as a swing (swinging strike or missed bunt), or a non-swing (either a ball or a called strike). I also classified pitches as either strikes or balls. Needless to say I also looked at whether the runner was safe at second (or third). The rest was just a matter of a few chi-square analyses.
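
For readers who haven’t run one, a chi-square test on a 2x2 table (swing vs. no swing, safe vs. out) can be sketched in a few lines of Python. To be clear about what is assumed: the counts below are HYPOTHETICAL and exist only to make the example run; they are not the Retrosheet totals behind this post, and the function name `chi_square_2x2` is made up for this sketch.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# HYPOTHETICAL counts for illustration only (not the actual 2006 data).
swing_safe, swing_out = 66, 34   # batter swung: runner safe / runner out
take_safe, take_out = 312, 88    # batter did not swing: runner safe / runner out

chi2, p_value = chi_square_2x2(swing_safe, swing_out, take_safe, take_out)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A p-value below the usual .05 threshold is what “significant” means in the results that follow.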

Does swinging “protect” a base-stealer? Far from it. If the batter swung at the pitch while a runner attempted to steal 2nd, the runner was safe 65.7% of the time, while if he didn’t, the success rate was 77.9%. Looking at a runner stealing third, the same finding emerged: when the batter did not swing, the runner was safe 82.1% of the time, compared to 56.8% when he did. Both chi-squares were significant. Apparently, conventional wisdom has been wrong for a while, although players only swing about 19.8% of the time in this situation, so perhaps the “wisdom” is more present in the press box than on the field itself.

This is a strange finding, though, and it needs some explanation. The fact that a relationship has been found does not mean a cause has been found. For example, it’s possible that batters are more likely to swing and miss at well-placed pitches, and that those well-placed pitches are more easily used to gun down the would-be base stealer. (In this data set, all swings were misses. A foul ball would have returned the runner to his original base, and a ball in play would not have resulted in a stolen base.) Let’s take away the swing for a moment and look at whether, when the batter did not swing, the pitch being called a ball or a strike made a difference to the runner’s fate when he tried to steal second. On balls, the runners’ success rate was 78.5%, while on called strikes, it was 76.3%. Chi-square was not significant, indicating that in this data set, it makes no difference statistically whether the pitch was called a ball or a strike. Now, this doesn’t disprove the original conjecture. Perhaps balls that are swung at and missed are a different breed altogether (more likely to miss a fastball?) or perhaps those that are swung at and hit are the oddballs (pardon the pun). There could be a bias in which pitch types end up in the catcher’s glove, and they might be biased in the direction of easier throws to second. There might also be pitches which are easier for the runner to read (and get a good jump on) and which are less likely to be swung at (or more likely to be hit when they are swung at).

Another factor floated in such discussions is the handedness of the batter. Most catchers are right-handed, and naturally spring out toward the left-handed batters’ box when making a throw to second. Perhaps if that box is occupied by a left-handed batter, the catcher will show some ill effects of this “obstacle.” The answer is no. Success with a lefty in the batter’s box was 76.1% and with a righty, it was 75.0%. No significance in that association. For a throw to third, there is an advantage to the runner for having a right-handed hitter in the batter’s box (80.6% to 67.7%). In that case, a right-handed batter is more directly between the catcher and third base.

To be honest, I’m confused. I don’t know what to make of this finding. Why would a batter’s swinging actually make it easier for the catcher to throw a runner out? Perhaps the batter is more of an obstacle standing there after not having swung than if he had swung. But, for what it’s worth, a reminder that one should never take accepted wisdom in baseball (or anything) without asking to see some proof.

A look at TotalZone from the batter’s box

Once I set up my play-by-play database to measure defense, it’s a simple switch to group the results by batters instead of fielders. Specifically, I’m focusing on ground balls to see which hitters have the most success hitting the ball on the ground.
It should come as no surprise to you that Ichiro is the king of ground balls. Infielders (pitchers and catchers included) record outs only 64.7% of the time when he hits one on the ground. As with my defensive measures, I am looking at only the years 2003-2006 for now.
I used a cutoff point of 500 ground balls. Among the few who didn’t make the cut, Esteban German and Chris Duffy were slightly better. There are two ways to make your groundballs count: being really fast or hitting the ball really hard so it gets past the infield. Obviously we know what category Ichiro falls into. His 176 infield hits ranked second behind only Juan Pierre, but Pierre’s groundballs are outs 70.7% of the time, as he does not get nearly as many hits through the infield.
Others ranking very high include Willy Taveras (64.8%), Rocco Baldelli (64.9%), Alex Rodriguez (65%) and Kenny Lofton (66%). One of these is not like the others: while Rodriguez runs well, he’s not getting the infield hit benefit the others are. Gary Sheffield (67.6%) is another who gets a lot of groundball hits without a lot of speed. He’s probably the batter third basemen least want to see stepping up to the plate. Ichiro might make you rush a throw and look foolish, but playing defense against Sheffield can hurt.
The bottom of the list is dominated by pitchers.  They hit a weak ground ball, feel really happy they didn’t strike out, and jog to first.  A ground ball by Doug Davis, Claudio Vargas, Tim Hudson, or Brad Penny is an out about 93% of the time.  Surprising to see Hudson because I thought he was a good hitter and athlete in college.  I guess spending the first 5 years of his career hiding behind a DH ruined his offense.
Among the position players, Adam LaRoche is an out 79.4% of the time. Lots of Molinas and other catchers are on the bottom of the list. Jim Edmonds and Ken Griffey make a lot of groundball outs, almost all pulled to the right side.
A surprise is that Bengie (75.7%) outperformed the other Molinas. He did it by hitting the ball the hardest, as he had the fewest infield hits despite the most groundballs of the three.

The Adam Dunn debate: Defining plate discipline

A small brag, if the reader will humor me. A study of mine on strikeouts and walks has been published in By The Numbers, which is the official newsletter of SABR’s Statistical Analysis Committee. The study (which you can find in this PDF file published under my… umm… real name… shhhhh) is entitled “Is Walk the Opposite of Strikeout” and I argue that the answer is no. Walk and strikeout are actually more alike than different, and the opposite of these two is “ball in play.”
My starting point is the Adam Dunn debate as to whether he is a “disciplined” hitter.  Dunn is the paragon of the “three true outcomes” hitter.  He’s hit at least 40 HR in the last three years, while averaging something like 170 strikeouts and 110 walks.  In 2006, he had one of those three outcomes in more than half of his plate appearances.  Because of the large number of walks that he draws, Dunn has been referred to as a very disciplined hitter by some (probably all those who just finished Moneyball), while others have called him undisciplined, for the obvious reason that he’s been flirting with 200 strikeouts over the past few years.  Who’s right?  Well, a lot comes down to how you define plate discipline.
The most common measure of plate discipline that I’d found is some sort of ratio between strikeouts and walks, usually K/BB. The problem, which I point out in my article, is that there is more than one way to avoid striking out. A walk certainly is one, but does it necessarily follow that players with low walk totals are undisciplined? Is it not equally disciplined to take a big fat hanging curve ball and place it in the left field stands?
To that end, I developed two new metrics based on signal detection theory while taking advantage of the pitch-by-pitch data in the Retrosheet data files.  The article in By The Numbers goes into greater detail on how the metrics are calculated, but the basic idea behind signal detection theory is this: A batter must see whether or not the pitch is hittable (i.e., in the strike zone… more on the obvious objection to this in a minute) and then decide whether or not he should swing.  He might swing at it and miss, or he might not swing and have it be a called strike.  Either way he’s made a mistake and will have a strike called against him.  However, from a signal detection theory standpoint, the type of mistakes he makes are telling.  In fact, through looking at the two types of mistakes, plus the times that he actually hits the ball or takes a called ball, signal detection theory can generate two measures.  One looks at how likely a batter is to swing, the other at how good a batter is at reading the strike zone and making good decisions.  He may be good at reading the strike zone, just too anxious (or too passive) with his swings.  Or, he may simply be guessing in the strike zone.
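
To make the two measures a little more concrete, here is a rough Python sketch of the textbook signal detection quantities, sensitivity (d′) and criterion (c). This is an assumption-laden illustration and not the calculation from the By The Numbers article: the mapping of pitch outcomes onto hits, misses, false alarms and correct rejections is only one reading of the description above, the function name `sdt_measures` is made up, and the pitch counts are hypothetical.

```python
from statistics import NormalDist


def sdt_measures(contact, called_strike, swinging_strike, called_ball):
    """Textbook signal detection measures from four pitch-outcome counts.

    Assumed mapping (one reading of the text, not the published method):
      contact         -> hit               (swung at a hittable pitch)
      called_strike   -> miss              (took a hittable pitch)
      swinging_strike -> false alarm       (swung at an unhittable pitch)
      called_ball     -> correct rejection (took an unhittable pitch)
    Rates of exactly 0 or 1 are not handled (inv_cdf would raise an error).
    """
    z = NormalDist().inv_cdf
    hit_rate = contact / (contact + called_strike)
    false_alarm_rate = swinging_strike / (swinging_strike + called_ball)
    sensitivity = z(hit_rate) - z(false_alarm_rate)        # d': reading the zone
    criterion = -(z(hit_rate) + z(false_alarm_rate)) / 2   # c: below 0 = swing-happy
    return sensitivity, criterion


# HYPOTHETICAL pitch counts for one batter, for illustration only.
d_prime, c = sdt_measures(contact=80, called_strike=20,
                          swinging_strike=20, called_ball=80)
print(f"sensitivity d' = {d_prime:.2f}, criterion c = {c:.2f}")
```

The point of the split is that two batters can make the same number of mistakes and still differ: one reads the zone poorly (low d′), the other reads it fine but swings too often or too rarely (c far from zero).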
It turns out that the two measures actually correlate differently with walk rate and strikeout rate. How good a hitter is at reading the strike zone was strongly correlated with his strikeout rate, but not his walk rate. How likely he was to swing was more correlated with his walk rate, but not his strikeout rate. Players who swung less walked more. It looks like walks and strikeouts are manifestations of two different skills.
There are a few problems.  One is that players do swing at pitches in the strike zone and miss them.  The other is that they will sometimes golf a hit off of their shoetops.  Retrosheet data (which is free!) doesn’t give pitch locations.  Eventually, I’ll learn how to mine the data from MLB’s Enhanced GameDay to my advantage on this one.  But, for now, I think it’s an enhancement over the simple K/BB metric we have now.
In any case, take a look at the article, and discuss.  I’d like to refine this a bit and it’s always better to have a little bit of collaboration.
And for what it’s worth, Adam Dunn finished #401 last year among all hitters with at least 100 PA.  Out of 431.  Perhaps you can tell which side of the debate I’m on.

A Sabermetric/English dictionary

You’re out with some friends watching a game, either at a bar or at the ballpark.  For a while, you’re talking about old times, wondering when your favorite team is gonna trade for a good third baseman, and making fun of your buddy Larry for having to get home early because his wife asked him to take Junior to school the next morning.  Then, it happens.  Larry turns to Ryan and says, “Hey, Ryan, what was Billy Pilgrim’s VORP last year?”
And so it begins.  For the next hour, you know that they’ll be like this, talking about obscure statistics.  It’s odd really.  Most of what they speak is English, but they throw in terms that to you have no meaning and those seem to be the important parts of the conversation.  You guess that WARP3 has something to do with Star Trek or the Rocky Horror Picture Show, but you don’t know why they’d be talking about baseball and Star Trek at the same time.
I never quite realized what we Sabermetricians must sound like to the uninitiated until last August.  I was in Moscow, Russia with my wife (she was born there).  It was bad enough that I had to watch Walker, Texas Ranger dubbed into Russian(!), which I don’t speak.  I could excuse myself for not knowing what was going on there, because it was all in an actual different language.  The really fun part came when I joined my wife at the conference she was attending on cell biology, which was being held in English. 
Unfortunately, everything I know about cell biology can be written on the back of a dime with a crayon, but at least these people spoke English.  I listened just happy to hear someone speaking a language I recognized, but to this day, I have no idea what they were actually talking about.  I can only presume that the non-Sabermetric speakers out there have the same reaction that I did at the conference… it’s English, but it’s not really intelligible.  (That is if they don’t have the same reaction I had when watching Chuck Norris do a spin-kick-ski.)  So, as a helpful guide, I give you a (tongue-planted-firmly-in-cheek) Sabermetric-to-English dictionary.
Acronym: A way to try to make an incredibly geeky concept sound cool.  Usage: His WXRL is 3453.22.  Proper response is smiling and nodding.
Baseball Prospectus: You know how you remember the first time you saw a girlie mag?  This is sorta the same thing for your Sabermetric friend. And on the eighth day
Clutch hitting: Grab some popcorn.  You’re going to be here a while.
Defense: Something that every Sabermetrician has a system for measuring that he is “working on”.
DIPS: 1) Something in which one places celery or tortilla chips, then eats.  2) An idea that basically absolves pitchers of all guilt for just about anything that happens once the ball is hit.
Fantasy leaguers: These are the little brothers of Sabermetricians.  They’re the only people who are geeky enough to understand what you’re doing, or at least who care enough, but they ask annoying things like how many RBI Frank Thomas will have this year.  I’m guessing most Sabermetricians actually play fantasy ball, apply their crazy theories to the league and finish third every year.  Maybe that’s just me.
GB/FB Ratio: Ball go down / Ball go up.
Hunch: One of the few swear words in the Sabermetric vocabulary.  Usage: “Hargrove just manages on hunches.”  (see also: “gut instinct”)
Jackson, Shoeless Joe: Footwear deprived player for 1919 “Black Sox” team.  Proper response to anything involving this is a sigh, and a wistful reminiscence on the collapse of character in America.  Usually used in conjunction with “win expectancy.”
James, Bill: The name at which every knee must bow. Imagine being at a Star Trek convention and saying, “Hey, William Shatner is over there” to someone dressed as a Klingon. Same basic idea.
LOOGY: It’s not nearly as gross as it sounds. Left-handed One Out GuY. He’s the guy in the bullpen who’s been around entirely too long because he has a left arm and isn’t afraid to use it. See Schoeneweis, Scott.
Morgan, Joe: 1) One of the most undervalued baseball players of the modern era, as recognized by the Sabermetrically enlightened.  2) One of the worst (i.e. Sabermetrically un-enlightened) announcers of the modern era.
OPS: Depending on whom you ask, either the greatest new-age baseball statistic ever coined or a vague hackish moderate improvement over stats like batting average.  Arguing OPS with a Sabermetrician is like arguing about abortion.  It doesn’t matter what you say, they ain’t changing their mind.
Park/league/era adjustment: Suppose that you had been born 20 years earlier than you were, in Kazakhstan.  What would life have been like?  Different, right?  Now, suppose Babe Ruth would have been born 70 years later.  As a Colorado Rockie.
Pythagorean win percentage: A formula which tells you that despite the fact that another team is piled on top of each other after winning the World Series, your team was actually better this year. And we can prove it. (Gee, that makes me feel better.)
RBI: The dirtiest swear word in the Sabermetric vocabulary.
Replacement level: Here’s the idea. Suppose that you want to dump your girlfriend, because you’re thinking to yourself that you could easily do just as good, if not better. Then again, suppose that you have a fantastic girlfriend, so you make sure you buy her flowers or sign her to a long-term deal because you know that if you broke up, you could find someone else, but she wouldn’t be nearly as good. Transpose that into baseball.
Retrosheet: Where good Sabermetricians go when they die.  Or get off work.
Three True Outcomes: See Kingman, Dave.
VORP:  Value Over Replacement Player.  Like a lion’s roar summons other lions to the hunt, it is a word used to gather together other Sabermetricians when in a bar.  What it actually refers to is immaterial.  If you hear it, there will soon be five other guys around speaking Sabermetrics.  Run.
Got more to add?  Leave ’em in the comments.

Best Group of Old Pitchers Ever

Last season, pitchers 40 and over won 107 games. That is the highest figure in the history of baseball, breaking the record of 89 set the season before. I’m using seasonal age, so if a pitcher was 40 or older before July 1st, all wins from that year count. The 2005 figure was 20 more wins than the previous record, set in 1985.
The 1985 season featured strong seasons from Don Sutton, Tom Seaver, and a pair of knuckleballing brothers. For 2005, you had power pitchers Randy Johnson and Roger Clemens in addition to crafty lefthanders Jamie Moyer, David Wells, and Kenny Rogers.
In 2006, Wells was injured most of the time, but the other main contributors refused to retire, and kept winning games.  They were joined by 3 pitchers just turning 40, Orlando Hernandez (well, at least according to the Lahman database) and future Hall of Famers Greg Maddux and Tom Glavine, who won 15 each.
We can all but guarantee that the 2007 group will smash the record once again. Of the double-digit winners from 2006, not a single one retired. Kenny Rogers is out for the first half of the season, but David Wells is healthy so far, and Rogers + Wells might come close to the 20 wins they combined for last season. Maddux, Glavine, Johnson, Moyer, and Hernandez show few signs of slowing down, and Roger Clemens couldn’t stay retired when the Yankees offered him $28 million. As a group, these guys probably will fall short of the 107 wins they had last year, but reinforcements are on the way to set up another record-breaking season.
The reinforcements are John Smoltz, Curt Schilling, Tim Wakefield and Woody Williams. This isn’t a trend that will continue forever, though: the only top starter this year between ages 37 and 39 is Mike Mussina. Eventually the Braves’ 1990s Big Three, the Big Unit, and Schilling will retire. One of these days Roger will retire and actually stay retired. When that happens, the 40-year-old ace starter will once again become a rarity.

Stats 202: Intraclass correlation OR Yet another DIPS paper

What part of a player’s performance is actual talent and what part is just luck? Any player’s performance has a little from column A and a little from column B. Sudden gusts of wind can turn fly ball outs into home runs and home runs back into fly ball outs, but in the end, we assume that it all evens out. Players’ stats generally (but not always) reflect their abilities, and those abilities remain fairly constant from year to year. Even the casual fan can, before the season starts, make up a list of who will be the top ten home run hitters in baseball in the coming season, and will probably be right on 7 or 8 of them. Even more, that same casual fan can probably give a pretty good estimation (perhaps we might say “in the ballpark”?) of how many home runs Player X will hit in the forthcoming year, assuming that he doesn’t get hurt.
Not all stats are stable like that.  Pitching stats, in particular, tend to be a little more volatile from year-to-year.  Those who dabble in Sabermetrics might be familiar with the wild variations in bullpen ERA that happen from year to year.  Relievers are notorious for being a very fickle breed with good stats one year and bad the next.  In fact, it was this instability that led Vörös McCracken to take a look at the stability of certain pitching measures over time.  His method was to take a look at different metrics in 1998 and 1999 (the years are irrelevant, that’s just the data set he used), and to see how strong of a correlation there was between a pitcher’s performance in 1998 and in 1999.  Those who were above the league average in ’98 should also be above average in ’99, and those below should be below, assuming that the statistic is measuring some actual skill (one that wouldn’t be affected by the simple passage of a year).  If that doesn’t happen, we start to think that there is no relationship over time, and that the fluctuations have much more to do with luck.
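
A year-to-year correlation of this kind takes only a few lines of Python. The sketch below is illustrative only: the helper `pearson_r` is a plain implementation of the standard formula, and every number in the four lists is HYPOTHETICAL, invented to show what a stable statistic and an unstable one look like. None of it is McCracken’s data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between paired values, e.g. year 1 vs. year 2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_dev / (dev_x * dev_y)


# HYPOTHETICAL rates for the same five pitchers in two consecutive seasons.
k_rate_y1 = [0.15, 0.18, 0.22, 0.25, 0.30]
k_rate_y2 = [0.16, 0.17, 0.23, 0.24, 0.29]
babip_y1 = [0.280, 0.295, 0.300, 0.310, 0.320]
babip_y2 = [0.305, 0.285, 0.315, 0.290, 0.300]

print(f"strikeout rate, year to year: r = {pearson_r(k_rate_y1, k_rate_y2):.2f}")
print(f"BABIP, year to year:          r = {pearson_r(babip_y1, babip_y2):.2f}")
```

A correlation near 1 says pitchers keep their place relative to each other from one year to the next; a correlation near 0 says last year’s number tells you little about this year’s.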
McCracken found that some measures had high correlations from year-to-year, and some did not.  In particular, statistics like a pitcher’s strikeout rate, walk rate, and home runs-given-up rate were fairly stable.  These are events in which the defense behind the pitcher isn’t involved in the outcome of the play.  The problem was that when McCracken looked at events in which the ball is in play, using the statistic batting average on balls in play (BABIP), which was a percentage of how often a ball in play went for a hit, he found relatively little correlation.  Once the ball was in play, it was up to luck (and the seven guys behind him) to see what would happen to the ball, and his pitching line.
My goal in this tutorial is not to argue the merits of DIPS. That’s been done elsewhere by others (and perhaps in the future by me). Google Voros McCracken DIPS and you’ll find plenty of discussion on the topic. My goal, instead, is to use a better method than year-to-year correlation to test whether this is the case. It’s not that year-to-year correlation is a horrible thing, just that intraclass correlation is better. Here’s the idea: Year-to-year correlations take into account two years’ worth of data. That’s nice, but we have so much more data available! Good old bivariate correlation is limited by the fact that it can only take two data points per player into consideration at a time. It finds the variance shared between the two variables and standardizes it by the product of their standard deviations. But, what if correlation could look at multiple years at once?
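
As a rough idea of what looking at multiple years at once can mean, here is a Python sketch of one common form of intraclass correlation, the one-way random-effects ICC(1). Which variant the rest of this post uses is not shown here, so treat the choice of ICC(1) as an assumption; the function name `icc_oneway` and the strikeout rates are likewise hypothetical.

```python
def icc_oneway(rows):
    """One-way random-effects ICC(1).

    rows: one list per pitcher, each holding the same number of seasons.
    Compares variance between pitchers with variance within a pitcher.
    """
    n = len(rows)       # number of pitchers
    k = len(rows[0])    # seasons per pitcher
    grand_mean = sum(sum(row) for row in rows) / (n * k)
    row_means = [sum(row) / k for row in rows]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(rows, row_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# HYPOTHETICAL strikeout rates: four pitchers, three seasons each.
k_rates = [
    [0.15, 0.16, 0.14],
    [0.20, 0.19, 0.21],
    [0.25, 0.26, 0.24],
    [0.30, 0.28, 0.31],
]
print(f"ICC(1) = {icc_oneway(k_rates):.2f}")
```

When each pitcher’s seasons cluster tightly around his own average and the pitchers differ from each other, the value approaches 1; when a pitcher’s seasons bounce around as much as the league does, it falls toward 0 or below.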