So how long does it take for BABIP to become reliable?

Seems a simple question.  We know that BABIP (batting average on balls in play) for pitchers has a low correlation from year to year.  As a result, a Sabermetric standard has been that one year in a pitcher’s life tells you little about his actual ability to prevent hits on balls in play, which is true.  In statistical terms: one year is not a sufficient sample to get a good estimate of the parameter, primarily because a pitcher only faces a few hundred balls in play each year.

Suppose though that a pitcher’s season lasted billions of plate appearances.  Eventually, we’d know exactly how good a pitcher was.  If we let him face another billion hitters, he’d come up with the same number again.  That sort of sampling frame produces reliable statistics, but it’s a fantasy.  We have to deal in reality.

But after looking at year-to-year stats, with the low correlation between BABIP at year 1 and BABIP at year 2 (which has held any which way you try to break it), it’s been assumed that pitchers have no control at all over their BABIP, ever.  That’s a big jump, one that I think people make without fully stopping to realize that they’ve made.  (I’ve probably made it myself.)  There’s a difference between a parameter being entirely random and it being unobservable given our limited data and the amount of noise present. 

The assumption goes that everyone is a .300 pitcher once the ball is in play and doesn’t leave the stadium.  After all, if there’s no stability, it must all be random noise.  Right?  It’s just that no one has ever been really comfortable with that thought.  Pitchers don’t differ in their BABIP ability at all?  Pedro Martinez in his heyday was the equivalent of Mike Bacsik in his heyday?  It just doesn’t make sense.  Then there is the curious case of Troy Percival (my personal favorite piece of anecdotal evidence.)  His BABIPs have been consistently below the magic .300 line throughout his entire career, and it’s been a long one.  Could it happen by chance?  Sure, but perhaps something else is afoot.

Maybe the problem is that we need to widen the sampling frame.  Maybe one year doesn’t tell us much about a pitcher’s true talent on BABIP, but what if several years do? 

I took 30 years’ worth of Retrosheet data (1979-2008) and dumped it into a giant file.  I selected all balls in play (not a strikeout, not a walk, not a home run, not HBP, not one of those weird catcher interference thingies.)  As I have been wont to do lately, I started running some split-half reliability analyses.  I split each pitcher’s batters faced into even and odd numbered appearances (so, I’m drawing the first PA into the odd group, then the second into the even group… it balances out the two halves of a player’s performance so that I’m drawing some from year one, some from year two, etc.)

For each pitcher, I started by taking a sample of 500 balls in play and splitting them into two 250 BIP halves (those that had 500 to give).  I ran a correlation between those two halves for all 1461 pitchers in the sample who fit the criteria.  The correlation was .174.  So, at 250 BIP, BABIP has a split half reliability of .174.  It’s numbers like that which led to the creation of DIPS theory to begin with.
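To make the procedure concrete, here is a minimal sketch of an odd/even split-half reliability calculation. The function names and the toy data are assumptions for illustration; this is not the code or the Retrosheet sample used for the numbers above.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def half_babips(outcomes, n_bip=500):
    """Split the first n_bip balls in play into odd and even halves.

    outcomes: chronological list of 1 (hit) / 0 (out) for one pitcher.
    Returns (BABIP in odd-numbered BIP, BABIP in even-numbered BIP).
    """
    sample = outcomes[:n_bip]
    odd, even = sample[0::2], sample[1::2]  # 1st, 3rd, ... vs 2nd, 4th, ...
    return sum(odd) / len(odd), sum(even) / len(even)

def split_half_reliability(pitchers, n_bip=500):
    """pitchers: dict of name -> outcomes. Pitchers short of n_bip are skipped."""
    halves = [half_babips(o, n_bip) for o in pitchers.values() if len(o) >= n_bip]
    odd_rates = [h[0] for h in halves]
    even_rates = [h[1] for h in halves]
    return pearson(odd_rates, even_rates)

# Toy data only (hypothetical pitchers, artificially regular patterns).
toy = {
    "A": [1, 1, 0, 0, 0, 0] * 84,        # 504 BIP, first 500 used
    "B": [1, 1, 0, 0, 0, 0, 0, 0] * 63,  # 504 BIP, first 500 used
    "C": [1, 1, 0, 0] * 125,             # exactly 500 BIP
    "D": [1, 0] * 100,                   # only 200 BIP, so excluded
}
r = split_half_reliability(toy)
```

In the toy data each pitcher's two halves match exactly, so the correlation comes out at 1; real data is far noisier, which is the whole point of the exercise.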

But let’s expand.  Let’s take two samples of 500 BIP.  That bumps things up to .253.  Hmmmm, getting a bit more reliable.  The question becomes when it hits that “good enough” point.  I’ve argued previously for the use of .70 as the cutoff for reliability.  It’s an arbitrary point (I guess in an ideal world, we’d want a reliability of 1.0), but a correlation of .707 has an R-squared of .50, which means anything north of that accounts for more than 50% of the variance.  Can we get to .70?

Turns out that the answer is… yes.

At a sample of 3750 balls in play, (a 7500 BIP sample, chopped in half… there were 48 pitchers in the last 30 years who had that many BIP to look at… not outstanding, but enough to not discount), the split-half reliability was .696.  At 4000, it reached .742 (in 34 pitchers).  So, it only takes about 3800 BIP before we get a reliable read on a pitcher’s BABIP abilities.  That’s a lot, but it’s not an obscene amount.  In 2008, the average pitcher saw roughly 3 balls in play per inning pitched.  At that rate, a starter who throws 180 innings would see about 540 BIP in a year (rough estimates here.)  So, it would take about seven years, at that same 180 IP per year rate, to get to the required number of BIP.  Not easy, but not out of the realm of possibilities.
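The back-of-the-envelope arithmetic can be written out as a short sketch. The function name is hypothetical; the inputs are the rough figures quoted above (about 3 BIP per inning, 180 innings a year, roughly 3800 BIP needed).

```python
def seasons_to_reach(target_bip, innings_per_year, bip_per_inning=3.0):
    """Seasons needed to accumulate target_bip balls in play at a steady workload."""
    bip_per_year = innings_per_year * bip_per_inning
    return target_bip / bip_per_year

bip_per_year = 180 * 3.0                 # 540 BIP in a 180-inning season
seasons = seasons_to_reach(3800, 180)    # a little over 7 seasons
```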

Now, about those guys who had two matching 4000 BIP samples, there was still some variability in the sample.  Andy Pettitte had BABIPs in his twin samples of .318 and .312.  Charlie Hough had the other extreme at .248 and .266.  So, it looks like there is such a thing as the “ability” to exert some control over what happens to a ball in play.  It just takes a while (but not forever) to reveal itself.

This isn’t a very functionally useful finding for evaluating players or predicting what they will do.  A pitcher is not the same man at the end of seven years that he was at the beginning (either as a pitcher or a human being).  The ability to prevent hits on BIPs may deteriorate over the years and at that point, we’re using data that are 6 and 7 years old to predict what will happen tomorrow.  In a single season, which is really the sampling frame that most fans are concerned about, there will still be a lot of noise around the signal, but the signal is definitely there.  Now if we can just get a better radio to pick it up.

Off Season To-Do List

Yes, here in the U.S., it’s Thanksgiving Day, but you don’t have to live here to give thanks!

After going over many hills and through the woods, eating large quantities of turkey and all the trimmings at my mother-in-law’s house and then sleeping it off, it’s time to talk about my off-season sabermetric to-do list.

I finally finished programming everything into my batting projections and published the results last week. However, in order to do a comprehensive evaluation of a player, in addition to batting we also need baserunning, defense and pitching, all ultimately expressed in runs so that they can be summed into a number representing the total contribution. It’s one thing to be able to show how any hitter projects, but without knowledge of speed, arm and defense, it’s hard to make a final judgement.

In yesterday’s Roundtable, we were asked for our World Baseball Classic starting lineups for the U.S. Derek Jeter and Michael Young are the two best hitters at shortstop, but both are among the worst defensively. Jimmy Rollins is not as good with the bat, but his defense is good enough to make him most people’s overall choice as the best U.S.-born shortstop. Another example is on the Pirates’ roster. Brandon Moss plays rf, lf and 1b, and the past four seasons his translated wOBAs have been .339, .335, .334 and .331. Andrew McCutchen plays cf. His wOBAs the past three years have been .342, .322 and .323. Moss looks to have a slight edge in batting productivity, but compared to corner outfielders (.347) and first basemen (.357) he’s way below average, while McCutchen is only slightly below all centerfielders (.330). Add in that BP’s baserunning stats show Moss as dreadfully slow while McCutchen has a reputation for being very fast, and that Moss is regarded as a poor fielder and McCutchen as a good one, and you might conclude that McCutchen should be in cf, McLouth in lf, and Moss in AAA.

The first question usually asked about pitching analysis is if it will be DIPS compliant. Yes and no. The problem with DIPS is that it has an all or nothing approach. Pitchers get no credit for the number of base hits allowed, and full credit for everything else. My pitching projections will be very similar to the batting, and each component will have its own regression factor. I need to do work on determining the exact values to be used, but BABIP is about 20% pitcher and 80% defense. Therefore, it will be regressed much more heavily than home run, walk and strikeout rates. One problem I will have with pitching is that the available minor league statistics don’t cover all the categories – missing things like batters faced, intentional walks and hit batsmen for many or all seasons.
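As a rough illustration of what a per-component regression factor does, here is a minimal sketch. It assumes the "about 20% pitcher" figure can be used directly as the weight kept on the pitcher's own BABIP, which is a simplification; as noted above, the exact values still have to be worked out, and the function name is hypothetical.

```python
def regress_to_mean(observed, league_mean, skill_share):
    """Keep skill_share of the gap from league average, give the rest back.

    skill_share is an assumed weight between 0 (pure noise) and 1 (pure skill).
    """
    return league_mean + skill_share * (observed - league_mean)

# A pitcher with an observed .270 BABIP in a .300 league, using a 20% weight.
projected_babip = regress_to_mean(0.270, 0.300, 0.20)   # about .294
```

A component with a larger skill share, such as strikeout rate, would use a larger weight and so stay closer to the observed value.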

My fielding and baserunning will need play by play. Just today RetroSheet released the 2008 dataset. My formulas will be very similar to what Colin, Pizza Cutter and Dan Fox have done, but I want to also use them on minor league data. GameDay has play by play available for all minor league games starting in 2006, which will also solve the missing pitching categories.

Before any of that data can be used it needs a database to hold it. Right now I can do Retro and major league pfx centric processing. I am working, on and off and now back on, on a database design that will hold Baseball DataBank, KJOK, RetroSheet and pitch f/x data, and be able to have daily automatic updates from GameDay of both major and minor league games. After the database is constructed, scripts have to be modified to download and parse all of the GameDay files, inserting the values into the database.

The PanDIBS theory: Pitching and Defense Independent Batting Statistics

DIPS changed everything.  (Thanks Voros!)  It was the first sustained theory that evaluated players not so much by what a player had done over the last year, but at which part of the player’s (in this case, the pitcher’s) performance was something within his control and what was out of his control.  The theory has been refined here and there, but the basic idea remains: there are some things that a pitcher has more control over than others.  It’s a little disconcerting to think that so much of baseball rides on luck, but it’s important to know.
What’s odd is that this line of theories seemed to stop there.  To my knowledge, no one’s really looked at whether there’s any analogous coherent theory out there for batting statistics.  Are there some batting stats that seem to be more statistically reliable (i.e., skill based) and some that are more unreliable (i.e., luck based)?  I’d contend that the answer is yes, and the pattern works in a specific way.  In previous columns, I’ve taken some time to meditate on the statistical reliability of many stats, some more esoteric than others.  (Eric Seidman once called me the master of statistical reliability.)  When I looked through some of the work from a more holistic standpoint, the pattern became pretty clear.
Take a look back at this article on when different statistics stabilize enough to the point where they can be considered reliable.  The stats that stabilize the quickest are the ones over which the batter might be expected to have the most control (whether or not he swings, how often he makes contact), but then are followed in controllability by things over which there is some interaction between the batter and the pitch (type of batted ball), and then by some of the actual results that come from that batted ball (single, home run, out).  Roughly.
Let’s model the outcome of an at-bat in a flowchart.  A plate appearance can basically end in one of four ways.  The batter can walk, strikeout, be hit by a pitch, or do something that involves hitting the ball (or he’ll reach on fielder’s interference… once every five years).  The first three events end the plate appearance right there.  If he hits the ball, it will either be a flyball, grounder, liner, or popup.  If it’s a flyball, it might be a HR, or it might be an XBH or a single or be caught (or dropped) by a fielder.  I could do the same basic breakdown for all the other types of batted balls.
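One way to hold that flowchart in code is a nested dictionary, with a helper that reports how many steps down a given event sits. This is only a sketch: the names are hypothetical, and only the flyball branch is filled in because that is the only one spelled out above.

```python
# Each key is an event; its value holds the events that can follow it.
PA_TREE = {
    "walk": {},
    "strikeout": {},
    "hit by pitch": {},
    "ball in play": {
        "flyball": {
            "home run": {},
            "extra-base hit": {},
            "single": {},
            "caught or dropped": {},
        },
        "grounder": {},  # outcomes not enumerated in the text
        "liner": {},
        "popup": {},
    },
}

def depth_of(tree, name, level=1):
    """Return how many steps down the flowchart an event sits, or None if absent."""
    for key, children in tree.items():
        if key == name:
            return level
        found = depth_of(children, name, level + 1)
        if found is not None:
            return found
    return None

levels = {name: depth_of(PA_TREE, name) for name in ("strikeout", "flyball", "home run")}
```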
As you get further and further down the flowchart, with more steps involved in the process, the underlying rates become more unreliable statistically.  Part of it is the fact that as you split off further and further, the samples shrink: a player may have 600 plate appearances but only, say, 150 ground balls.  Any stat built on 600 measurements will be more reliable than one built on 150.  But, in general, when you constrain the data set so that you’re comparing the reliability at 150 PA vs. 150 GB, the stats closer to the base of the flowchart still show up as more reliable.
DIPS proposed two categories for statistical reliability.  Category one was a pitcher’s K rate, BB rate, HBP rate, and (I believe erroneously) HR rate.  Category two was the now famous BABIP.  BABIP was considered to be the product of luck, while K and BB were the product of skill.  Here I propose PanDIBS, with three (perhaps four) tiers of batting statistics to consider.  The most reliable of all stats are the swing diagnostics (and we know that they’re important), although no one ever really wants to project what J.D. Drew’s contact percentage will be.  Let’s call swing diagnostics the zero-th level.  The first level, in terms of reliability of the stats are the DIPS stats: K rate, BB rate, HBP rate.  The second level is the player’s batted ball profile (GB%, LD%, FB%).  The third level is what we really care about, things like HR and doubles.  Sadly, those are the ones most likely to be influenced by luck.
I’d also propose that it’s important to look at each type of batted ball separately.  A little while ago, I looked at Kelly Johnson’s season and asked how much consistency there was from year to year when it came to outhitting one’s expected BABIP.  The answer was “not much consistency.”  (I found a four-year intra-class correlation of around .27).  What I didn’t know then was that there are different effects for different types of batted balls.  I looked at how well players did in “outhitting” their expected BABIP, chopped up by each different type of batted ball.  For example, 24% of grounders go for hits, while 73% of line drives do.  So, we would expect players to have a .730 BABIP on liners and a .240 BABIP on grounders.  Of course, things vary, but do they vary consistently?  If a player is above average in year one, he should be above average in year two.  That’s the mark of a skill-based stat.
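A small sketch of what "outhitting" expected BABIP means for one batted-ball type, using the two base rates quoted above. The function name and the example counts are made up for illustration.

```python
# League base rates quoted in the text.
EXPECTED_BABIP = {"grounder": 0.24, "liner": 0.73}

def hits_above_expected(batted_type, balls, hits):
    """Actual hits minus the hits an average batter would get on the same balls."""
    return hits - EXPECTED_BABIP[batted_type] * balls

# Hypothetical batter: 42 hits on 150 grounders, 70 hits on 100 liners.
gb_surplus = hits_above_expected("grounder", 150, 42)   # about +6 hits
ld_surplus = hits_above_expected("liner", 100, 70)      # about -3 hits
```

The consistency question is then whether a player's surplus on a given type in one year predicts his surplus on that type the next.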
The answer depends on the type of batted ball.  Players were more consistent in “outhitting” their expected BABIP on flyballs (ICCs over 4 years were in the mid-.30s, depending on what PA inclusion criteria were used) than on grounders (ICC in the mid-.20s) and line drives (about .10).  When I split up flyballs into infield flies (which have an expected BABIP of about .025) vs. non-infield flies, getting more hits on popups turned out to be almost entirely luck (.02 ICC over four years).
It makes sense that there would be more of a “skill” in out-hitting expectations on flyballs.  Some players are rather skilled at hitting them off the wall, and some are not.  The skill in out-hitting expectations on ground balls is called “speed.”  Line drives on the other hand are just a matter of luck as to whether someone catches them or not.  A high line drive will likely go off the wall, but that’s about it.  If a popup goes for a hit, either someone missed it, or the batter simply lucked into a Texas Leaguer.  So, when looking at whether a player will continue with getting all those hits, it’s important to know what type of hits he’s getting and what the base rate expectations are for that type of ball.  So, if you see someone who hits a lot of line drives have a dip in his performance (or a breakout year), expect a lot of regression to the mean.  If he’s the kind of guy who hits a lot of flyballs, he’s not going to have to give as much of that back in regression to the mean.
So, yes Virginia, it is possible to sort out which stats are the result of luck and which are the result of skill for batters too in a fairly coherent way.  There is variation in how reliable each stat is, but in general, the farther away the ball gets from the bat, the more luck creeps in to influence the outcome.

Creating a dynamic FIP with BaseRuns

If you’re interested in starting a fistfight at the next SABR convention (not that I’m advising this) simply start bringing up DIPS in casual conversation loudly enough and I’m sure you can get something going. Voros McCracken set up the sabermetric version of the “less filling, tastes great” argument when he wrote:

There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play.

Suffice it to say that not everyone agrees with this.

But what everyone does agree on is that pitchers have far less control over the outcome of a ball in play than they do over the so-called Three True Outcomes: the walk, the strikeout and the home run.

From this, McCracken constructed dERA, essentially a run estimation model that attempts to isolate a pitcher’s performance from that of his defense.

For those looking for a quick-and-dirty shortcut for dERA, Tom Tango’s FIP is generally relied upon:

(13*HR+3*BB-2*K)/IP+3.2

3.2 is the league factor that puts FIP on the same scale as ERA.
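For anyone who wants to plug in numbers, the shortcut translates directly into a small function. This is a sketch with hypothetical names and a made-up stat line; it assumes innings are passed as a true decimal (200 1/3 innings as 200.333, not the box-score 200.1).

```python
def fip(hr, bb, k, ip, league_factor=3.2):
    """FIP shortcut as given above: (13*HR + 3*BB - 2*K) / IP + league factor."""
    return (13 * hr + 3 * bb - 2 * k) / ip + league_factor

# Hypothetical line: 20 HR, 50 BB, 150 K in 200 innings.
example_fip = fip(hr=20, bb=50, k=150, ip=200)   # (260 + 150 - 300) / 200 + 3.2 = 3.75
```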

FIP is also often used as a sort of component ERA, to estimate a player’s ERA from his projected component stats. There is, of course, Bill James’ Component ERA for those purposes as well. (Confoundingly enough, Component ERA is traditionally abbreviated ERC. Since "Earned Runs Created" describes what ERC is and does perfectly, that’s what I tell myself ERC stands for.)

So I decided to run a comparison of some of these run estimators.


Recapping the BIP

Before even getting into the meat of this article, no, the title does not refer to Bip Roberts… so I’ll understand if hardcore fans of his are now turned off.  What the title does refer to, however, is balls in play and how they pertain to the statistics BABIP, FIP, and ERA.  I have written a lot here and on my other stomping grounds of late about how some of these statistics are affected and, seeing as it is a holiday weekend with not much interweb usage, it seemed like the logical time to recap everything into one neat package.  For starters, what are these three statistics?
BABIP: Batting Average on Balls In Play is a statistical spawn of the DIPS theory discovered by Voros McCracken at the turn of the century.  Essentially Voros found that pitchers have next to no control over balls put in play against them, which is why certain pitchers would surrender a ton of hits one year and much less the next.  From a control standpoint, the goal of the pitcher would be to get an out.  Once a ball is put in play, unless it is hit right back to the pitcher many defensive aspects have to coincide for an out to result.  Take a groundball for instance, one between shortstop and third base: both fielders have to understand whose territory the ball occupies and that fielder has to have the proper range in order to field it, all in a very short amount of time. 
There are plenty of other variables as well but what should be clear is that the pitcher has no control over them.  He may have control over sustaining a certain percentage of balls in play each year but the hits that result are almost entirely out of his hand.  In fact, the only aspects of pitching over which he has any type of control are walks, strikeouts, and home runs allowed.  Everything else is dependent on the fielding and luck.
BABIP is calculated by dividing Hits minus Home Runs by At-Bats minus Strikeouts and Home Runs, plus Sacrifice Flies.  If Player A has 30 hits out of 90 at-bats he will post a .333 batting average.  But if 8 of those 30 hits are home runs and 8 of the outs are strikeouts, in BABIP terms he would be 22 for 74, or .297.  This means that, of all balls put in play–any hit or batted out other than a home run–29.7% fell in for hits.
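The Player A example can be checked with a few lines. This is a sketch that covers only the quantities in the worked example (hits, at-bats, home runs, strikeouts); the function names are hypothetical.

```python
def batting_average(hits, at_bats):
    return hits / at_bats

def babip_simple(hits, at_bats, home_runs, strikeouts):
    """Hits on balls in play divided by balls in play, for the simple example above."""
    hits_in_play = hits - home_runs
    balls_in_play = at_bats - home_runs - strikeouts
    return hits_in_play / balls_in_play

avg = batting_average(30, 90)                                   # .333
babip = babip_simple(30, 90, home_runs=8, strikeouts=8)         # 22 / 74, about .297
```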
FIP: a creation of Tom Tango’s, Fielding Independent Pitching takes the three controllable skills of walks, strikeouts, and home runs allowed, properly weights them, and then scales the result similar to the familiar ERA.  The end result explains what a pitcher’s skillset suggests his ERA should be around.  Someone with an ERA much lower than their FIP is usually considered to be lucky while the inverse is also true.  The statistic is kept at Fangraphs and ERA-FIP was recently added as well in order to allow readers a glimpse at those under- or overperforming their controllable skills.
ERA: arguably the most popular pitching barometer, ERA can be calculated by multiplying the earned runs of a pitcher by nine and dividing that product by the total number of innings pitched.  While not a terrible stat it suffers from some pretty drastic noise.  For starters, what are earned runs?  The modifier ‘earned’ implies there are other runs that can be given up and that earned runs must satisfy specific criteria.  For instance, if a fielder botches a routine play with two outs, and the pitcher then gives up seven runs, none will be earned because the inning was extended by the poor play of the fielder.  This gets into all sorts of questions regarding exactly what an error is and how that factors into a pitcher’s performance.
Earned runs are also a direct result of hits, which have been proven to be largely accrued through chance via the DIPS theory.  So, if pitchers cannot control the percentage of hits they give up on balls in play, then fluctuations in hits can either inflate or deflate an ERA regardless of the pitcher’s skill level.  Therefore the FIP is more indicative of performance level because it only measures the three aspects of pitching he has control over which should not suffer from much fluctuation at all, as Pizza Cutter showed not too long ago that these skills were some of the quickest to stabilize.
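As a quick sketch of the ERA arithmetic and the ERA-FIP comparison mentioned earlier, with hypothetical function names and made-up numbers:

```python
def era(earned_runs, innings_pitched):
    """Earned runs times nine, divided by innings pitched."""
    return 9 * earned_runs / innings_pitched

def era_minus_fip(era_value, fip_value):
    """Negative means the ERA is beating what the controllable skills suggest."""
    return era_value - fip_value

# Hypothetical pitcher: 80 earned runs in 200 innings, with a FIP of 4.10.
example_era = era(80, 200)                       # 3.60
gap = era_minus_fip(example_era, 4.10)           # about -0.50
```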
Controlling BABIP
At Fangraphs we occasionally call upon a statistic we titled xBABIP, which refers to what the BABIP of a pitcher can be expected to be given his percentage of line drives.  Dave Studeman found a few years back that the general range of BABIP could be predicted with very good accuracy by adding .12 to the LD%; if a pitcher surrendered 22.1% line drives his xBABIP would be ~.341.  Using this for predictive purposes would not be correct due to the fact that the general baseline for pitchers is .300.  What we can do is evaluate performance at a given time and attribute a rather high or low BABIP to line drives.  For instance, saying that Player B’s BABIP of .275 as of today is primarily due to his ultra-low 14-15% LD rate would be correct; saying that it will continue like this would not.  The line drive percentage may change as the season goes on.  In summation, we can use something like this when evaluating the past for pitchers but not the future.
David Appelman showed not too long ago that, in 2007, 15% of flyballs fell in for hits, 24% of grounders turned into hits, and a whopping 73% of line drives also followed suit.  Due to this, the ideal xBABIP calculation would be .15(FB) + .24(GB) + .73(LD).
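Both versions of xBABIP are easy to sketch side by side. The function names and the second batted-ball mix are hypothetical; rates are passed as fractions (22.1% as 0.221), and the weights are the 2007 figures quoted above.

```python
def xbabip_ld(ld_rate):
    """Studeman's rule of thumb: line drive rate plus .12."""
    return ld_rate + 0.12

def xbabip_weighted(fb_rate, gb_rate, ld_rate):
    """Weighted version: .15(FB) + .24(GB) + .73(LD), with rates as fractions of BIP."""
    return 0.15 * fb_rate + 0.24 * gb_rate + 0.73 * ld_rate

rule_of_thumb = xbabip_ld(0.221)                     # about .341
# Made-up profile: 35% flyballs, 45% grounders, 20% line drives.
weighted = xbabip_weighted(0.35, 0.45, 0.20)         # about .3065
```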
I have done studies here recently, and Jonathan Hale at Baseball Digest Daily has done others in the past as well, that show how aspects like velocity, movement, and location can all affect the BABIP of a given pitcher.  It has also been shown, again by Studeman, that elite relievers have the ability to consistently post lower BABIPs than others.  More studies have shown that pitchers have very weak control, if any, over their BABIP but instead of deeming it control I would be more inclined to say that these pitchers are merely taking advantage of “cold spots.”
If just 15% of flyballs result in hits and such a large number of line drives do, then we could intuitively expect someone with consistently low LD rates and higher FB rates to post lower BABIPs.  From a movement perspective, I found that those with above average vertical movement in different horizontal movement subgroupings post lower BABIPs as well.  Higher vertical movement usually correlates to flyballs, and voila, flyballs have the lowest percentage of hits.
This was just a recap of the three statistics and explanations pertaining to their usage.  Based on this, if we see someone like Carlos Zambrano, whose ERA consistently beats his FIP thanks to consistently lower BABIPs, we could somewhat safely assume that he might not be controlling anything per se but rather taking advantage of all the aspects proven to result in lower BABIPs.  His controllable skills may not be as good as his ERA would suggest but movement, velocity, and location may have combined to greatly aid his efforts.

Pitcher fatigue, batted balls, and DIPS

Are you tired, Mr. Starter?  I’ve been asking this question of my magic spreadsheet for a while now, and last week, during my look at fatigue factors like pitch count, days of rest, mileage on the arm for the season, and number of times through the lineup, I promised a follow up study on how fatigue affected what happens to a batted ball.  Here it is.
I isolated all batted balls from 2000-2006 (isolated is a strong word: that’s still 500K+ events!)  Like in my last two articles, I first calculated the batter’s and pitcher’s GB rate, LD rate, and FB rate (including pop ups), and then the expected probabilities of each for such a matchup, using the odds ratio method.  Then I took the natural log of that number.  I entered this into a binary logit regression as a control for batter and pitcher tendencies, and then on top of that entered my fatigue variables.  Pitch count was present in all the regressions, then one of the other fatigue variables (rest, mileage, times through order, number of pitches thrown last time out), and then the interaction of pitch count and whatever fatigue measure was under consideration.  (In my day job, we call this a moderator analysis.)
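For readers who haven't met the odds ratio method, here is a minimal sketch of the control variable. It assumes the common form of the method, in which the batter's odds are multiplied by the pitcher's odds and divided by the league odds; the exact implementation behind these regressions isn't shown here, and the names and rates are made up.

```python
import math

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def expected_rate(batter_rate, pitcher_rate, league_rate):
    """Expected probability of an event for this matchup (assumed odds ratio form)."""
    matchup_odds = odds(batter_rate) * odds(pitcher_rate) / odds(league_rate)
    return matchup_odds / (1.0 + matchup_odds)

def log_expected_rate(batter_rate, pitcher_rate, league_rate):
    """Natural log of the expected rate, the quantity entered as the control."""
    return math.log(expected_rate(batter_rate, pitcher_rate, league_rate))

# Hypothetical matchup: batter and pitcher both at 50% grounders in a 45% league.
p_gb = expected_rate(0.50, 0.50, 0.45)           # about .55
control = log_expected_rate(0.50, 0.50, 0.45)    # about -0.598
```

Two ground-ball-heavy players facing each other are expected to produce even more grounders than either does alone, which is why the matchup rate lands above both individual rates.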
First, what effect does pitch count have on the batted ball profile in general?  As the game wears on, there’s an effect for pitch count such that grounders go down and fly balls go up.  How poetic.  Line drives don’t seem to respond to pitch count.
Let’s talk about mileage.  Like last week, I kept the sample to guys who went a maximum of 10 days between starts on the season to get rid of players who had “hidden mileage” in AAA or the bullpen, plus those who had been injured.  The results: as the season wears on, and the pitcher has more pitches on that arm, there’s a very real effect.  When the ball comes off the bat, it’s more likely to be a flyball and less likely to be a grounder.  Of course, the effect is not going to overwhelm a pitcher’s own tendencies to induce ground balls, but it’s certainly not going to help.  Tired pitchers get the ball up in the zone more often and that’s more likely to be hit in the air.  Now, last week, I didn’t find that there was a significant increase in home runs (or anything other than walks), in terms of outcomes, but batted ball percentages are usually much more statistically stable than are outcome measures.  It’s possible that what used to be ground ball outs are becoming fly ball outs, but I’m not convinced.
Next, let’s talk about rest.  Turns out that days of rest didn’t have any effect whatsoever on the batted ball profile, once you control for batter and pitcher matchup.  In my previous study from last week, I found that a short-rested pitcher was more likely to give up a home run.  So, while he might not throw any more fly ball pitches, his pitches that do go for fly balls are more likely to leave the yard.  A ground ball pitcher likely wouldn’t have the same problem, because… well a mistake on a ground ball pitch is going to just be a slightly harder hit ground ball… maybe a better chance for a single.  This also brings up the old chestnut about starting sinkerballer pitchers on short rest because their ball is “heavier” and sinks better, leading to more ground balls.  I decided that was worth a look.  I restricted my sample to pitchers who had GB percentages over 50% for the year.  That’s not an exact proxy for sinkerballers, but I’m guessing there’s a few sinkers and splitters being thrown there.  The result: no effect.  Looks like the sinker isn’t sinking any more heavily because of the short rest, but it doesn’t seem to harm the pitcher either.
Finally, let’s talk about the number of times that the pitcher has seen this guy before in this game.  Because I’m controlling for the expectations of this pitcher/batter matchup and for pitch count, any effects for time through the lineup are likely due to the pitcher actually gathering intelligence about the batter.  Are there effects?  Yes, there are.  As the lineup cycles around more, the batter is more likely to hit a ground ball and less likely to hit a line drive.  (That’s a good trade for the pitcher!)  So, it looks as though the pitcher is actually gathering some sort of intelligence on the batter and is perhaps gaining some small advantage.  Oftentimes, Sabermetric analysis has a tendency to reduce at bats to simple agglomerations of probabilities.  Here’s some evidence that we need to take a look at the mental aspect of the game.  Of course the batter and pitcher are trying to learn about one another.  It looks like the pitcher is the one who has the advantage.  Perhaps a pitcher gets the batter out with his “stuff” early and his brain late.
One more area of interest.  Does fatigue affect DIPS?  For a long time, it’s been assumed that balls in play went for hits at a rate that had more to do with the defense than the pitcher.  That’s been based mostly on season-to-season intercorrelations.  But, what about within a game?  The answer is… yes, there is an effect.  At lower pitch counts, a ball in play is less likely to be a hit, again, controlling for batter/pitcher rates.  Additionally, there’s an effect for number of times through the lineup (already controlling for the fact that there will be a pitch count effect.)  So, we would expect starters who are efficient with their pitch counts to have a lower BABIP overall.  Fresher pitchers throw pitches that are better able to be turned into outs.  This might explain my question concerning Troy Percival.
There are a few more factors that could be studied.  I didn’t consider age (younger pitchers probably bounce back faster) nor body type (through BMI?), and I haven’t yet looked at relievers.  And then there’s the work that’s sure to come from the Pitch F/X folks who can break this stuff down on a molecular level.

Winning with an 89-mph fastball: an analysis of Brian Bannister (Part 3)

In Part 1 of this series, we examined Brian Bannister’s suggestions for why he has been able to beat the league BABIP. He indicated that it was probably due to pitching more often in favorable pitcher’s counts and inducing balls in play with two strikes, when the hitter is against the ropes. However, the evidence didn’t show much advantage for Bannister. We noted that he did pitch a little more often in favorable counts, but this led to him avoiding walks more than anything; it had little salutary effect on his BABIP.
In Part 2 of this series, we learned about the pitches that Bannister threw during 2007 and how he used them. We saw that the fastball and curveball were good pitches against right-handed hitters, and the slider was a good pitch against left-handed hitters.
In this final part of the series, we’re going to marry those two approaches to see if we can uncover any patterns that might explain Bannister’s BABIP performance. In this portion, I’m not concentrating so much on evaluating Bannister’s own statements, as I did in Part 1. Rather, I’m thinking more about what we can expect from Bannister in the future. I’m also interested in investigating techniques that could prove useful for evaluating DIPS theory on a component basis as we accumulate more PITCHf/x data in the coming seasons.
Should we expect Bannister to maintain any of his BABIP edge and thus his 3.87 ERA from 2007? Or are the projection systems like PECOTA (subscribers only) and CHONE more reasonable when they project an ERA of 5.19 or 4.74?

Winning with an 89-mph fastball: an analysis of Brian Bannister (Part 2)

In Part 1 of this analysis, we examined the league numbers for batting average on balls in play (BABIP) and whether Bannister was able to beat the league BABIP by pitching in favorable counts. We found that he did not gain any particular advantage by inducing more balls in play on two-strike counts, so we turn elsewhere to seek an explanation for his 2007 performance.
What pitches does Brian Bannister throw? The scouting reports tell an interesting tale, especially if you follow them back a couple of years. In the minor leagues, the cut fastball was reputed to be his best pitch. His four-seam fastball was thrown in the high 80s, touching 90, although he was able to locate it well. His curveball was a big breaker that was considered a plus pitch, his changeup was a work in progress, and his slider was regarded as a pitch likely to be scrapped. But in the fall of 2006 in the Mexican League, Bannister worked on a two-seam fastball, and after joining the Royals in a trade for Ambiorix Burgos, he scrapped his cutter, experimented with different speeds on his curveball, and started throwing a slider again.
What can we see in the PITCHf/x data regarding his pitch repertoire in 2007?

Winning with an 89-mph fastball: an analysis of Brian Bannister (Part 1)

I’ll warn you from the start that the title is a tad ambitious. I don’t know exactly how Brian Bannister wins in the major leagues with a below-average fastball speed, but I hope to share some of what I have learned on the topic. This article will take the form of a three-part series.
In case you’ve been hiding under the proverbial sabermetric rock the last few weeks–maybe you’re one of those weirdos who believe players are human or you’ve been out of your garage recently to look at the sky–Brian Bannister gave a fascinating three-part interview to Tim Dierkes at MLB Trade Rumors last month.
In Part 3 of the interview, Bannister talked about his opponents’ batting average on balls in play (BABIP).

I think a lot of fans underestimate how much time I spend working with statistics to improve my performance on the field. For those that don’t know, the typical BABIP for starting pitchers in Major League Baseball is around .300 give or take a few points. The common (and valid) argument is that over the course of a pitcher’s career, he can not control his BABIP from year-to-year (because it is random), but over a period of time it will settle into the median range of roughly .300 (the peak of the bell curve). Therefore, pitchers that have a BABIP of under .300 are due to regress in subsequent years and pitchers with a BABIP above .300 should see some improvement (assuming they are a Major League Average pitcher).
Because I don’t have enough of a sample size yet (service time), I don’t claim to be able to beat the .300 average year in and year out at the Major League level. However, I also don’t feel that every pitcher is hopelessly bound to that .300 number for his career if he takes some steps to improve his odds – which is what pitching is all about.

In the interview, Bannister postulated a reason for his success on BABIP.

So, to finally answer the question about BABIP, if we look at the numbers above, how can a Major League pitcher try and beat the .300 BABIP average? By pitching in 0-2, 1-2, & 2-2 counts more often than the historical averages of pitchers in the Major Leagues. Until a pitcher reaches two strikes, he has no historical statistical advantage over the hitter. In fact, my batting averages against in 0-1, 1-0, & 1-1 counts are .297/.295/.311 respectively, very close to the roughly .300 average.
My explanation for why I have beat the average so far is that in my career I have been able to get a Major League hitter to put the ball in play in a 1-2 or 0-2 count 155 times, and in a 2-0 or 2-1 count 78 times. That’s twice as often in my favor, & I’ll take those odds.

This interview has gotten a lot of buzz in sabermetric cyberspace. Several people have taken a look at BABIP at different ball-strike counts, including my colleague at StatSpeak, Pizza Cutter. There seems to be some ability for the pitcher to control the count on which hitters put balls into play, but it looks like a fairly small effect on average. (Pizza, correct me if I’m summarizing your conclusions incorrectly.)
Bannister also mentioned to Dierkes that getting two strikes on the hitter gives him the strategic advantage in terms of pitch selection.

It is obvious that hitters, even at the Major League level, do not perform as well when the count is in the pitcher’s favor, and vice-versa. This is because with two strikes, a hitter HAS to swing at a pitch in the strike zone or he is out, and he must also make a split-second decision on whether a borderline pitch is a strike or not, reducing his ability to put a good swing on the ball. What this does is take away a hitter’s choice. If I throw a curveball with two strikes, the hitter has to swing if the pitch is in the strike zone, whether he is good at hitting a curveball or not. He also does not have a choice on location. We are all familiar with Ted Williams’ famous strike zone averages at the Baseball Hall of Fame. It is well-known that a pitch knee-high on the outside corner will not have the same batting average or OBP/SLG/OPS as one waist-high right down the middle. Here is a comparison of the batting averages and slugging percentage on my fastball vs. my curveball:
Fastball: .246/.404
Curveball: .184/.265

We do know from John Walsh’s work something about batting average and slugging percentage against the typical major-league fastball (.330/.521) and curveball (.310/.471). If Bannister is correct in his numbers, he’s doing quite a bit better than the league with both the fastball and curveball. But is Bannister correct in the numbers he quotes and assertions he makes?
So far, most people are accepting what Bannister said at face value. Let’s take a closer look and see if we should believe his numbers and conclusions. We’ll draw on two data sets from the 2007 season. One is the standard pitch-by-pitch result data for all of Bannister’s 2603 pitches in 2007. With this data set we can examine results on balls in play and how Bannister performed in various ball-strike counts. The second data set is the detailed PITCHf/x trajectory data recorded for 1304 of Bannister’s pitches, or about half of his starts. With this data set we can identify pitch types and reliable strike zone location information in order to gain a greater understanding of Bannister’s pitching strategies.

Breakdown of balls in play by count

Recent evidence may suggest otherwise, but I am still a contributor to Statistically Speaking. I’ve been working on an analysis that has been more difficult to bring to fruition than I expected; that, along with “real life” getting in the way more of late, is what has severely cut into my posting frequency.
However, in the process of number crunching for the analysis I’m doing, I came across some statistics that I haven’t seen posted publicly anywhere, not even in the Baseball-Reference splits. (Some of it is in the B-R splits, but not most of it.) Maybe I’ve just missed them, in which case drop me a line and let me know where else you found them. I thought these might be interesting to a few other people, so I’ll share them. Mostly, I’m just putting the numbers up here for the rest of you to enjoy, but I’ll also make a few comments on some trends that stuck out to me.
I’m looking at pitch data broken down by ball-strike count. I’m using the MLB Gameday 2007 data as my source. Today I present the breakdown of types of balls put into play by the hitter.

| Ball | Strike | Total Pitches | Total Safe | Total Out | Single | Double | Triple | Home Run | Field Error | Other Safe |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 22029 | 0.341 | 0.659 | 0.214 | 0.069 | 0.007 | 0.039 | 0.012 | 0.001 |
| 0 | 1 | 17222 | 0.329 | 0.671 | 0.222 | 0.062 | 0.005 | 0.027 | 0.012 | 0.001 |
| 0 | 2 | 7878 | 0.319 | 0.681 | 0.228 | 0.049 | 0.005 | 0.022 | 0.013 | 0.001 |
| 1 | 0 | 14030 | 0.344 | 0.656 | 0.212 | 0.070 | 0.007 | 0.044 | 0.010 | 0.001 |
| 1 | 1 | 16576 | 0.334 | 0.666 | 0.214 | 0.066 | 0.006 | 0.034 | 0.012 | 0.001 |
| 1 | 2 | 14626 | 0.326 | 0.674 | 0.220 | 0.059 | 0.006 | 0.025 | 0.014 | 0.001 |
| 2 | 0 | 5015 | 0.355 | 0.645 | 0.202 | 0.077 | 0.007 | 0.056 | 0.012 | 0.000 |
| 2 | 1 | 10308 | 0.349 | 0.651 | 0.212 | 0.074 | 0.007 | 0.041 | 0.014 | 0.001 |
| 2 | 2 | 14861 | 0.330 | 0.670 | 0.215 | 0.062 | 0.009 | 0.030 | 0.012 | 0.001 |
| 3 | 0 | 251 | 0.402 | 0.598 | 0.167 | 0.120 | 0.008 | 0.092 | 0.012 | 0.004 |
| 3 | 1 | 4393 | 0.376 | 0.624 | 0.214 | 0.083 | 0.009 | 0.056 | 0.013 | 0.001 |
| 3 | 2 | 11019 | 0.351 | 0.649 | 0.216 | 0.070 | 0.007 | 0.045 | 0.012 | 0.001 |
| - | total | 138208 | 0.338 | 0.662 | 0.216 | 0.066 | 0.007 | 0.036 | 0.012 | 0.001 |

| Ball | Strike | Ground Out | Fly Out | Pop Out | Line Out | Force Out | Ground into DP |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.208 | 0.195 | 0.073 | 0.043 | 0.036 | 0.034 |
| 0 | 1 | 0.270 | 0.183 | 0.067 | 0.047 | 0.034 | 0.034 |
| 0 | 2 | 0.291 | 0.181 | 0.070 | 0.047 | 0.039 | 0.033 |
| 1 | 0 | 0.225 | 0.206 | 0.078 | 0.048 | 0.031 | 0.032 |
| 1 | 1 | 0.267 | 0.194 | 0.070 | 0.046 | 0.031 | 0.030 |
| 1 | 2 | 0.293 | 0.181 | 0.076 | 0.047 | 0.033 | 0.028 |
| 2 | 0 | 0.218 | 0.217 | 0.077 | 0.051 | 0.028 | 0.027 |
| 2 | 1 | 0.254 | 0.198 | 0.075 | 0.049 | 0.026 | 0.025 |
| 2 | 2 | 0.278 | 0.194 | 0.076 | 0.051 | 0.031 | 0.025 |
| 3 | 0 | 0.171 | 0.219 | 0.096 | 0.040 | 0.024 | 0.020 |
| 3 | 1 | 0.213 | 0.213 | 0.081 | 0.049 | 0.023 | 0.021 |
| 3 | 2 | 0.264 | 0.212 | 0.080 | 0.055 | 0.009 | 0.012 |
| - | total | 0.254 | 0.195 | 0.074 | 0.048 | 0.030 | 0.029 |

| Ball | Strike | Sac Bunt | Sac Fly | Double Play | Bunt Ground Out | Fielder’s Choice Out | Bunt Pop Out | Other Out |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.033 | 0.014 | 0.004 | 0.010 | 0.002 | 0.005 | 0.001 |
| 0 | 1 | 0.015 | 0.010 | 0.004 | 0.004 | 0.002 | 0.002 | 0.000 |
| 0 | 2 | 0.004 | 0.010 | 0.003 | 0.000 | 0.002 | 0.001 | 0.001 |
| 1 | 0 | 0.014 | 0.011 | 0.004 | 0.002 | 0.002 | 0.001 | 0.000 |
| 1 | 1 | 0.010 | 0.008 | 0.003 | 0.003 | 0.002 | 0.001 | 0.000 |
| 1 | 2 | 0.002 | 0.007 | 0.003 | 0.000 | 0.002 | 0.000 | 0.000 |
| 2 | 0 | 0.008 | 0.013 | 0.005 | 0.000 | 0.002 | 0.000 | 0.000 |
| 2 | 1 | 0.005 | 0.010 | 0.003 | 0.002 | 0.002 | 0.000 | 0.000 |
| 2 | 2 | 0.001 | 0.009 | 0.003 | 0.000 | 0.002 | 0.000 | 0.000 |
| 3 | 0 | 0.000 | 0.024 | 0.000 | 0.000 | 0.004 | 0.000 | 0.000 |
| 3 | 1 | 0.004 | 0.012 | 0.004 | 0.001 | 0.003 | 0.000 | 0.000 |
| 3 | 2 | 0.001 | 0.009 | 0.005 | 0.000 | 0.001 | 0.000 | 0.000 |
| - | total | 0.011 | 0.010 | 0.004 | 0.003 | 0.002 | 0.001 | 0.000 |
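
For anyone who wants to double-check the totals row, here is a minimal Python sketch that rebuilds the overall safe rate from the per-count rows of the first table. The dictionary and function names are my own for illustration; the numbers are copied from the table, so the result is only as precise as the three-decimal rounding allows.

```python
# (balls, strikes) -> (balls in play, fraction safe), copied from the first table.
SAFE_BY_COUNT = {
    (0, 0): (22029, 0.341),
    (0, 1): (17222, 0.329),
    (0, 2): (7878, 0.319),
    (1, 0): (14030, 0.344),
    (1, 1): (16576, 0.334),
    (1, 2): (14626, 0.326),
    (2, 0): (5015, 0.355),
    (2, 1): (10308, 0.349),
    (2, 2): (14861, 0.330),
    (3, 0): (251, 0.402),
    (3, 1): (4393, 0.376),
    (3, 2): (11019, 0.351),
}


def weighted_safe_rate(rows):
    """Return (total balls in play, safe rate weighted by balls in play)."""
    total_bip = sum(n for n, _ in rows.values())
    total_safe = sum(n * rate for n, rate in rows.values())
    return total_bip, total_safe / total_bip


total_bip, overall_rate = weighted_safe_rate(SAFE_BY_COUNT)
print(total_bip, round(overall_rate, 3))
```

Weighting by balls in play matters here: the 3-0 row has the highest safe rate but only 251 balls in play, so it barely moves the overall figure.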

Ball in Play Safe Percentage vs Count
A hitter reaches base safely more often on balls in play when the count is in his favor. Don’t change the channel, the revelations like that just keep on coming at StatSpeak, and you don’t want to miss one!
Okay. My first slightly less than completely and utterly obvious observation is that the home run rate is strongly tied to the count.
Ball in Play Home Run Percentage vs Count
The doubles rate shows the same effect, but smaller, as does the triples rate to some extent. The singles rate stays pretty flat with respect to count, although there is a bit of an inverse effect–in better hitter’s counts, the hitter gets more extra base hits and slightly fewer singles.
I haven’t looked at the type of batted ball (fly ball, line drive, ground ball, bunt, etc.) that results in hits. That’s a bit more difficult to parse out of the Gameday data. Since it doesn’t have its own field, getting that information requires some regular expression matching on the text description of the play. That’s fairly straightforward but nonetheless a nontrivial bit of coding that makes it a project for some point in the future rather than part of this data set for me.
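
To give a flavor of what that regular expression matching might look like, here is a rough Python sketch. It is not the code behind this post: the patterns, the function name, and the sample descriptions are all assumptions for illustration, and the real Gameday wording would need to be checked before trusting any of them.

```python
import re

# Hypothetical patterns, checked in order; "bunt" comes first so that a
# description like "bunt ground ball" is not counted as an ordinary grounder.
BATTED_BALL_PATTERNS = [
    ("bunt", re.compile(r"\bbunt", re.IGNORECASE)),
    ("pop up", re.compile(r"\bpop(?:s|ped)?\b|\bpop-?up\b", re.IGNORECASE)),
    ("line drive", re.compile(r"\bline drive\b|\blines\b|\bliner\b", re.IGNORECASE)),
    ("fly ball", re.compile(r"\bfly ball\b|\bflies\b|\bsac(?:rifice)? fly\b", re.IGNORECASE)),
    ("ground ball", re.compile(r"\bground ball\b|\bgrounds\b|\bgrounder\b", re.IGNORECASE)),
]


def classify_batted_ball(description):
    """Return a batted-ball label for a play description, or None if nothing matches."""
    for label, pattern in BATTED_BALL_PATTERNS:
        if pattern.search(description):
            return label
    return None


# Made-up descriptions in roughly the style of play-by-play text.
samples = [
    "Batter A singles on a line drive to left fielder Fielder B.",
    "Batter A grounds out, shortstop Fielder B to first baseman Fielder C.",
    "Batter A singles on a bunt ground ball to third baseman Fielder B.",
    "Batter A strikes out swinging.",
]
for text in samples:
    print(classify_batted_ball(text), "<-", text)
```

Returning None for anything unmatched is deliberate: it makes it easy to list the descriptions the patterns miss and extend them, rather than silently lumping them into a category.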
Ball in Play Groundout-Flyout Ratio vs Count
Another thing I noticed was that there were more groundouts and fewer flyouts the more strikes and fewer balls there were in the count. As pitchers gain the upper hand, they tend to get more groundball outs. I didn’t include popups and line drives in the accompanying chart since they didn’t show a strong tendency relative to count.
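
The ratio in that chart can be recomputed from the Ground Out and Fly Out columns of the second table. The sketch below is one way to do it in Python; the variable names are mine, and because the inputs are rounded to three decimals the ratios are approximate.

```python
# (balls, strikes) -> (ground out rate, fly out rate), copied from the second table.
GROUND_FLY_OUTS = {
    (0, 0): (0.208, 0.195),
    (0, 1): (0.270, 0.183),
    (0, 2): (0.291, 0.181),
    (1, 0): (0.225, 0.206),
    (1, 1): (0.267, 0.194),
    (1, 2): (0.293, 0.181),
    (2, 0): (0.218, 0.217),
    (2, 1): (0.254, 0.198),
    (2, 2): (0.278, 0.194),
    (3, 0): (0.171, 0.219),
    (3, 1): (0.213, 0.213),
    (3, 2): (0.264, 0.212),
}

# Groundout-to-flyout ratio for each count.
go_fo_ratio = {
    count: ground / fly for count, (ground, fly) in GROUND_FLY_OUTS.items()
}

for (balls, strikes), ratio in sorted(go_fo_ratio.items()):
    print(f"{balls}-{strikes}: {ratio:.2f}")
```

At every ball count, the ratio rises as strikes are added, which is the pattern described above.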
I saw a couple other things that are obvious once you think about them, but it was interesting to me to see them reflected in the data. The first was that force outs, GIDPs, and fielder’s choice outs all go down dramatically with a 3-2 count, dropping from 6.4% to 2.3% of balls in play. Presumably this is because the runners are often going with the pitch on 3-2.
The second thing that interested me was the favorite counts for hitters to bunt for an out. (Bunting for a hit is not included for the reason mentioned previously.)

| Count | Bunt Outs |
|---|---|
| 0-0 | 0.043 |
| 0-1 | 0.019 |
| 0-2 | 0.004 |
| 1-0 | 0.016 |
| 1-1 | 0.013 |
| 1-2 | 0.002 |
| 2-0 | 0.008 |
| 2-1 | 0.006 |
| 2-2 | 0.001 |
| 3-0 | 0.000 |
| 3-1 | 0.005 |
| 3-2 | 0.001 |

If I don’t get around to presenting my full analysis in a timely fashion, I’ll see if I can present a few more statistical tidbits like this along the way.
