Shooting the Gap

For some reason, I was recently thinking about doubles and the types of players that are getting the two-baggers. Why? No idea, but it did bring about some confirmation of my ideas.

First off, there are two types of doubles hitters. There is the power hitter who narrowly misses a dinger and has to settle for a double, and there is the line drive hitter who knows how to hit the gaps and get to second. But which one is better at hitting doubles? Let’s go to the data.
First, here is a table of the top 10 doubles hitters this year.

The one thing to take note of is the line drive percentage. These hitters don’t hit a huge number of fly balls; instead, they drive the ball on a line, which lets hitters with less power take the extra base. Next, let’s move on to the top 10 home run hitters this year.
Notice the difference in LD% and FB%? You can also see that the power hitters are pretty solid doubles hitters, thanks to would-be home runs that fall short.
Is this of any real significance? Not really, but it is always interesting to look at the data and confirm your thoughts on a subject.

Recapping the BIP

Before even getting into the meat of this article, no, the title does not refer to Bip Roberts… so I’ll understand if hardcore fans of his are now turned off.  What the title does refer to, however, is balls in play and how they pertain to the statistics BABIP, FIP, and ERA.  I have written a lot here and on my other stomping grounds of late about how some of these statistics are affected and, seeing as it is a holiday weekend with not much interweb usage, it seemed like the logical time to recap everything into one neat package.  For starters, what are these three statistics?
BABIP: Batting Average on Balls In Play is a statistical offspring of the DIPS theory developed by Voros McCracken at the turn of the century. Essentially, Voros found that pitchers have next to no control over balls put in play against them, which is why certain pitchers would surrender a ton of hits one year and far fewer the next. From a control standpoint, the pitcher’s goal is to get an out. Once a ball is put in play, unless it is hit right back to the pitcher, many defensive elements have to come together for an out to result. Take a groundball between shortstop and third base: both fielders have to recognize whose territory the ball is in, and that fielder needs the range to reach it, all in a very short amount of time.
There are plenty of other variables as well, but the point is that the pitcher has no control over them. He may have some control over the rate at which balls are put in play against him each year, but the hits that result are almost entirely out of his hands. In fact, the only aspects of pitching over which he has any real control are walks, strikeouts, and home runs allowed. Everything else depends on the fielding and on luck.
BABIP is calculated by dividing hits minus home runs by plate appearances excluding home runs, walks, strikeouts, and sacrifice flies. If Player A has 30 hits in 90 at-bats, he will post a .333 batting average. But if 8 of those 30 hits are home runs and 8 of the outs are strikeouts, in BABIP terms he would be 22 for 74, or .297. In other words, of all balls put in play (any hit or batted out other than a home run), 29.7% fell in for hits.
FIP: A creation of Tom Tango’s, Fielding Independent Pitching takes the three controllable skills of walks, strikeouts, and home runs allowed, weights them, and then scales the result to resemble the familiar ERA. The end result indicates roughly what a pitcher’s ERA should be, given his skillset. Someone with an ERA much lower than his FIP is usually considered lucky, and someone with an ERA much higher than his FIP is considered unlucky. The statistic is kept at Fangraphs, and ERA-FIP was recently added as well, so readers can see at a glance who is under- or overperforming his controllable skills.
ERA: Arguably the most popular pitching barometer, ERA is calculated by multiplying a pitcher’s earned runs by nine and dividing that product by his total innings pitched. While not a terrible stat, it suffers from some pretty drastic noise. For starters, what are earned runs? The qualifier ‘earned’ implies that other runs can be given up, and that earned runs must satisfy specific criteria. For instance, if a fielder botches a routine play with two outs and the pitcher then gives up seven runs, none will be earned, because the inning was extended by the fielder’s poor play. This raises all sorts of questions about exactly what an error is and how it factors into a pitcher’s performance.
Earned runs are also a direct result of hits, and the DIPS research showed that hits on balls in play are largely a matter of chance. So, if pitchers cannot control the percentage of balls in play that fall for hits, then fluctuations in hits can inflate or deflate an ERA regardless of the pitcher’s skill level. FIP is therefore more indicative of performance level, because it measures only the three aspects of pitching the pitcher controls, and those should not fluctuate much at all; Pizza Cutter showed not too long ago that these skills are among the quickest to stabilize.
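Here is a minimal sketch of the Player A arithmetic, working from at-bats as the worked example does. The function name is illustrative. Note that commonly published versions (for example, Fangraphs’) also add sacrifice flies back into the denominator. This sketch leaves them out to match the example.

```python
def babip(hits, home_runs, at_bats, strikeouts):
    """(H - HR) / (AB - HR - K): hits per ball in play, home runs excluded."""
    balls_in_play = at_bats - home_runs - strikeouts
    return (hits - home_runs) / balls_in_play


# Player A: 30 hits in 90 at-bats, 8 of them home runs, 8 strikeouts
print(f"{babip(30, 8, 90, 8):.3f}")
```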
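As a rough sketch, the usual form of Tango’s FIP weights home runs by 13, walks by 3 and strikeouts by -2 per inning. It then adds a constant that puts the result on the ERA scale. Fangraphs sets that constant each year so that league FIP matches league ERA. The 3.10 used here is only an illustrative assumption, and some versions also count hit batters along with walks.

```python
def fip(hr, bb, k, ip, constant=3.10):
    """Fielding Independent Pitching: (13*HR + 3*BB - 2*K) / IP + constant.

    `constant` is an assumed league-scaling value; it changes year to year.
    """
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical pitcher: 20 HR, 50 BB, 150 K in 200 innings
print(round(fip(hr=20, bb=50, k=150, ip=200), 2))
```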
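The ERA formula itself is one line. One practical wrinkle is that box scores write thirds of an inning after a decimal point, so "6.2" means 6 2/3 innings and not 6.2 innings. The helper names below are hypothetical.

```python
def innings_from_box(ip_text):
    """Convert box-score innings such as '6.2' (6 and 2/3) to a real number."""
    whole, _, outs = ip_text.partition(".")
    if outs not in ("", "0", "1", "2"):
        raise ValueError(f"unexpected innings notation: {ip_text!r}")
    return int(whole) + int(outs or 0) / 3


def era(earned_runs, innings):
    """Earned runs per nine innings."""
    return 9 * earned_runs / innings


print(round(era(3, innings_from_box("6.2")), 2))  # 3 ER over 6 2/3 IP
```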
Controlling BABIP
At Fangraphs we occasionally call upon a statistic we titled xBABIP, which is the BABIP a pitcher can be expected to post given his percentage of line drives. Dave Studeman found a few years back that the general range of BABIP could be predicted with very good accuracy by adding .12 to the LD%; if a pitcher surrendered 22.1% line drives, his xBABIP would be ~.341. Using this for predictive purposes would not be correct, because the general baseline for pitchers is .300. What we can do is evaluate performance at a given point and attribute an unusually high or low BABIP to the line drive rate. For instance, saying that Player B’s BABIP of .275 as of today is primarily due to his ultra-low 14-15% LD rate would be correct; saying that it will continue like this would not, because the line drive percentage may change as the season goes on. In short, we can use something like this to evaluate a pitcher’s past, but not his future.
David Appelman showed not too long ago that, in 2007, 15% of flyballs fell in for hits, 24% of grounders turned into hits, and a whopping 73% of line drives also followed suit.  Due to this, the ideal xBABIP calculation would be .15(FB) + .24(GB) + .73(LD).
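Both xBABIP estimates fit in a short sketch. One is Studeman’s line-drive rule of thumb and the other is the weighting by batted-ball type using Appelman’s 2007 hit rates. The function names and the sample pitcher are hypothetical. The second version assumes the FB, GB and LD shares sum to 1 and ignores bunts.

```python
def xbabip_ld(ld_pct):
    """Studeman's rule of thumb: line-drive rate plus .12."""
    return ld_pct + 0.12


def xbabip_mix(fb_pct, gb_pct, ld_pct):
    """Weight each batted-ball share by its 2007 hit rate (Appelman)."""
    return 0.15 * fb_pct + 0.24 * gb_pct + 0.73 * ld_pct


print(round(xbabip_ld(0.221), 3))              # the 22.1% LD example
print(round(xbabip_mix(0.36, 0.44, 0.20), 4))  # hypothetical 36/44/20 mix
```

Under the second formula, a pitcher who trades line drives for fly balls lowers his expected BABIP the most, which foreshadows the flyball argument below.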
I have done studies here recently, and Jonathan Hale at Baseball Digest Daily has done others in the past, showing how velocity, movement, and location can all affect the BABIP of a given pitcher. It has also been shown, again by Studeman, that elite relievers are able to post consistently lower BABIPs than others. Other studies have shown that pitchers have, at most, very weak control over their BABIP, but instead of calling it control I would be more inclined to say that these pitchers are merely taking advantage of “cold spots.”
If just 15% of flyballs result in hits while 73% of line drives do, then we could intuitively expect someone with consistently low LD rates and higher FB rates to post lower BABIPs. From a movement perspective, I found that pitchers with above-average vertical movement in different horizontal movement subgroupings post lower BABIPs as well. Higher vertical movement usually correlates with flyballs, and voila, flyballs have the lowest percentage of hits.
This was just a recap of the three statistics and how they are used. Based on it, if we see someone like Carlos Zambrano, whose ERA consistently beats his FIP because he consistently posts low BABIPs, we could somewhat safely assume that he might not be controlling anything per se, but rather taking advantage of all the aspects shown to result in lower BABIPs. His controllable skills may not be as good as his ERA would suggest, but movement, velocity, and location may have combined to greatly aid his efforts.

The foul ball, part one: What does it tell us about a batter?

No one likes foul balls.  They don’t accomplish anything, and the two strike variety in particular actually does nothing at all to move the game along.  In fact, it used to be that the foul ball was a non-pitch, no matter how many strikes were on the batter.   Really, the only good that a foul ball does is give some kid a souvenir that he’ll treasure forever.  (Admit it, if you’ve caught one or even gotten close to one, you can tell me the date, opponent, score, who hit it.  Even if you’re 40, it was a meaningless game, and Steve Lombardozzi hit it.)
But what of the foul ball? Everyone hits them. Some hit more than others. But can they actually tell us anything about a batter? Surprisingly, yes. So, as we begin our look into the foul ball, let’s create a few metrics. First off, Retrosheet records the fact that a foul ball was hit, although it doesn’t tell us exactly how foul the ball was. For example, was it just poked to the first base coach, tipped at the plate, or a monster shot down the left field line that just… hooked… foul? That limitation aside, we can still create some simple metrics.

  • Foul balls per plate appearance
  • Percentage of total pitches fouled off
  • Percentage of pitches with which the hitter made contact that went foul (foul contact)
  • Overall swing percentage and overall contact rate

Additionally, there are two “types” of foul balls.  There are the foul balls committed when there are 0 or 1 strikes (which count as a strike) and those that come with 2 strikes (which don’t).  We know that with two strikes, a batter will often go into “protect” mode and swing at borderline pitches, figuring that if he swings and fouls them off, it’s not the end of the world.  So, we will split these two types of foul balls apart, and create two metrics.  One is for 0-1 strike foul balls per plate appearance.  The other is for 2 strike foul balls per plate appearance in which the batter actually had two strikes on him.
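The metrics above, including the 0-1 strike versus two-strike split, can be tallied from pitch sequences. The sketch below is a hypothetical illustration that handles only five Retrosheet-style pitch letters: B (ball), C (called strike), S (swinging strike), F (foul) and X (ball in play). A real parser would also have to handle foul tips, bunts, pitchouts and the other codes that Retrosheet uses.

```python
from dataclasses import dataclass


def _rate(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


@dataclass
class FoulProfile:
    pa: int = 0
    pitches: int = 0
    swings: int = 0
    contact: int = 0        # fouls plus balls put in play
    fouls: int = 0
    early_fouls: int = 0    # fouls with 0 or 1 strike (these count as strikes)
    late_fouls: int = 0     # fouls with 2 strikes (these don't)
    two_strike_pa: int = 0  # PAs that reached a two-strike count

    def add_pa(self, sequence: str) -> None:
        """Tally one PA; characters other than B, C, S, F, X are skipped."""
        self.pa += 1
        strikes = 0
        for code in sequence:
            if code not in "BCSFX":
                continue
            self.pitches += 1
            if code in "SFX":
                self.swings += 1
            if code in "FX":
                self.contact += 1
            if code == "F":
                self.fouls += 1
                if strikes < 2:
                    self.early_fouls += 1
                    strikes += 1
                else:
                    self.late_fouls += 1
            elif code in "CS":
                strikes += 1
        if strikes >= 2:  # strikes never decrease, so this means "reached 2"
            self.two_strike_pa += 1

    def metrics(self) -> dict:
        return {
            "fouls_per_pa": _rate(self.fouls, self.pa),
            "pct_pitches_fouled": _rate(self.fouls, self.pitches),
            "foul_contact": _rate(self.fouls, self.contact),
            "swing_pct": _rate(self.swings, self.pitches),
            "contact_pct": _rate(self.contact, self.swings),
            "early_fouls_per_pa": _rate(self.early_fouls, self.pa),
            "two_strike_fouls_per_2s_pa": _rate(self.late_fouls, self.two_strike_pa),
        }


profile = FoulProfile()
for seq in ["BCFFX", "SSS", "BX", "CFBFFS"]:  # made-up plate appearances
    profile.add_pa(seq)
print(profile.metrics())
```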
First off, let’s see if fouling pitches off is a repeatable skill. For example, we know that some players are pretty consistent home run hitters, but are there foul ball hitters? I subjected all of the above new metrics to an intra-class correlation (a measure of how consistent players are across years… think of it as a year-to-year correlation but with the ability to incorporate multiple years of data), using four years’ worth of Retrosheet data (2004-2007). The results were pretty encouraging. With a minimum of 250 total PA for the season in question, foul balls per PA checked in with the lowest intra-class correlation, .574. All of the other stats reached the mid-.60s or better.
Now, while it’s nice to know that players are generally consistent in how often they generate foul balls, do those foul balls actually tell us anything useful? I looked at a bunch of batting statistics for answers: the usual “slash” stats (AVG/OBP/SLG), along with the batter’s batted ball profile, walk rate, strikeout rate, single rate, double-and-triple rate, and HR rate. I ran a gigantic correlation matrix to see what turned up. I took all players from 2000-2007 with a minimum of 250 PA and ended up with a sample of 2400+ player-seasons. The first thing to note is that just about everything was significantly correlated with everything else. At that kind of sample size, it’s all significant, so our analysis will focus on the strength of the correlations.
What’s interesting is that 0 and 1 strike foul balls per PA correlated with two-strike foul balls per two-strike PA at only .106, which is rather low. This says that they are two relatively independent “skills.” Knowing a player’s general foul ball count isn’t enough; you have to differentiate between the two. There’s other evidence that we are dealing with two different skills with two different etiologies. Hidden in the correlations between the swinging metrics I created, there was an interesting pattern. Foul contact percentage was correlated with the 0 and 1 strike foul ball rate at .487, but with two-strike fouls at a mere .150. It looks like 0 and 1 strike foul balls are more the result of a player who can’t straighten out his swing. Then there’s the issue of overall contact percentage. Its correlation with two-strike fouls is .524, while its correlation with 0 and 1 strike foul balls is -.366 (note that’s a negative). So, a player who makes a lot of contact is likely to spoil a lot of two-strike pitches, but to hit fewer foul balls for strike one and strike two.
Do foul balls correlate with any of the actual outcome stats? Well, the usual slash stats didn’t correlate well with any of these new metrics. But some specific outcomes show rather intriguing patterns. A batter who hits a lot of two-strike foul balls is less likely to strike out (r = -.482) and less likely to walk (r = -.345). That makes sense, since he tends to extend his at-bats until he puts the ball in play (assuming he doesn’t end up walking or striking out). And put the ball in play he usually does. Two-strike foul balls are moderately associated with an upswing in singles rate (r = .347), but a downturn in HR rate (r = -.215) and HR/FB (r = -.300). This pattern becomes even more pronounced when one looks at overall contact percentage (which we’ve already seen is a pretty good correlate of two-strike foul ball hitting). Its correlation with strikeouts hits -.875, which makes sense because you can’t strike out if you hit the ball, foul tip into the catcher’s glove notwithstanding. Overall contact is correlated with more singles (r = .549) and fewer HR (r = -.521).
What about zero and one strike foul balls? The correlations with the outcome measures aren’t very strong. However, foul contact percentage predicts the opposite pattern from overall contact: strikeouts go up (r = .669), singles go down (r = -.454), and home runs go up (r = .410).
What’s funny is that if you just look at foul balls per PA, the correlations are not really that interesting. Most of them are below .20, which isn’t much of anything. A lot of the effects seem to wash out when you look at all foul balls together; you have to break them down into their component parts before you can fully understand what’s going on. Foul balls early in the count speak of a player who doesn’t make a lot of contact, who is unlikely to hit the ball fair when he does make contact, and who strikes out a lot, but whose batted balls are more likely to go out of the ballpark. One other thing jumped out. Foul contact percentage was (moderately) correlated with a lower ground ball percentage (r = -.318) and a higher fly ball percentage (r = .297). So, we have guys who appear to be trying for fly balls, and fly balls that will leave the park at that. That’s a higher-risk swing, and more likely to go awry, either by swinging and missing or by sending the ball foul. Two-strike foul balls speak of a hitter who makes good contact and keeps at-bats alive, but is generally just a singles hitter. Low risk, low reward.
So if you want to know what’s going on with your favorite player, the one who seems to be acting a little weird lately, and all you have is a box score, take a look at his foul balls. They might provide you with a useful little diagnostic of whether he’s feeling a little risky or playing it safe lately. I suppose a hitter could be high on both types of foul balls (or low on both), in which case the effects would seem to cancel each other out. (Remember, total fouls per PA aren’t really correlated well with anything.) But if you see a lot of one type and not a lot of the other, you can perhaps come to some conclusions about what’s going on in the batter’s head.
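One common way to compute an intra-class correlation is ICC(1) from a one-way random-effects ANOVA. It compares the variance between players with the variance within each player across seasons. The sketch below is not necessarily the exact model used here. It also assumes every player has the same number of seasons, and real Retrosheet samples will not satisfy that.

```python
from statistics import mean


def icc1(table):
    """One-way random-effects ICC(1); rows are players, columns are seasons."""
    n, k = len(table), len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need >= 2 players, >= 2 seasons, equal-length rows")
    grand = mean(v for row in table for v in row)
    row_means = [mean(row) for row in table]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((v - m) ** 2 for row, m in zip(table, row_means) for v in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical foul-ball rates: three players, two seasons each
print(round(icc1([[1, 2], [3, 4], [5, 6]]), 3))
```

A value near 1 means players mostly differ from each other and barely change from year to year. A value near 0 means season-to-season noise swamps the differences between players.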
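To see why "it’s all significant" at this sample size, here is a quick check. The usual t test for a Pearson correlation can be solved for the smallest r that clears a two-tailed .05 cutoff. The 1.96 cutoff below is the large-sample normal approximation, which is an assumption that is fine for n in the thousands.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for testing whether a Pearson r differs from zero."""
    return r * sqrt((n - 2) / (1 - r * r))


def critical_r(n, t_crit=1.96):
    """Smallest |r| reaching significance (solves t_from_r(r, n) == t_crit)."""
    return t_crit / sqrt(t_crit ** 2 + n - 2)


print(round(critical_r(2400), 3))       # roughly .04
print(round(t_from_r(0.106, 2400), 1))  # even r = .106 is far past 1.96
```

With about 2400 player-seasons, any correlation above roughly .04 is "significant." That is why the discussion that follows focuses on how strong each correlation is.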

Do hitters get more jumpy during a slump?

One of the criticisms thrown at Sabermetricians is that we are analysts who do not appreciate the richness of the psychology that goes into the game and its players. And for what it’s worth, the charge isn’t completely without merit. Many of our models look at baseball as simple agglomerations of probabilities without any sense of what’s going on inside the players’ heads. The place where this particular argument has gotten the most play is the clutch hitting debate. After all, say the doubters, some people have a psychological ability to perform in the clutch while others freak out. And they’re actually right, at least generally. The problem is that over the course of a baseball season, the actual effect of this “clutch” ability is fairly minimal. Whether the culprit is Bill James’s fog or the theory of Mike Stadler, author of The Psychology of Baseball, that while this ability varies across the general public, baseball players make it to the Majors in part because they all have it, clutch hitting ability has consistently shown itself to be a (very) minor player in explaining the variance in actual outcomes.
But, clutch hitting isn’t the only place where a player’s mental state might affect an individual at-bat and make him something of a different man from one at-bat to another.  Consider the slumping batter.  He’s had a bad couple of days (weeks?) and he just can’t seem to get a hit.  He might be feeling a little desperate.  Will he ever get on base again?  Perhaps he should swing a little bit more or a little bit harder to try to break out?  Sportscasters like to call this “pressing.”  But does it really happen?
In The Book: Playing the Percentages in Baseball (which if you haven’t read, you are a horrible human being), StatSpeak friends Tangotiger, MGL, and Andrew Dolphin laid out a pretty convincing case that, as far as the actual outcome of the at-bat goes, being on a hot streak or in a slump has very little power to predict what will happen next. You’re better off betting that a player will do what he normally does over the course of the season. But that doesn’t mean that our struggling batter gets to that outcome in the usual way. A slumping player’s outcome may be the same as might be expected were he not slumping, but he may go up to the plate with the mindset that he needs to do something different in this plate appearance. Perhaps he might take a few more chances on some pitches and swing a little bit more. It makes sense that he might try this strategy. Let’s look at the data.
First, let’s define a slump.  I took the 2006 season and eliminated all the pitchers batting.  I then set up to look at each plate appearance and the ten that came immediately before it.  A player was in a slump if during the last ten plate appearances he had made an out in at least nine of them.  That’s a really rough definition of “slump”, but it keeps things manageable. 
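The slump definition translates directly into a sliding window over one player’s plate appearances. In this hypothetical sketch, True marks a PA in which the batter made an out. The text doesn’t say how the first ten PAs of a season were handled, so the sketch simply leaves them unclassified (None). That choice is an assumption.

```python
def slump_flags(outs, window=10, threshold=9):
    """For each PA, True if the preceding `window` PAs had >= `threshold` outs.

    `outs` is one player's chronological list (True = made an out).
    PAs without a full window of history get None.
    """
    flags = []
    for i in range(len(outs)):
        if i < window:
            flags.append(None)
        else:
            flags.append(sum(outs[i - window:i]) >= threshold)
    return flags


history = [True] * 9 + [False] + [True, True]  # 9 outs in the first 10 PAs
print(slump_flags(history)[10:])
```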
Now, how to tell if a player is pressing or not. At first, I looked at pitches per plate appearance. Do players who are in slumps have shorter at-bats (they poke at the first ball near the strike zone) than when they aren’t slumping? The answer is no. I took everyone in baseball with at least 50 PA in 2006 and calculated their average pitches per PA when they were slumping and when they were not. Then, I ran a paired-samples t-test to see whether there was a significant difference between the two groups. A paired-samples t-test has the advantage of comparing people to themselves, so there’s no confound from the possibility that batters with longer at-bats are better hitters overall and thus less likely to go into slumps. Players saw an average of 3.70 pitches when not slumping and 3.69 when in a slump. There’s a problem, though: the number of pitches doesn’t tell you what the player did on those pitches. For example, a player could take two balls and a strike and then put the ball in play (1 swing in 4 pitches), or swing at four pitches (and foul one off) for a strikeout.
I needed a better way to measure a player’s willingness to swing at the plate. His jumpiness factor, if you will. Thankfully, a brilliant researcher named Russell A. Carleton published a paper last year in SABR’s Statistical Analysis newsletter, By The Numbers. In it, he uses signal detection theory to measure a player’s willingness to swing, correcting for the fact that some players see comparatively few swing-worthy pitches. A strike may be a strike, but whether it was a called strike or a swinging strike tells us something about a player’s attitude toward swinging. He called the stat “response bias.”
So, I calculated the response bias for all players, both when they were in a slump and when they weren’t, and compared the two, again with a paired-samples t-test. Most players in baseball have a response bias around 1.0, which is ideal. Greater than 1.0 means that they swing too much and less than 1.0 means they don’t swing enough; in general, a higher number means a greater likelihood of swinging. Players who were not in a slump had an average response bias of .965. When slumping, it jumped to .990. That difference was significant. There are no units to put on those numbers, so you can’t interpret them as .990 somethings; the increase only indicates a little more willingness to swing. The effect isn’t huge. Players don’t turn into Vlad Guerrero-like free swingers when in a slump (Vlad’s overall figure in 2006 was 1.671), but they do seem to go up to the plate with a little more urgency. A little.
I wanted to rule out one possible alternative hypothesis. Perhaps players who like to swing a lot compensate for a slump by swinging more, while those who are more reserved about swinging actually go to the plate even more reluctant to swing. I split the group in two, looking separately at those who were above 1.0 in response bias when not slumping (the free swingers) and those below 1.0 when not slumping (the takers). Both groups increased their overall response bias when in a slump. Looks like everyone gets a little jumpy from time to time.
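A paired-samples t statistic is the mean of the per-player differences divided by its standard error. Because each player is compared with himself, stable differences between players cancel out. The sketch uses only the standard library, and the four players’ numbers are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(not_slumping, slumping):
    """Paired-samples t statistic and degrees of freedom (slumping minus not)."""
    if len(not_slumping) != len(slumping) or len(slumping) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [s - n for n, s in zip(not_slumping, slumping)]
    se = stdev(diffs) / sqrt(len(diffs))
    if se == 0:
        raise ValueError("all differences identical; t is undefined")
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical pitches per PA for four players, not slumping vs. slumping
t, df = paired_t([3.7, 3.8, 3.6, 3.9], [3.6, 3.8, 3.5, 3.7])
print(f"t = {t:.2f} with {df} df")
```

The t value would then be compared against the t distribution with those degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel` does both steps.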
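Carleton’s exact formula isn’t given here, so the sketch below swaps in the textbook signal-detection criterion c to illustrate the idea. It is not Carleton’s response bias. A swing at a strike is treated as a "hit" and a swing at a ball as a "false alarm," which assumes you know whether each pitch was in the zone. The scale is also different: c is centered on 0, with negative values meaning more willing to swing, while the article’s scale treats 1.0 as neutral. The 0.5 and 1 adjustments are a common log-linear correction that keeps the rates away from 0 and 1.

```python
from statistics import NormalDist


def sdt_swing_measures(swings_at_strikes, strikes_seen, swings_at_balls, balls_seen):
    """Textbook signal-detection measures for swing decisions.

    Returns (d_prime, c): d_prime is discrimination (strike vs. ball);
    c is the criterion, negative for a hitter who swings more freely.
    """
    z = NormalDist().inv_cdf
    hit_rate = (swings_at_strikes + 0.5) / (strikes_seen + 1)
    fa_rate = (swings_at_balls + 0.5) / (balls_seen + 1)
    d_prime = z(hit_rate) - z(fa_rate)
    c = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, c


free_swinger = sdt_swing_measures(90, 100, 60, 100)  # hypothetical counts
taker = sdt_swing_measures(60, 100, 5, 100)
print(free_swinger, taker)
```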
So sure, psychology is in play in baseball.  How could it not be?  Players are human beings.  Now, are the effects on actual behavior and outcomes that big?  No.
This is what happens when people practice psychology without a license. People assume that most (other) people crack under pressure and thus are unable to come through in the clutch. In fact, when there’s an actual emergency situation (and here I’m talking about something actually important, where people might get hurt), most people report that while they felt a little afraid, they were able to put it aside and do what had to be done. Are there people who freeze? Sure, but they are actually a fairly rare breed. And I agree with Mike Stadler’s explanation that none of them make it to the Majors.
Why then do we believe that there’s this amazing performance-to-mental toughness link in baseball?  Because we get most of our information from media members who get paid to lay on the drama and romance.  Dramatic sells, and “mental toughness” is a wonderfully romantic concept, because anyone can be “mentally tough.”  Not everyone can hit a 95 mph fastball 400 feet, no matter what his mental state at the time.  But where’s the fun in that explanation?

Breakdown of balls in play by count

Recent evidence may suggest otherwise, but I am still a contributor to Statistically Speaking. I’ve been working on an analysis that has been more difficult to bring to fruition than I expected; that, along with “real life” getting in the way more of late, is what has severely cut into my posting frequency.
However, in the process of number crunching for the analysis I’m doing, I came across some statistics that I haven’t seen posted publicly anywhere, not even in the Baseball-Reference splits. (Some of it is in the B-R splits, but not most of it.) Maybe I’ve just missed them, in which case drop me a line and let me know where else you found them. I thought these might be interesting to a few other people, so I’ll share them. Mostly, I’m just putting the numbers up here for the rest of you to enjoy, but I’ll also make a few comments on some trends that stuck out to me.
I’m looking at pitch data broken down by ball-strike count. I’m using the MLB Gameday 2007 data as my source. Today I present the breakdown of types of balls put into play by the hitter.

Count  Balls in Play  Safe   Out    Single  Double  Triple  Home Run  Error  Other Safe
0-0    22029          0.341  0.659  0.214   0.069   0.007   0.039     0.012  0.001
0-1    17222          0.329  0.671  0.222   0.062   0.005   0.027     0.012  0.001
0-2    7878           0.319  0.681  0.228   0.049   0.005   0.022     0.013  0.001
1-0    14030          0.344  0.656  0.212   0.070   0.007   0.044     0.010  0.001
1-1    16576          0.334  0.666  0.214   0.066   0.006   0.034     0.012  0.001
1-2    14626          0.326  0.674  0.220   0.059   0.006   0.025     0.014  0.001
2-0    5015           0.355  0.645  0.202   0.077   0.007   0.056     0.012  0.000
2-1    10308          0.349  0.651  0.212   0.074   0.007   0.041     0.014  0.001
2-2    14861          0.330  0.670  0.215   0.062   0.009   0.030     0.012  0.001
3-0    251            0.402  0.598  0.167   0.120   0.008   0.092     0.012  0.004
3-1    4393           0.376  0.624  0.214   0.083   0.009   0.056     0.013  0.001
3-2    11019          0.351  0.649  0.216   0.070   0.007   0.045     0.012  0.001
Total  138208         0.338  0.662  0.216   0.066   0.007   0.036     0.012  0.001
Count  Ground Out  Fly Out  Pop Out  Line Out  Force Out  Ground into DP
0-0    0.208       0.195    0.073    0.043     0.036      0.034
0-1    0.270       0.183    0.067    0.047     0.034      0.034
0-2    0.291       0.181    0.070    0.047     0.039      0.033
1-0    0.225       0.206    0.078    0.048     0.031      0.032
1-1    0.267       0.194    0.070    0.046     0.031      0.030
1-2    0.293       0.181    0.076    0.047     0.033      0.028
2-0    0.218       0.217    0.077    0.051     0.028      0.027
2-1    0.254       0.198    0.075    0.049     0.026      0.025
2-2    0.278       0.194    0.076    0.051     0.031      0.025
3-0    0.171       0.219    0.096    0.040     0.024      0.020
3-1    0.213       0.213    0.081    0.049     0.023      0.021
3-2    0.264       0.212    0.080    0.055     0.009      0.012
Total  0.254       0.195    0.074    0.048     0.030      0.029
Count  Sac Bunt  Sac Fly  Double Play  Bunt Ground Out  Fielder's Choice Out  Bunt Pop Out  Other Out
0-0    0.033     0.014    0.004        0.010            0.002                 0.005         0.001
0-1    0.015     0.010    0.004        0.004            0.002                 0.002         0.000
0-2    0.004     0.010    0.003        0.000            0.002                 0.001         0.001
1-0    0.014     0.011    0.004        0.002            0.002                 0.001         0.000
1-1    0.010     0.008    0.003        0.003            0.002                 0.001         0.000
1-2    0.002     0.007    0.003        0.000            0.002                 0.000         0.000
2-0    0.008     0.013    0.005        0.000            0.002                 0.000         0.000
2-1    0.005     0.010    0.003        0.002            0.002                 0.000         0.000
2-2    0.001     0.009    0.003        0.000            0.002                 0.000         0.000
3-0    0.000     0.024    0.000        0.000            0.004                 0.000         0.000
3-1    0.004     0.012    0.004        0.001            0.003                 0.000         0.000
3-2    0.001     0.009    0.005        0.000            0.001                 0.000         0.000
Total  0.011     0.010    0.004        0.003            0.002                 0.001         0.000

[Chart: Ball in Play Safe Percentage vs Count]
A hitter reaches base safely more often on balls in play when the count is in his favor. Don’t change the channel; revelations like that just keep on coming at StatSpeak, and you don’t want to miss one!
Okay. My first slightly less than completely and utterly obvious observation is that the home run rate is strongly tied to the count.
[Chart: Ball in Play Home Run Percentage vs Count]
The doubles rate shows the same effect, but smaller, as does the triples rate to some extent. The singles rate stays pretty flat with respect to count, although there is a bit of an inverse effect: in better hitters’ counts, the hitter gets more extra-base hits and slightly fewer singles.
I haven’t looked at the type of batted ball (fly ball, line drive, ground ball, bunt, etc.) that results in hits. That’s a bit more difficult to parse out of the Gameday data. Since batted ball type doesn’t have its own field, getting that information requires some regular expression matching on the text description of the play. That’s fairly straightforward, but it’s still a nontrivial bit of coding, which makes it a project for some point in the future rather than part of this data set.
[Chart: Ball in Play Groundout-Flyout Ratio vs Count]
Another thing I noticed was that there were more groundouts and fewer flyouts as the count gained strikes and lost balls. As pitchers gain the upper hand, they tend to get more groundball outs. I didn’t include popups and line drives in the accompanying chart, since they didn’t show a strong tendency relative to count.
I saw a couple of other things that are obvious once you think about them, but it was interesting to me to see them reflected in the data. The first was that force outs, GIDPs, and fielder’s choice outs all go down dramatically with a 3-2 count, dropping from 6.4% to 2.3% of balls in play. Presumably this is because the runners are often going with the pitch on 3-2.
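The regular-expression approach could start out something like the sketch below. The phrase patterns are hypothetical guesses at how Gameday text describes batted balls (for example, "lines out" or "on a ground ball"), and they would need checking against real descriptions before use. Order matters: bunts are tested first so that a bunted ground ball is not counted as an ordinary grounder.

```python
import re

# Hypothetical phrase patterns; verify against actual Gameday descriptions.
PATTERNS = [
    ("bunt", re.compile(r"\bbunt", re.IGNORECASE)),
    ("line drive", re.compile(r"line drive|\blines (?:out|into)", re.IGNORECASE)),
    ("pop up", re.compile(r"pop up|\bpops (?:out|into)", re.IGNORECASE)),
    ("ground ball", re.compile(r"ground ball|\bgrounds (?:out|into)", re.IGNORECASE)),
    ("fly ball", re.compile(r"fly ball|sacrifice fly|\bflies (?:out|into)", re.IGNORECASE)),
]


def batted_ball_type(description):
    """Return the first batted-ball label whose pattern matches, else None."""
    for label, pattern in PATTERNS:
        if pattern.search(description):
            return label
    return None


print(batted_ball_type("Player A singles on a line drive to left fielder Player B."))
```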
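The groundout-to-flyout trend can be checked directly against the Ground Out and Fly Out rates in the table above. Both are shares of balls in play at each count, so their ratio is the GO/FO ratio. The short script below only rearranges those published numbers.

```python
# (ground-out rate, fly-out rate) by count, from the 2007 Gameday table above
RATES = {
    "0-0": (0.208, 0.195), "0-1": (0.270, 0.183), "0-2": (0.291, 0.181),
    "1-0": (0.225, 0.206), "1-1": (0.267, 0.194), "1-2": (0.293, 0.181),
    "2-0": (0.218, 0.217), "2-1": (0.254, 0.198), "2-2": (0.278, 0.194),
    "3-0": (0.171, 0.219), "3-1": (0.213, 0.213), "3-2": (0.264, 0.212),
}

go_fo = {count: go / fo for count, (go, fo) in RATES.items()}

# List counts from most flyout-heavy to most groundout-heavy
for count, ratio in sorted(go_fo.items(), key=lambda kv: kv[1]):
    print(f"{count}: {ratio:.2f}")
```

The ratio runs from about 0.78 at 3-0 to about 1.62 at 1-2. The two-strike counts sit near the top and the three-ball counts near the bottom.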
The second thing that interested me was the favorite counts for hitters to bunt for an out. (Bunting for a hit is not included for the reason mentioned previously.)

Count Bunt Outs
0-0 0.043
0-1 0.019
0-2 0.004
1-0 0.016
1-1 0.013
1-2 0.002
2-0 0.008
2-1 0.006
2-2 0.001
3-0 0.000
3-1 0.005
3-2 0.001

If I don’t get around to presenting my full analysis in a timely fashion, I’ll see if I can present a few more statistical tidbits like this along the way.

Who gets the credit/blame for that home run?

Do hitters hit home runs, or do pitchers give them up? Of course, the answer to that question runs both ways, but who deserves more of the blame/credit? Pitchers occasionally throw such beautifully tantalizing hanging curveballs that even Rafael Belliard hit the occasional home run, and some hitters are so strong that they can punish even the best-placed pitch. But it brings up an interesting question. Who is more in control of how far the batter hits the ball? After all, a home run is simply a fly ball that went a long way and crossed over a fence.
Here’s how I (sorta) answered the question: From 1993-1998, Retrosheet’s data files contain pretty good data on hit locations, primarily because those years were compiled by Project Scoresheet and licensed to Retrosheet. Recent Retrosheet files are much sparser in this respect. Project Scoresheet recorded locations using a standardized rough map of zones on a ball diamond. It’s rather coarse-grained, but it takes us from being able to say that Jones flew out to center field to saying that Jones flew out to shallow (or deep) center. Once we know which zone a fly ball went to (and I selected all balls from 1993-1998 that Retrosheet said were either pop ups or fly balls), we can get a decent approximation of how far that is from home plate.
I assumed that all balls attributed to a zone were hit to the exact center of that zone. Of course, that’s not true, but it’s close enough for government work (some were hit a little beyond, some a little in front… it evens out). Since the Project Scoresheet grid is meant to scale to the outfield dimensions of a park, we need to know the outfield dimensions of the park in use. (The infield dimensions of all parks are set by the official rule book.) If one knows a little bit about trigonometry, it’s easy enough to get a decent guess of where the ball was hit, if it was in the field of play. For home runs, I gave the hitter 105% of the wall measurement over which the ball crossed. (So, a HR hit over a 360-foot power alley was estimated at 378 feet.) The 105% figure was nothing more than my guess.
I totaled up the mean estimated distance of all fly balls and pop ups hit in a season by each batter, and then did the same by pitcher. I included only those with at least 25 fly balls in the season in question, either hit by them or hit off of them. I subjected them to an AR(1) intra-class correlation to look at the year-to-year correlations over the six years in the data set and see whether the mean distance was more consistent for pitchers or for hitters.
ICC for pitchers = .312
ICC for batters = .612
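The distance estimate and the per-player averages can be sketched as follows. It assumes each zone center has already been converted to (x, y) feet with home plate at the origin. That conversion depends on the Project Scoresheet grid and the park’s dimensions and is not shown. The record layout (dicts with `batter`, `pitcher`, `x`, `y`, `is_hr`, `wall`) is hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean

HR_BONUS = 1.05  # the author's guess: a HR travels 105% of the wall distance


def fly_distance(ball):
    """Estimated distance in feet for one fly ball or pop up.

    Balls in play use the straight-line distance from home plate to the
    (x, y) zone center; home runs use HR_BONUS times the wall distance.
    """
    if ball.get("is_hr"):
        return HR_BONUS * ball["wall"]
    return math.hypot(ball["x"], ball["y"])


def mean_distance_by(balls, key, min_fb=25):
    """Mean distance per batter or pitcher (key='batter' or 'pitcher'),
    keeping only players with at least `min_fb` fly balls."""
    groups = defaultdict(list)
    for ball in balls:
        groups[ball[key]].append(fly_distance(ball))
    return {who: mean(d) for who, d in groups.items() if len(d) >= min_fb}


sample = [
    {"batter": "A", "pitcher": "P", "x": 0, "y": 300},
    {"batter": "A", "pitcher": "Q", "is_hr": True, "wall": 360},
]
print(mean_distance_by(sample, "batter", min_fb=1))
```

The per-season averages produced this way would then feed an intra-class correlation across seasons.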
Batters are fairly consistent from year to year in how far their average fly ball travels. Pitchers are less so, but still have some level of consistency from year to year. It seems that both share some blame/credit for the distance on a flyball. This might explain why batters’ seasonal rates of HR/FB were more stable than pitchers’ rates. For those unfamiliar with this methodology, you can interpret those numbers in much the same way as a year-to-year correlation coefficient (although this method is better, as it allows for multiple data points). There are some batters who are powerful (i.e., they hit the ball a long way) and some who are not, and that power level is pretty consistent from year to year. Pitchers who give up fly balls (and all of them, save Fausto Carmona, occasionally give up a fly ball) do have some (not a lot, but it’s there) repeatable skill in whether they tend to give up short fly balls or long ones. For GMs nervous about signing that fly ball pitcher because he might give up a bunch of home runs, you can check his average fly ball distance (and perhaps its standard deviation), perhaps look at it by field, and plug in a few numbers to get at least a slightly better projection of how many HR he might give up next year, although the error of prediction is still likely to be rather high.
Let’s play around with this a bit more from the batter’s perspective.  I looked at the average distances for balls hit to the batter’s pull field, opposite field, and center field.  I raised the inclusion threshold to 50 FB in the season in question.  Again, I looked at ICC over the six seasons in the data set.  (Anything in the grid with an “8” in it was “center field”, so that includes the power alleys.)
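The field assignment can be sketched as follows. It assumes zone labels are strings built from the outfielders' position digits (for example "7", "78", "8", "89", "9"), a simplification of the real Project Scoresheet grid. Following the text, any label containing an "8" counts as center field. The batter's side is assumed to be the side he actually hit from, so switch hitters are handled per plate appearance.

```python
def field_for(zone: str, bats: str) -> str:
    """Classify an outfield zone as 'center', 'pull', or 'opposite'.

    zone: simplified, hypothetical grid label such as "7", "78", "8", "89", "9".
    bats: "R" or "L", the side the batter hit from on this play.
    """
    if bats not in ("R", "L"):
        raise ValueError(f"bats must be 'R' or 'L', got {bats!r}")
    if "8" in zone:
        return "center"  # includes the power alleys (78, 89), as in the text
    if "7" in zone:
        side = "left"
    elif "9" in zone:
        side = "right"
    else:
        raise ValueError(f"not an outfield zone: {zone!r}")
    # Right-handed hitters pull to left field; left-handed hitters to right.
    pull_side = "left" if bats == "R" else "right"
    return "pull" if side == pull_side else "opposite"
```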
ICC for pull field = .239
ICC for center field = .591
ICC for opposite field = .359
Batters are much more consistent in how far they hit the ball to center field (and the power alleys), and are actually more consistent in how far they hit the ball to the opposite field than to their pull field.  So if you want to get a good idea of how a player will hit for power, take a look at what he does gap to gap.  That’s going to be the most consistent measure.

The 2007 All “Paid By My Former Employer” Team

When a baseball team decides to release a player, one of two things can happen.  Either A – the team and the player will agree to a contract buyout, or B – the team will rid itself of the player but continue to pay him for the remainder of his contract.  Nowadays, though, baseball contracts have gotten so large that teams will try to find trades for these players to avoid both of these scenarios.
This way they can offer some money to the team taking the player on and save themselves from having to pay an unwanted player with very wanted money.  Due to the financial structures of certain teams, and the insane contracts some players receive (in 2008 Alex Rodriguez will make more than the entire Florida Marlins team), finding these trades can prove rather difficult.
The perfect example of this is the situation of the aforementioned Alex Rodriguez.  After the 2000 season, A-Rod signed a 10-yr/252 mil deal with the Rangers.  After three seasons with Texas, A-Rod was traded to the New York Yankees.  He still had 7-yr/179 mil remaining on the deal, and in order to complete the trade, the Rangers had to pick up 67 mil of that 179 mil (an average of roughly 9.6 mil per season over the remaining 7 seasons).
From 2004-2007, the four years after the trade that put Rodriguez on the Yankees, here are his numbers against the Rangers – the team still paying him significant money each year.

  • 2004: (8 mil from Texas) 6 g, 7-20, .350, 5 runs, 1 HR, 2 rbi
  • 2005: (12 mil from Texas) 10 g, 13-41, .317, 12 runs, 2 HR, 8 rbi
  • 2006: (10 mil from Texas) 10 g, 11-38, .289, 7 runs, 2 HR, 6 rbi
  • 2007: (7 mil from Texas) 9 g, 12-35, .343, 4 runs, 2 HR, 3 rbi
  • Totals: (37 mil from Texas) 35 g, 43-134, .321, 28 runs, 7 HR, 19 rbi
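The totals line can be re-derived from the four season lines. Here is a quick Python check, with the numbers copied from the list above (the tuple layout is just a convenience):

```python
# (season, millions paid by Texas, games, hits, at-bats, runs, HR, RBI)
SEASONS = [
    (2004, 8, 6, 7, 20, 5, 1, 2),
    (2005, 12, 10, 13, 41, 12, 2, 8),
    (2006, 10, 10, 11, 38, 7, 2, 6),
    (2007, 7, 9, 12, 35, 4, 2, 3),
]


def totals(rows):
    """Sum every counting column (skipping the season) and compute the average."""
    paid, games, hits, ab, runs, hr, rbi = (sum(col) for col in list(zip(*rows))[1:])
    return {"paid": paid, "games": games, "hits": hits, "ab": ab,
            "runs": runs, "hr": hr, "rbi": rbi, "avg": round(hits / ab, 3)}


print(totals(SEASONS))
# {'paid': 37, 'games': 35, 'hits': 43, 'ab': 134, 'runs': 28, 'hr': 7, 'rbi': 19, 'avg': 0.321}
```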

In those 35 games, the Yankees won 25 against the Rangers.  Essentially, for the last four years, the Rangers have paid A-Rod 37 million dollars to crush them.  Luckily for them, because A-Rod opted out of his Yankees contract (and signed another record-breaking deal), the Rangers do not have to pay the remaining 32 mil that they owed him.
Upon seeing his numbers against the team still paying him, I was so fascinated by the logic of it all that I decided to find more players in similar situations.  I decided to compile a team of players who were paid significant salaries (1 mil or higher) in 2007 by teams they were either released from, traded from, or simply had not been on in quite some time.
The team will not include A-Rod, as we already mentioned him, and it will not include everyone who fell into this category in 2007, as I only wanted to discuss the situations that merited discussion.  For instance, the Phillies paid Jim Thome 7 mil in 2007, but he does not belong here.  Thome did not play against the Phillies, and they got more production out of Ryan Howard anyway (paying Thome the 7 mil allowed them to have a better, cheaper, and younger player).  So, without any more delay, I present to you – The 2007 All “Paid By My Former Employer” Team.
A former All-Star, Jason Kendall signed a huge contract (6-yr/60 mil in 2002) to stay in Pittsburgh and saw his power production drop off the charts.  Prior to signing that contract, Kendall averaged 9.1 HR/season, as opposed to 2.8 HR/season in the years afterwards.  When the Athletics dangled Arthur Rhodes and Mark Redman, the Pirates could not resist (who else could?), and sent Kendall to Camp Billy Beane.  Since his contract had been so large, the Pirates had to pay portions of it until it expired at the end of 2007.  Kendall had not been on the Pirates for almost three years and yet still made more money than half of the team.
This past season, the Pirates paid Kendall 5.5 mil to play for the Athletics.  Then, the Athletics traded him to the Cubs, and the A’s paid 4.5 mil more.  In 2007, Kendall made 5.5 mil from the Pirates and 4.5 mil from the Athletics to post extremely average numbers in 80 games for the A’s (.226, 2 HR, 22 rbi).  The Cubs paid him only the remaining 1.5-3 mil of his 13 mil 2007 salary and got a starting catcher who hit .270, with a .362 OBP in 57 games, en route to a division win.
To top it off, Kendall played the Pirates as a member of the Cubs, going 3-8 with 2 rbi in a Cubs series win against the team paying him 5.5 mil.
An 11-year veteran, Bill Mueller played third base for the Giants, Cubs, and Red Sox before joining the Dodgers in 2006.  Prior to that season he signed a 2-yr/9 mil deal.  The former batting champion played only 32 games for the Dodgers in 2006, hitting .252 with 3 HR and 15 rbi.  Due to injury problems, Mueller was forced to retire.  The Dodgers decided to honor his contract and gave him a position as Special Assistant to the GM.
In 2007, Mueller made 4.5 million dollars to do whatever it is a SATTGM does, making him one of the highest-paid executives.
After the 2004 season, Carlos Delgado became a free agent and signed with the Marlins.  He turned down a lucrative deal with the Mets because they apparently tried to rely too heavily on his Hispanic background.  One year later, Delgado was traded to the Mets, with money, for Mike Jacobs and two prospects.  The money that went along with Delgado included 2 mil of his 2007 salary.
That 2 mil made Delgado the third-highest-paid Marlin in 2007, behind Miguel Cabrera and Dontrelle Willis.
Not only that, but Delgado’s Mets went 11-5 against his former employer in 2007, partly thanks to his 21 hits, .327 avg, 5 HR, and 18 rbi.  The third-highest-paid Marlin in 2007 was a Mets player who helped beat the Marlins 11 times.  Interesting.
Former All-Star Shea Hillenbrand has been shipped around since his departure from the Red Sox.  In 2007, he went from the Angels to the Padres to the Dodgers, and made 6.5 mil.
Starting with the Angels, Hillenbrand hit .254 with 3 HR and 22 rbi in 53 games.  While on the Angels, he helped beat the Dodgers in interleague play, going 3-7 with 3 rbi in 2 games.  He was then bought out by the Angels and signed by the Padres.  During his time in San Diego, Hillenbrand saw no action in 12 days before being released.  The Dodgers picked him up and he played 20 games, hitting .243, with 1 HR and 9 rbi.
Against the Padres, he went 1-7 with a run, in 2 Dodger losses.  I guess the Padres knew all along that getting rid of Hillenbrand would help them beat the Dodgers! 
Hillenbrand was paid 6 mil to play one-third of the Angels’ season, posting below average numbers, and then 0.5 mil by the Padres and Dodgers to play 20 games and hit .243.  
After many successful seasons with the Blue Jays and Dodgers, Shawn Green signed a deal with the Diamondbacks.  They gave him a large contract and then shipped him to the Mets in August 2006, agreeing to pay 5.8 mil of Green’s 2007 salary.  He played a full season for the Mets.
In 130 games, Green hit .291 with 10 HR, 46 rbi, 30 2B and 62 runs.  Against the D-Backs, Green went 4-13 with 1 HR and 3 rbi in a 3-1 Mets series win (he was injured for the other series).  The D-Backs paid Green 5.8 million dollars to help beat them three times.
Russ Ortiz was once a hot commodity.  After a 2002 World Series appearance with the Giants, he won 36 games in 2 seasons with the Braves.  Following that success, the D-Backs signed him to a 4-yr/33 mil deal after the 2004 season.  In 2005, Ortiz went 5-11 with a 6.89 ERA, and in 2006 he went 0-5 with an 8.14 ERA in 6 starts before the D-Backs had enough.
With nearly 22 mil remaining on the contract, GM Josh Byrnes released Ortiz, agreeing to pay the remainder of his contract as long as he was no longer in uniform.
In 2007, the D-Backs paid Ortiz 7.1 mil, even though he did not play for them.  Meanwhile, Brandon Webb made only 4.5 mil.  Ortiz barely played in 2007, going 2-3 with a 5.51 ERA for the Giants.
But, as baseball irony would have it, one of his 2 wins came in a 7-inning, 2-run performance against the D-Backs  – his only quality start came in a game where he beat the team paying him over 7 million dollars.
After several good, under-the-radar seasons in Montreal, Javier Vazquez was traded to the Yankees before the 2004 season.  One year later, Vazquez and cash were sent to the D-Backs for Randy Johnson, and after a single season in Arizona he was traded again, to the White Sox.
Why are all of these guys somehow tied to the D-Backs!?
In 2007, for the White Sox, Vazquez went 15-8 with a 3.74 ERA on a terrible team.  The Yankees paid 3 mil for him to do that.
As great as Randy Johnson once was, some diligence may have literally paid off for the Yankees in this instance.
Why anyone pays this guy is something I will never understand.  You’ll never guess where this tale starts though… the D-Backs of course.
In 2006, Jorge Julio went from the Orioles to the Mets to the D-Backs.  Before the 2007 season began, the D-Backs sent Julio to the Marlins.  In order to complete the trade, the D-Backs agreed to pay 1 mil of his 3.6 mil salary.  On the Marlins, Julio was typical Jorge Julio.  In 10 games, he pitched 9.1 innings, giving up 18 hits, 13 runs, and 11 walks, en route to a 12.54 ERA.
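Pitching lines like these use baseball's innings notation, where "9.1" means nine and one-third innings, not 9.1. Here is a small sketch of the conversion and the ERA formula (9 × earned runs ÷ innings pitched). It assumes all 13 runs Julio allowed for the Marlins were earned, which is what the 12.54 ERA implies.

```python
from fractions import Fraction


def innings_from_notation(ip: str) -> Fraction:
    """Convert box-score innings ('9.1', '52.2', '7') to exact innings.

    The digit after the dot counts outs (0, 1, or 2), i.e. thirds of an inning.
    """
    whole, _, outs = ip.partition(".")
    outs = int(outs or 0)
    if outs not in (0, 1, 2):
        raise ValueError(f"bad innings notation: {ip!r}")
    return int(whole) + Fraction(outs, 3)


def era(earned_runs: int, ip: str) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return float(9 * earned_runs / innings_from_notation(ip))


print(round(era(13, "9.1"), 2))  # 12.54
```

Using `Fraction` keeps the thirds exact, so 9.1 innings is exactly 28/3 rather than a rounded decimal.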
The Marlins sent him to the Rockies and he had a great year for them.  In 58 games, Julio pitched 52.2 innings, surrendering only 50 hits and striking out 50.  He even posted a 3.93 ERA in Colorado of all places.
In 3 games vs. the D-Backs, Julio pitched 3.1 innings, giving up 4 hits and striking out 5.  Against the Marlins, Julio helped the Rockies win 2 of 3, striking out 4 in 1.2 innings.  Essentially, the D-Backs paid Julio 1 mil to greatly help the team that eventually swept them out of the playoffs, even though Julio was not on the postseason roster.
There are more examples of players like this from 2007, and there will undoubtedly be more in 2008.  I feel that information like this can be used, in part, to evaluate the moves of General Managers.  It was extremely interesting that the D-Backs were involved in 5 of the 8 cases mentioned.
I did not even realize that until getting to Jorge Julio while actually writing this article.
Overall, in 2007, the D-Backs paid a combined 13.9 million dollars to players no longer on their team who actually contributed to some of their losses.
In the case of Russ Ortiz, he shut the D-Backs down for 7 innings, resulting in a loss.  In Julio’s case, he helped the Rockies beat the D-Backs in a game by pitching 1.1 scoreless innings of relief.  And, in Green’s case, he accounted for 5 of the 23 runs scored by the Mets in their 3-1 series win against the D-Backs.
Ultimately, the D-Backs made the playoffs, but a team that almost lost the division during the final five games of the season paid 13.9 mil to players who helped beat them 5 times. 
I’m not saying these players single-handedly beat the D-Backs but they were paid employees who certainly did not help them win.

I thought we were all professionals here

This is the story of my search for professional hitters.  You know the type.  He’s a .260-.270 (read: average) hitter, but a “guy who does the little things with the bat that don’t show up in the box score.”  He makes “productive outs” (there’s an oxymoron!)  He’s “good with the bat.”  He “moves runners along.” (Run along now children!)  I bet he has a great personality too.
Despite the fact that the point of the game (at least on offense) is not to make outs, some hitters apparently make outs like nervous eighth graders at their first co-ed party… but that’s OK, because they make the good kind of outs.  I suppose that in the game of “let’s make the best of a bad situation,” there are some plays that produce outs that are preferable to others.  A sacrifice fly is an out, but it does score a run.  A grounder to the right side with a runner at second and fewer than two outs is better than a grounder to the left side, because the grounder to the right side probably advances the runner.  But, are there really guys who excel in making “productive outs”?
Well, let’s look at situations in which a batter has a chance to make a “productive out.”  There have to be fewer than two outs, because a batter who makes an out when there are two outs… do I really need to explain what happens next?  Also, there need to be runners on base.  A batter who makes an out with no one on base has just gotten himself out, and there’s really nothing else left to happen.  He also needs to, ummm… make an out.  But not just any out.  Productive outs are usually attributed to some sort of ability to place the ball on the field.  So, let’s look at all at-bats where the batter made his out on a ball in play (i.e., not a strikeout).  I found all the situations from 2003-2006 that met these criteria.
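Those selection rules translate into a simple filter. This is a sketch with a hypothetical record type: Retrosheet event files carry this information, but the field names below are made up for illustration, and parsing the files is not shown. "Produced an out" is meant to include fielder's choices and double plays, since the next step compares those outcomes.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    # Hypothetical fields, not Retrosheet's actual column names.
    season: int
    outs_before: int      # outs when the plate appearance began
    runners_on: int       # number of runners on base
    produced_out: bool    # at least one out on the play (incl. FC, DP)
    ball_in_play: bool    # False for strikeouts


def is_productive_out_chance(pa: PlateAppearance) -> bool:
    """Fewer than two outs, runners on base, and an out made on a ball
    in play, within the 2003-2006 seasons studied."""
    return (2003 <= pa.season <= 2006
            and pa.outs_before < 2
            and pa.runners_on > 0
            and pa.produced_out
            and pa.ball_in_play)
```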
I figured out how much win probability each of these events added (actually, it’s usually more like subtracted).  Now, when dealing with WPA, we need to remember that WPA is affected by the leverage of a given situation, so I divided the WPA for each event by the leverage of that event.  This gives us context-neutral wins added.  I then found the mean context-neutral wins added (within this subset of plate appearances) for each combination of batted-ball type and the fielder who fielded it.  (e.g., the average ground ball to the third baseman added about -.03 context-neutral wins.)  Then, I looked at whether the batter outperformed (by being less of a drag on his team’s chances than expected) or underperformed (perhaps by hitting into a double play rather than just a fielder’s choice) this expectation.  Sum up a player’s total and see what happens.
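The bookkeeping in that paragraph can be sketched as follows. It assumes each qualifying plate appearance already has its WPA and leverage index (LI) computed, along with a batted-ball type and the fielder who handled it; the dictionary keys are hypothetical. The steps are: divide WPA by LI, average within each (type, fielder) bucket, then total each batter's differences from those averages and divide by his qualifying plate appearances.

```python
from collections import defaultdict


def professional_hitting_index(events):
    """events: iterable of dicts with hypothetical keys 'batter', 'wpa', 'li',
    'bb_type' (e.g. 'ground'), and 'fielder' (position number).

    Returns {batter: context-neutral wins above expectation per qualifying PA}.
    """
    events = list(events)
    # 1. Context-neutral wins added: WPA divided by leverage.
    cnw = [e["wpa"] / e["li"] for e in events]

    # 2. Expected value for each batted-ball type / fielder combination.
    sums, counts = defaultdict(float), defaultdict(int)
    for e, value in zip(events, cnw):
        key = (e["bb_type"], e["fielder"])
        sums[key] += value
        counts[key] += 1
    expected = {k: sums[k] / counts[k] for k in sums}

    # 3. Total each batter's difference from expectation, then divide by PA.
    total, pas = defaultdict(float), defaultdict(int)
    for e, value in zip(events, cnw):
        total[e["batter"]] += value - expected[(e["bb_type"], e["fielder"])]
        pas[e["batter"]] += 1
    return {b: total[b] / pas[b] for b in total}
```

Positive values mean a batter's outs hurt his team less than the average out of the same type to the same fielder.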
(If there’s a StatSpeak drinking game… and there should be… two shots should be required every time I do an intra-class correlation.  Pour yourself a double.)  Over the four years in the dataset, among those batters with at least 25 at-bats in the season under consideration, there was an ICC of .16 for the total sum of WPA over expectation in making these outs.  When I divided each player’s total by his number of plate appearances, the ICC was .14.
To put that in some perspective, I did a similar examination of clutch hitting and found an ICC of .074.  An intra-class correlation (a measure of year-to-year consistency over multiple years, for the uninitiated) of .16 is about 5 times as strong (using r-squared), but that’s not saying much.  That’s around the range of year-to-year consistency of BABIP for pitchers.  So, professional hitting seems kinda like clutch hitting.  There are certainly clutch hits, in the same way that there are professional hits.  There are guys who in one year might have several professional hits to their names.  It’s just that it isn’t consistent from year to year, as it would be if it were an inherent skill.  “Professional hitter” is a nice thing that people say about average hitters whom they like for some reason.  It’s based mostly on a few isolated incidents that someone remembered, and it doesn’t hold up when you look at all the data.
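The “about 5 times” figure comes from treating each ICC like a correlation coefficient and comparing the squares:

```latex
\left(\frac{0.16}{0.074}\right)^2 = \frac{0.0256}{0.005476} \approx 4.7
```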
But, for what it’s worth, in 2006, the league leaders in context neutral wins above expectation on outs hit in play per relevant plate appearance (or, if you prefer it Baseball Prospectus style, CNWAEOHIP/RPA)… just call it the professional hitting index, were:

  1. Joey Gathright – who actually fell just short of a full win above expectation in this stat
  2. Jeremy Hermida
  3. Dave Ross
  4. So Taguchi
  5. Craig Wilson

The most unprofessional hitters were:

  1. Chris Duncan
  2. Tim Salmon
  3. Jason Kubel
  4. Gary Sheffield
  5. Willy Aybar

UPDATE: Tango Tiger requested that I put up the entire list.  The 2006 list is available here in Excel format.  Players are listed by their Retrosheet ID, and they’re sorted by CNWAEOHIP/RPA.  Enjoy.  (The reason it’s the 2006 list and not the 2007 list is that the 2007 Retrosheet event file is still being compiled.  When that comes out, I’ll check out who did what in 2007 using some of my home-cooked stats.)

