Playing the blame game with ground ball singles

I’m building something.  It’s something that I’ve been meaning to do for a while, which is a defense rating system.  In fact, I once defined defense as “something that every Sabermetrician has a system for measuring that he is ‘working on.’”  I guess now I’m a proper Sabermetrician.
I’m not exactly the first person to tackle this one.  There’s the Fielding Bible with its lovely data from Baseball Info Solutions, which I use as my gold standard.  The problem is that those data are proprietary (read: expensive), and I’m a graduate student.  There are a few other systems that have caught my eye.  Shane Jensen and friends developed the Spatial Aggregate Fielding Evaluation (SAFE) system and they got mentioned in a few newspapers, mostly dismissively, both for being far too nerdy (because the worst thing you can be in baseball is a nerd) and for showing (again) that Derek Jeter isn’t a very good shortstop.  There are plenty of others, and listing them turns into a lovely alphabet soup (PMR, ZR, UZR, RZR, FRAA, DER, and of course the greatest fielding stat ever, fielding percentage).
But the ancestry of my system traces back in part to my former colleague Sean Smith, who about a year ago here on StatSpeak introduced TotalZone (and here’s part 2 and his latest on the subject), which was a system based only on what was available from Retrosheet, where the data are the perfect price for a graduate student: free.  Dan Fox, formerly of Baseball Prospectus, now of the Pittsburgh Pirates, also went about the business of creating a Retrosheet-based system for fielding, which he called simple fielding runs.  But Sean’s gone from StatSpeak and Dan’s gone to that big front office in the sky… er, Pittsburgh.
So, here, I pick up the baton.  I need a Retrosheet-compatible system that isn’t just a poor man’s rip-off of the other systems.  (On the second, I fear that I shall fail miserably.)  And so I start with the ground ball.  It always ends up in someone’s glove.  Whether that glove is on the hand of an infielder, an outfielder, or the occasional fan is the question.  Usually, it’s a good thing if the man who fields a ground ball is an infielder rather than an outfielder, but who’s to blame if it gets through the infield?  Both Dan’s and Sean’s systems assume that if a ground ball goes through to the left fielder, we can split the blame half-and-half between the third baseman and the shortstop.  They do similar things for CF (50% the fault of the 2B, 50% the fault of the SS) and RF-bound ground balls.  But does that stand up to the evidence?  I say no.
The problem, of course, with Retrosheet data is that they don’t have hit location data (or at least not very much) for recent years, and so anyone wanting to know about fielding in the past few years is reduced to making assumptions like this (or buying the BIS data).  However… there is a little bit of data that can be exploited on Retrosheet.  Because RS bought their 93-98 data from somewhere else (Project Scoresheet?) the 93-98 data have hit locations!  They use the Project Scoresheet location system, which uses a series of vectors to code for where the ball was either fielded or where it went through the infield.  I tossed out all of the balls that didn’t make it to the infield skin.  The infielder will make it to the dribbler and the bunt, no doubt.  Whether or not he gets there in time to do any good is another issue.  But whether the infielder can get to the ball before it reaches the outfield is an important first question, because it’s the first step in throwing the batter out.
The careful reader will have noted that I’m not talking about completing plays and making outs, only about getting to the ball.  First off, it plays into my system on a larger scale.  Secondly, I’m reminded of the old adage about why errors are a faulty stat: an error means that the fielder did something good in at least getting to the ball.  An infield hit is better than an outfield hit, and in order to get an out on a ground ball, an infielder needs to get to the ball.  (Yeah, you see the occasional 9-3 putout… about as often as I see my cousins who live in Phoenix.  Hi, Mike and Steve!)  So, here I’m looking at the Retrosheet data which indicate by whom the ball was fielded.  Whether or not the play was completed is irrelevant… for now.
Here’s what I did.  I took the 1993-1998 data and built a huge database of ground balls.  I coded for pitcher and batter handedness (it makes a difference!  This had been noted by the ever-reliable John Walsh some time ago.), and, if the ball went to the outfield for a hit, whether the hit that resulted was a single or an extra base hit.  Then, I looked at the spread of balls hit to each zone and who was fielding the balls where.  I tossed out all bunts and anything that didn’t at least make it to the infield skin.  I had ten zones to work with, which can be seen here on this diagram.  It’s not quite what the Fielding Bible does (they have 17 zones), but the Retrosheet data are free.
Let’s look at a ground ball single that gets through to the left fielder in a righty-righty pitcher-batter matchup.  What zone was it usually hit to?  Most often, and fairly obviously, to the hole between short and third (84.1% of the time), a zone marked “56” by Retrosheet.  But, sometimes (7.0%), it went to the zone marked “5” (because that’s where the third baseman is usually standing), and sometimes (6.0%) to “6” and sometimes (2.2%) to “5L” (down the left field line) and sometimes (0.5%) to “6M” (up the middle, to the shortstop side of second base).  There are some weird entries in there that are probably data entry errors (a hit to left field that went through the hole between first and second?) that account for the rest of the numbers (if you add, that’s only 99.8%).  We can re-create the same breakdown for every combination of handedness, type of hit, and fielder.  In fact, I did.  Something to note is that right-handed hitters were more likely to pull the ball toward more third base-ward zones (and lefties to shortstop-ward zones).  The effects weren’t huge, but they’re far enough away from 50-50 to be notable.
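For anyone who wants to try the same tally at home, here is a minimal Python sketch of the zone breakdown described above.  The record layout (pitcher hand, batter hand, result, fielder, zone) and the sample rows are made up for illustration; they are not Retrosheet’s actual event format, and this is not the code behind the numbers in this post.

```python
from collections import Counter

# Hypothetical records: (pitcher_hand, batter_hand, result, fielded_by, zone).
# The layout and the rows are illustrative assumptions, not real Retrosheet data.
sample_balls = [
    ("R", "R", "single", "LF", "56"),
    ("R", "R", "single", "LF", "56"),
    ("R", "R", "single", "LF", "5"),
    ("R", "R", "single", "LF", "6"),
    ("R", "R", "out", "SS", "6"),
    ("L", "R", "single", "LF", "56"),
    ("R", "R", "single", "CF", "6M"),
]

def zone_distribution(balls, pitcher_hand, batter_hand, outfielder):
    """Return {zone: share} for ground ball singles fielded by one outfielder."""
    wanted = (pitcher_hand, batter_hand, "single", outfielder)
    zones = Counter(
        zone
        for p, b, result, fielder, zone in balls
        if (p, b, result, fielder) == wanted
    )
    total = sum(zones.values())
    # An empty dict signals that no balls matched the requested matchup.
    return {zone: count / total for zone, count in zones.items()} if total else {}

dist = zone_distribution(sample_balls, "R", "R", "LF")
```

On the toy rows above, half of the righty-righty singles to left come from zone “56” and a quarter each from “5” and “6”.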
Now, who’s in charge of each of those zones?  That’s easy enough to figure out.  When the ball is hit to each of the zones, and it doesn’t scoot through, which infielder usually is the one to field it?  Again, looking at our righty-righty matchup, we get the following splits.
Zone   SS got it   3B got it
5L        1.1%       98.6%
5         0.3%       98.8%
56       41.9%       57.4%
6        97.6%        1.2%
6M       88.1%        0.1%   (the second baseman and pitcher pick up the other 11.8%)
Again, note that a right-handed hitter pulled the ball closer to the third baseman (see zone 56).  The pattern was slightly reversed for lefties, although not as extreme.  Now, it’s a matter of simple multiplication to see what share of the blame each of the two fielders should get for a hit to left field.  Since 84.1% of GB singles to left from righty-righty matchups go to zone “56”, and they are 57.4% the responsibility of the third baseman, then he gets 48.2% of the blame for GB singles to left, plus whatever other responsibilities he gets from the other four zones we’re focusing on.  In fact, he ends up with 54.2% of the blame for a single to left, given a righty-righty matchup.  It’s not 50-50, although in fairness to the other systems, it’s close.  When looking at hits to the center fielder, the pattern becomes a little more pronounced, with more of a 60-40 split to the shortstop for right-handed batters and to the second baseman for left-handed batters.  50-50 isn’t going to cut it.
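The multiplication can be sketched in a few lines of Python.  The zone shares and fielding splits below are the rounded righty-righty figures quoted in this post; the function name and the final step of rescaling the two shares so they sum to 100% are my own assumptions for illustration.  Because the inputs are rounded, the output should not be expected to reproduce the 54.2% figure exactly.

```python
# Share of righty-righty GB singles to left field hit to each zone (from the text).
zone_share = {"5L": 0.022, "5": 0.070, "56": 0.841, "6": 0.060, "6M": 0.005}

# Share of fielded balls in each zone picked up by each infielder (from the table).
fielded_by = {
    "5L": {"SS": 0.011, "3B": 0.986},
    "5":  {"SS": 0.003, "3B": 0.988},
    "56": {"SS": 0.419, "3B": 0.574},
    "6":  {"SS": 0.976, "3B": 0.012},
    "6M": {"SS": 0.881, "3B": 0.001},
}

def blame_shares(zone_share, fielded_by):
    """Weight each fielder's ownership of a zone by how often hits go there."""
    raw = {}
    for zone, share in zone_share.items():
        for fielder, rate in fielded_by[zone].items():
            raw[fielder] = raw.get(fielder, 0.0) + share * rate
    # Assumption: rescale so the listed fielders split 100% of the blame.
    total = sum(raw.values())
    return {fielder: value / total for fielder, value in raw.items()}

# Zone "56" alone: the third baseman's slice of the blame.
third_base_from_hole = zone_share["56"] * fielded_by["56"]["3B"]

blame = blame_shares(zone_share, fielded_by)
```

Swapping in the zone shares and fielding splits for another handedness combination or another outfielder is just a matter of changing the two dictionaries.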
For a full breakdown of who’s to blame given some other combos, click here.
In the Retrosheet years where we don’t have hit location data, and all we know is that a GB single went through to the left fielder, we at least now have a better idea of where to place the blame among the infielders.
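As a rough sketch of how that might look in practice, the Python below spreads a count of ground ball singles across the infielders using a table of blame weights.  Only the 54.2% third-base share for righty-righty singles to left comes from this post; the 45.8% shortstop share assumes the two infielders split all of the blame between them, and the function name, the key layout, and the season count are hypothetical.

```python
# Blame weights keyed by (pitcher_hand, batter_hand, outfielder).
# Only the 0.542 figure is from the text; 0.458 assumes SS and 3B share all blame.
BLAME_WEIGHTS = {
    ("R", "R", "LF"): {"3B": 0.542, "SS": 0.458},
}

def assign_blame(single_counts, weights=BLAME_WEIGHTS):
    """Turn counts of GB singles per matchup/outfielder into hits charged per infielder."""
    charged = {}
    for key, count in single_counts.items():
        # A KeyError here means no weights exist yet for that combination.
        for fielder, share in weights[key].items():
            charged[fielder] = charged.get(fielder, 0.0) + count * share
    return charged

season = {("R", "R", "LF"): 100}  # made-up count, for illustration only
charged = assign_blame(season)
```

With 100 such singles, the third baseman is charged with about 54 of them and the shortstop with about 46, rather than 50 apiece.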
A few caveats.  One is the obvious fact that I’m going to be using data from 1993-1998 and assuming that they still hold up 10-15 years later.  Indeed, I’ve shown that baseball players are getting bigger and that they are probably getting slower.  This could certainly affect range and I suppose could in turn affect those numbers.  The other is in extending this system.  Dan Fox, when originally developing SFR, had in mind a system that could be applied to minor league data.  This system assumes that minor leaguers hit like major leaguers and have similar spray charts.  It may very well be the case, but without hard data, we have no way to know.


3 Responses to Playing the blame game with ground ball singles

  1. joe arthur says:

    Looks like my comment has been temporarily lost to moderation, presumably by a flaw in the way I tried to post links. The links I tried to refer to were for Dan Fox’s and a repetition of your link to Sean Smith’s Hardball Times article. Wish the site had a preview mode …

  2. dan says:

    I still like the WOWY approach for fielders with no hit-location data. Someone would have to figure out how much weight each scenario gets (stadium, runner on base, pitcher on mound). And I was also wondering… could WOWY be applied to the minor leagues, or would the constant changing of levels and true talent levels (they’re still changing in the minors) wreak havoc on the system?
