TotalZone, a new defensive measure
April 23, 2007
This weekend, I did something I’ve wanted to do for a while now. I dusted off my copy of Baseball Hacks, learned some basic Perl programming, downloaded Retrosheet’s bevent program, emailed for a little help (thanks Joe) and built myself a play-by-play database.
It’s a wonderful thing. I started playing around with it, and thought I’d see how far I could get constructing my own play-by-play defensive measure. For the years 2003 through 2006, Retrosheet has, for virtually every batted ball, a code for type. It will tell you if a ball was a grounder, line drive, flyball, or popup. There are also codes for who fielded the ball, and a spot for hit location using Project Scoresheet codes, but there isn’t much data there, so I had to do without.
Here’s what I did: I charged every hit to a specific fielder, combined that with his plays made and errors, and came up with a zone rating. There are no areas outside a fielder’s zone. Infielders are charged when they make an error or field an infield hit. Outfielders are charged when they field any line drive or flyball hit, or make an error on one. For infielders, I only look at ground balls. Ground ball singles to left are counted 1/2 towards third base and 1/2 towards short. Singles to CF are charged to 2B and short, and singles to RF to 2B and 1B. In addition, ground ball extra-base hits are charged 100% to 3B (if fielded in LF) or 1B (if fielded in RF).
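The charging rules above can be sketched in a few lines of Python. This is only an illustration, not the Perl I actually ran: the function names, the one-letter ball type codes, and the position labels are made up for the example. The even split on CF and RF singles and the exact zone rating formula (plays made divided by plays made plus charged hits plus errors) are assumptions, and since the rules above don't say what happens to a ground ball extra-base hit fielded in CF, the sketch leaves it uncharged.

```python
INFIELD = {"1B", "2B", "3B", "SS"}
OUTFIELD = {"LF", "CF", "RF"}

# Ground ball singles that reach the outfield are split between the two
# infielders they most likely got past (even split assumed for CF and RF).
GROUNDER_SINGLE_SPLIT = {
    "LF": {"3B": 0.5, "SS": 0.5},
    "CF": {"2B": 0.5, "SS": 0.5},
    "RF": {"2B": 0.5, "1B": 0.5},
}

# Ground ball extra-base hits go down the line, so the corner infielder
# takes the full charge. CF is deliberately absent (not specified).
GROUNDER_XBH = {"LF": {"3B": 1.0}, "RF": {"1B": 1.0}}


def charge_hit(ball_type, bases, fielder):
    """Return {position: share} for one hit.

    ball_type: "G" grounder, "L" line drive, "F" flyball, "P" popup
    bases:     1 for a single, 2 or more for an extra-base hit
    fielder:   position label of whoever fielded the ball
    """
    if ball_type == "G":
        if fielder in INFIELD:
            return {fielder: 1.0}  # infield hit
        if bases == 1:
            return dict(GROUNDER_SINGLE_SPLIT.get(fielder, {}))
        return dict(GROUNDER_XBH.get(fielder, {}))
    if ball_type in ("L", "F") and fielder in OUTFIELD:
        return {fielder: 1.0}
    return {}  # popups and anything else are not charged in this sketch


def zone_rating(plays_made, hits_charged, errors):
    """Assumed formula: share of charged chances turned into outs."""
    chances = plays_made + hits_charged + errors
    return plays_made / chances if chances else None
```

Summing the shares from charge_hit over every hit in a season gives each fielder's charged hits, which then feed zone_rating along with his plays made and errors.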
I was surprised to find the results look very reasonable. In most cases they stack up well against more detailed play-by-play measures. It doesn’t capture data as detailed as zone rating does, but it is less subject to scorer differences and counts all balls in play, which is the type of measure that will reward fielders with great range.
All players are compared to league average, and for outfielders I was able to get park factors, since I can easily look at home and road numbers. Also, line drives and flyballs were looked at separately instead of being lumped together (something that BIS zone rating does). I aggregate the plus/minus rating for OF, but an OF is not penalized for having an unusual mixture of line drives or flyballs.
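Here is a rough sketch of how keeping line drives and flyballs separate works out. The function name and the input layout are hypothetical, the result is in plays rather than runs, and the park factor step is left out. Because each ball type is compared only to the league rate for that same type, an outfielder who happens to see a lot of line drives isn't dinged for it.

```python
def plus_minus(player, league):
    """Plays made above league average, summed over ball types.

    player, league: {ball_type: (plays_made, chances)}
    Each ball type is measured against the league rate for that same
    type, so the player's own mix of line drives and flyballs does not
    move the total.
    """
    total = 0.0
    for ball_type, (plays, chances) in player.items():
        lg_plays, lg_chances = league[ball_type]
        lg_rate = lg_plays / lg_chances
        total += plays - chances * lg_rate
    return total


# Made-up numbers, purely for illustration.
league = {"L": (200, 1000), "F": (8500, 10000)}
outfielder = {"L": (10, 40), "F": (90, 100)}
print(plus_minus(outfielder, league))
```

With these invented totals the outfielder is 2 plays above average on line drives (10 made against 8 expected) and 5 above on flyballs (90 against 85).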
I’ll give a summary of some of the players who generate the most discussion. I probably have some revision work to do, such as regressing the park factors and who knows what else.
Adam Everett rates #1 in every system, and does again here with a whopping +38. As I recall, that’s right about where UZR has him. Everett was +11, +7, and +2 the previous years.
The anti-Everett, Derek Jeter, was -6, -6, -2, and -17 in 2003. Michael Young was -26 in 2005 but a much improved +7 last year. He also improved greatly in zone rating.
Brandon Inge was +24 last year. Eric Chavez, from 06 to 03, was +13, +15, +20, and +12. In 2005, Chavez allowed 6 doubles down the left field line. A typical 3B allows 25-30. That is the type of detailed stat I can get from this data set. Sure, he’s good, but how does he do it? I’ll try and pick up where The Fielding Bible left off.
Orlando Hudson: +10, +9, +12, +4
Albert Pujols: +11, +8, +14, and +4. The +4 for 2003 seems low, but he played a lot of OF that year, where he was +10. Boy can play some ball.
There are two outfielders who are loved by some pbp methods, not so much by others, but have great reputations: Andruw Jones and Ichiro. Jones does well, +15, +3, +8, +11. So does Ichiro, -3, +14, +13, +8 as a RF, but also a +5 in center last year in limited time.
Griffey was -19 last year after -29 in 2005, and Manny Ramirez finally finds a system that doesn’t hate him. With a big park factor for Fenway, Manny was -7 each of the last 2 years, +2 in 2004, and dead even in 2003. Whatever system the Red Sox use probably gives similar results, otherwise David Ortiz would wear a 1B mitt more often than he does.
The great new CF for the Angels was -7 last year after an average 2005.