TotalZone, a new defensive measure

This weekend, I did something I’ve wanted to do for a while now. I dusted off my copy of Baseball Hacks, learned some basic Perl programming, downloaded Retrosheet’s bevent program, emailed for a little help (thanks Joe) and built myself a play-by-play database.
It’s a wonderful thing. I started playing around with it, and thought I’d see how far I could get constructing my own play-by-play defensive measure. For 2003 through 2006, Retrosheet has a batted-ball type code for virtually every batted ball. It will tell you if a ball was a grounder, line drive, flyball, or popup. There are also codes for who fielded the ball, and a spot for hit location using Project Scoresheet codes, but there isn’t much data there, so I had to do without.
Here’s what I did: I charged all hits to a specific fielder, combined that with his plays made and errors, and came up with a zone rating. There are no areas outside a fielder’s zone. Infielders are charged when they make an error or field an infield hit. Outfielders are charged when they field any line drive or flyball hit or error. For infielders, I only look at ground balls. Ground ball singles to left are counted 1/2 towards third base, and 1/2 to short. CF singles are charged to 2b and short, and RF to 2B and 1B. In addition, groundball extrabase hits are charged 100% to 3rd base (if LF) or 1B (if RF).
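For anyone who wants to try the charging rules above, here is a minimal sketch in Python (not the Perl used to build the database). The `(result, fielded_by)` event tuples, the function names, and the plays-made-over-chances formula are assumptions for illustration. The even 1/2 split on CF and RF singles is also assumed, and ground-ball extra-base hits fielded by the CF are left uncharged because the rules above don't cover them.

```python
from collections import defaultdict

INFIELD = {3, 4, 5, 6}  # 1B, 2B, 3B, SS by scoring position number

# Ground-ball singles that reach the outfield, split between two infielders.
# Keys are the outfielder who fielded the ball (7=LF, 8=CF, 9=RF).
SINGLE_SPLIT = {7: {5: 0.5, 6: 0.5}, 8: {4: 0.5, 6: 0.5}, 9: {4: 0.5, 3: 0.5}}
# Ground-ball extra-base hits are charged fully to the corner infielder.
XBH_CHARGE = {7: {5: 1.0}, 9: {3: 1.0}}


def charge_groundball(result, fielded_by):
    """Return {position: fraction of one chance} for a single ground ball."""
    if fielded_by in INFIELD:
        # Outs, errors, and infield hits go to the infielder who fielded the ball.
        return {fielded_by: 1.0}
    if result == "single":
        return SINGLE_SPLIT.get(fielded_by, {})
    if result in ("double", "triple"):
        return XBH_CHARGE.get(fielded_by, {})
    return {}


def infield_zone_ratings(events):
    """events: iterable of (result, fielded_by) tuples, ground balls only."""
    plays = defaultdict(float)
    chances = defaultdict(float)
    for result, fielded_by in events:
        for pos, share in charge_groundball(result, fielded_by).items():
            chances[pos] += share
            if result == "out":
                plays[pos] += share
    # Assumed definition: plays made divided by all chances charged.
    return {pos: plays[pos] / chances[pos] for pos in chances}


sample = [("out", 6), ("out", 6), ("error", 6),
          ("single", 7), ("single", 8), ("double", 7)]
ratings = infield_zone_ratings(sample)
```

In the sample, the shortstop is charged with four chances (two outs, one error, and half of each single) and made two plays.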
I was surprised to find the results look very reasonable. In most cases they stack up well against more detailed play-by-play measures. This method doesn’t capture data as detailed as zone rating does, but it is less subject to scorer differences and counts all balls in play, which is the kind of measure that rewards fielders with great range.
All players are compared to league average, and for outfielders I was able to get park factors, since I can easily look at home and road numbers. Also, line drives and flyballs were looked at separately instead of being lumped together (something that BIS zone rating does). I aggregate the plus/minus rating for OF, but an OF is not penalized for having an unusual mixture of line drives or flyballs.
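The outfield plus/minus idea can be sketched roughly like this, again in Python. The dictionary layout, the function name, and the league totals are made up for illustration. Park factors and the conversion from plays to runs are left out because the details aren't spelled out here. Each batted-ball type is compared to the league rate for that same type and only then added up, so the mix of line drives and flyballs doesn't move the total by itself.

```python
def plus_minus_plays(player, league):
    """Plays made above league average, summed over batted-ball types.

    player, league: {ball_type: (outs, chances)} with matching keys.
    """
    total = 0.0
    for ball_type, (outs, chances) in player.items():
        lg_outs, lg_chances = league[ball_type]
        # Expected outs if the player converted at the league rate for this type.
        expected = chances * lg_outs / lg_chances
        total += outs - expected
    return total


# Hypothetical league totals: line drives ("L") and flyballs ("F").
league = {"L": (300, 1000), "F": (8000, 10000)}

# A fielder who beats the league rate on both types.
good = {"L": (40, 100), "F": (170, 200)}

# A fielder who sees mostly line drives but converts both types at league rates.
odd_mix = {"L": (90, 300), "F": (40, 50)}

good_pm = plus_minus_plays(good, league)
odd_pm = plus_minus_plays(odd_mix, league)
```

The second fielder has a much lower overall out rate than the league because of all the line drives, but still comes out at zero.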
I’ll give a summary of some of the players who generate the most discussion. I probably have some revision work to do, such as regressing the park factors and who knows what else. Read more to see how individuals rate:
Adam Everett rates #1 in every system, and does again here with a whopping +38. As I recall, that’s right about where UZR has him. Everett was +11, +7, and +2 the previous years.
The anti-Everett, Derek Jeter, was -6, -6, -2, and -17 in 2003. Michael Young was -26 in 2005 but a much improved +7 last year; he also improved greatly in zone rating.
Brandon Inge was +24 last year. Eric Chavez, from 06 to 03, was +13, +15, +20, and +12. In 2005, Chavez allowed 6 doubles down the left field line. A typical 3B allows 25-30. That is the type of detailed stat I can get from this data set. Sure, he’s good, but how does he do it? I’ll try and pick up where The Fielding Bible left off.
Orlando Hudson: +10, +9, +12, +4
Albert Pujols: +11, +8, +14, and +4. The +4 for 2003 seems low, but he played a lot of OF that year, where he was +10. Boy can play some ball.
There are two outfielders who are loved by some pbp methods, not so much by others, but have great reputations: Andruw Jones and Ichiro. Jones does well, +15, +3, +8, +11. So does Ichiro, -3, +14, +13, +8 as a RF, but also a +5 in center last year in limited time.
Griffey was -19 last year after -29 in 2005, and Manny Ramirez finally finds a system that doesn’t hate him. With a big park factor for Fenway, Manny was -7 each of the last 2 years, +2 in 2004, and dead even in 2003. Whatever system the Red Sox use probably gives similar results, otherwise David Ortiz would wear a 1B mitt more often than he does.
The great new CF for the Angels was -7 last year after an average 2005.


14 Responses to TotalZone, a new defensive measure

  1. Matt Souders says:

    I’ve never been terribly impressed with Gary Matthews Jr. to be honest (though the little punk robbed a couple of potentially key hits against the Ms).
    I have had a PBP database for several months now, but have been holding off on rating fielding because I want to get it “perfect”…it is interesting how well you can do without even doing much in the way of adjusting for context…LOL

  2. Pizza Cutter says:

    Sean, the 1993-1998 data has much better hit location data. When I do something that requires that, I use those years. The problem is that many of those players don’t play any more.

  3. Sean Smith says:

    So far I’ve only loaded 2000 to 2006. It’s quite a massive database. Thanks for the tip, I’ll have fun with 93-98.

  4. JoeArthur says:

    Nice work!
    One way to refine the plus/minus analysis further is by looking at handedness; for the outfield, it seems to be the case that there is a measurable difference in difficulty of catching a line drive depending on whether it is pulled or hit to the opposite field; I never looked at infield grounders to see if there was a similar effect there … But this shouldn’t make a big difference in your results.
    How did your Fenway LF park factor 2003-2006 compare to the career-based factors you derived last year for 1987-2005?

  5. Sean Smith says:

    I haven’t compared apples to apples, but the park factor here was .79 for flyballs and .81 for line drives. Looks like this is a bit stronger park factor than I had before.

  6. tangotiger says:

    I agree that handedness would be a necessity. You could also look at if the DP is in effect, and if the score is close/late (hugging the line). And if the pitcher has FB or GB tendency. Park is huge of course.
    I’ve been waiting for someone to dip their toes in this. Fantastic work!

  7. MGL says:

    Runner on first of course will make a big difference for the 1B.

  8. Pizza Cutter says:

    MGL, in my throwing to first series, I found that while more ground balls get through the infield on the right side when the runner is being held on, the effect is actually cancelled out by the fact that fewer ground balls are hit to the right side in that situation.

  9. Dan Turkenkopf says:

    I’ve had some issues with Retrosheet and line drive data. There seems to be a major discrepancy about how line drives are scored from year-to-year. For a few seasons (2000-02) I only was able to find a single infield line drive for a hit. In 1998, 2860 infield line drives went for hits.
    I’ve been talking to Dial about this and he was going to talk to some of the Retrosheet guys, but I have some concerns about the validity of the data at this point.

  10. Sean Smith says:

    I did this for 2003 to 2006. Batted ball type is missing for most of 2000 to 2002, but is relatively complete for 2003 on, and looks pretty consistent from year to year. For 1993 to 1998 you also have Project Scoresheet hit locations. 1999 to 2002 seem to be dark years for Retrosheet though. Maybe if we all pitch in some donations they can buy it from STATS.
    I looked at batter handedness for shortstops, and in one sense it makes a huge deal, but in reality very little. With RHB, outs were recorded about 72-73% of the time, compared to about 70% overall. With LHB, it’s only 65-66%.
    Yet when I reran my ratings, it made at most a 1 or 2 run difference for every shortstop save one. Things seem to even out. The exception was Jack Wilson who went from +3 to -1 thanks to all the Pirate LHP.
    I’ll give runner on first a try.

  11. […] Statistically Speaking | MVN – Most Valuable Network Blog Archive TotalZone, a new defensive measure A new play by play defensive metric. Inge rates well. (tags: defense stats) […]

  12. tangotiger says:

    Sean, I agree that things tend to even out. But, as you point out, in an isolated case, it doesn’t.
    I expect you to see a bigger gap with CF.

  13. Sean Smith says:

    Yes, the outfield gaps are bigger. Seems like 4 run changes are pretty common. For outfielders I’ve added batter hand, and for infielders batter hand and whether 1B is occupied.

  14. […] I have made some progress on Total Zone in the last week. Here’s part 1 in case you missed it. For outfielders, I already separated line drives from flyballs. To further improve the ratings for outfielders, I added batter hand, as line drives and flyballs to the opposite field are more likely to be turned into outs. I also added a road park factor, which is not a big deal in most cases, but makes a slight difference for someone like Manny Ramirez, who doesn’t have to play in front of a big wall on the road, while other AL leftfielders sometimes do. […]
