Defending Manny's Defense

Although he is one of the game’s all-time best hitters, Manny Ramirez has long been maligned as one of the worst outfielders of his time. With increased attention being paid to fielding in the overall valuation of players, Ramirez and others such as Raul Ibanez have a couple of wins (or $10m) a year lopped off their overall value. In a recent interview following his lucrative new contract with the World Champion Phillies, Ibanez defended his ability to catch the ball despite metrics that showed otherwise.

I had been concerned about the measurements of Manny’s abilities, and then Jason Bay was traded to take Ramirez’ place, and he too showed up at the bottom of the defensive metrics. Was playing left field in front of Fenway Park’s Green Monster causing distortions? I’ve written about what needs to be recorded to give us the best data with which to measure defense. If a batted ball hits too high off the wall to be catchable, there should be no responsible fielder. RetroSheet has only one field, FLD_ID, which lists the retrieving fielder. I’m just not sure how BIS or STATS codes this kind of hit.

The data available being what it is, I looked to see if I could analyze the RetroSheet play-by-play while controlling for the ballpark. If balls at Fenway are being incorrectly coded, fine, as long as it’s being done impartially for everyone who plays there. My method is a combination of WOWY and OPA! (WOWOOPA?).

I grouped all batted balls by BATTEDBALL_CD (grounder, fly, liner or popup) and FLD_CD (which position, 1-9, the ball was hit to), and then summed the resulting SI, DO, TR, HR & ROE. This shows that the BABIP of all flies to LF in all ballparks from 2003-2008 was .171. Then I copied that query and inserted PARK_ID. The BABIP for all flies to LF in Fenway from 2003-2008 was .313, which suggests that either Manny was a historically terrible fielder, or a lot of uncatchable balls off the Green Monster were being coded as flies to LF. Lastly, I copied that query again and inserted YEAR and FLD_ID (the fielder who retrieved the ball). The BABIP for all flies to LF in Fenway Park with Manny Ramirez retrieving the ball from 2003-2008 was .320.

Here’s where WOWY comes in. With Ramirez, 230 of 718 flies were hits. Without him, others allowed 683 – 230 = 453 hits on 2183 – 718 = 1465 flies. Their rate of 453 hits per 1465 flies, times Ramirez’ 718 flies, gives an expected 222 hits allowed. Manny actually allowed 230, so he was only -8 hits over 6 seasons.

Now, repeat this for every year in every ballpark, with each fielder’s stats in bucket one, and in bucket two the 2003-2008 totals for everyone at that park and position minus the bucket one totals, scaled down to the same number of balls in play as bucket one. The next query sums each bucket, then compares the totals. Here I used linear weights, where FRAA = 0.474*(SIexp-SIobs) + 0.764*(DOexp-DOobs) + 1.063*(TRexp-TRobs) + 1.409*(HRexp-HRobs). Each fielder is compared to every other fielder in the same ballpark, looking separately at GB, FB, LD & PU.
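
The with-or-without-you step can be sketched in a few lines of Python. This is a minimal illustration using the Fenway counts quoted in the paragraph above, not the SQL queries actually used for this article; the Bucket class and wowy_expected_hits function are hypothetical names. "Hits" here means SI + DO + TR + HR + ROE, as in the BABIP figures.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Balls in play and hits (SI + DO + TR + HR + ROE) for one group."""
    balls: int
    hits: int

    @property
    def babip(self) -> float:
        return self.hits / self.balls


def wowy_expected_hits(group_total: Bucket, fielder: Bucket) -> float:
    """Hits the fielder would have allowed at everyone else's rate.

    group_total covers one park / position / batted-ball type;
    fielder is the subset retrieved by the player being rated.
    """
    others = Bucket(group_total.balls - fielder.balls,
                    group_total.hits - fielder.hits)
    # Scale the "without him" rate to the fielder's own balls in play.
    return others.babip * fielder.balls


# Flies to LF at Fenway, using the counts quoted in the text.
fenway_lf_flies = Bucket(balls=2183, hits=683)
manny = Bucket(balls=718, hits=230)

expected = wowy_expected_hits(fenway_lf_flies, manny)
print(f"Fenway BABIP {fenway_lf_flies.babip:.3f}")  # Fenway BABIP 0.313
print(f"Manny BABIP {manny.babip:.3f}")             # Manny BABIP 0.320
print(f"expected {expected:.0f}, saved {expected - manny.hits:+.0f}")  # expected 222, saved -8
```

The same function applies unchanged to any park, position and batted-ball type; only the two buckets change.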

By looking at SI, DO, TR & HR allowed, it grades a fielder not only on his ability to turn a batted ball into an out, but also on his ability to keep the batter from stretching a hit into extra bases. It does not, as yet, account for the batter (did the fielder’s team hit harder-to-catch balls than the opposition?) or for extra bases gained by existing baserunners. So I’m not (yet) claiming this as the last word in fielding metrics, but it’s a start, and I am confident it does the job of accounting for ballparks.
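
The FRAA formula above can be written as a small function: scale the other fielders' SI, DO, TR and HR down to the rated fielder's balls in play (bucket two), then weight the expected-minus-observed differences. This is a sketch under stated assumptions: the weights are the ones in the text, the function names are hypothetical, and the counts in the example are invented for illustration. In the full method it would run once per batted-ball type (GB, FB, LD, PU) and the results would be summed.

```python
# Run values per event, taken from the FRAA formula in the text.
WEIGHTS = {"SI": 0.474, "DO": 0.764, "TR": 1.063, "HR": 1.409}


def scale_to(others: dict, others_balls: int, fielder_balls: int) -> dict:
    """Bucket two: everyone else's counts scaled to the fielder's balls in play."""
    factor = fielder_balls / others_balls
    return {event: others[event] * factor for event in WEIGHTS}


def fraa(expected: dict, observed: dict) -> float:
    """Runs saved versus expectation; positive means better than average."""
    return sum(w * (expected[e] - observed[e]) for e, w in WEIGHTS.items())


# Hypothetical counts for one park / position / batted-ball type.
others = {"SI": 200, "DO": 150, "TR": 20, "HR": 2}
expected = scale_to(others, others_balls=1000, fielder_balls=500)
observed = {"SI": 100, "DO": 70, "TR": 12, "HR": 1}
print(round(fraa(expected, observed), 2))  # 1.69
```

In this made-up case the fielder allowed the expected number of singles but turned five doubles into singles or outs, which outweighs the two extra triples.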

Now, some results –

Manny    UZR     BLC
2003     3.7    -0.4
2004    -2.5    -0.2
2005   -22.6   -12.8
2006   -19.5    -7.6
2007   -18.3    -2.1
2008    -4.8    -1.1

He’s not good, but not as god-awful as UZR had him for 2005-2007. I’d project him at -4 for 2009.

Ibanez   UZR     BLC
2003    -3.3    -3.0
2004    -0.3    -5.1
2005    -1.5    -1.9
2006    -5.6   -13.5
2007   -20.8   -11.8
2008   -12.6   -20.5

He is god-awful, probably at least 4 runs worse than Burrell. 2006 and 2008 both ranked in my worst 25 of the last 6 seasons.

The best of 2008? Jacoby Ellsbury was +25.2 total for all three outfield positions, followed by Carlos Beltran +19.3, Franklin Gutierrez +14.1, Melky Cabrera +14.0, Carl Crawford +12.8, Cody Ross +12.7, David DeJesus +12.5, Denard Span +11.6, Endy Chavez +11.5, and Adam Jones +11.4. The Gold Glove recipient was Nate McLouth, at -4.7. The bottom ten were Brad Hawpe -28.2, Jason Bay -23.8, Jeff Francoeur -23.0, Raul Ibanez -20.5, Carlos Lee -14.2, Hunter Pence -13.3, Pat Burrell -13.1, Ryan Ludwick -11.1, Ken Griffey -11.1 and Ryan Braun -10.5. Nyjer Morgan had the best rate (FRAA/Opp), followed by Jacoby Ellsbury and Endy Chavez, and Morgan and Ellsbury are the top two by rate for their careers over 2003-2008.

Right now I’m working on assigning ground ball hits to the outfield to the various infielders, and then will do baserunning allowed. This will be the basis for my play by play based fielding that I will use to evaluate GameDay data, especially for minor leaguers. Then all I need is some catchy acronym.

 


37 Responses to Defending Manny's Defense

  1. Pizza Cutter says:

    OPA! actually does use something of a ballpark adjustment much along the same idea that you used. On balls which RS codes flyball to left fielder, I take a look at what percentage the visiting teams managed to track down in each ballpark and used that as the basis for my league expectation.
    My question is why HR are included in your FRAA numbers. I’m sure someone will reference Canseco’s headshot homerun, but usually fielders just stand and watch home runs and can’t really do much about them.
    Ryan Braun is also confusing me. I had him as the best left fielder in baseball (you have him well below average), including the best at tracking down flyballs. Milwaukee did give up 175 HR, good for 8th most in MLB, and as a team had the 3rd highest HR/FB rate in the majors. OPA! doesn’t count HR. Is that the possible difference?
    OPA! had Manny just slightly below average in the field for 2008 (he made up for it by being slightly above average with the arm). In 2007, he had an awful year, finishing 13 runs below average, 6th worst in baseball ahead of Chris Duncan, Raul Ibanez, Raul Mondesi, Carlos Lee, and… Jason Bay.

  2. Peter Jensen says:

    As Pizza says, you definitely shouldn’t be including HRs unless you are limiting them to inside-the-park HRs, which you can do by querying HRs that have a “Fielded By” fielder. You also should be including line drives to left field somewhere, either as a separate category or combined with fly balls. If you set up your database so that it shows the before and after base-out states for each play, you can use BRAA instead of linear weights and show how much an outfielder limits the advance of other runners on base as well as the batter’s advancement. I have had a fielding metric using Retrosheet data in this manner for several years. There are some problems that limit its usefulness for all fielders, but it does give another perspective on a fielder’s possible capabilities. I have also argued in various threads that UZR and Dewan have not properly accounted for Fenway’s left field park effects and therefore have not given Manny enough credit for his fielding abilities. When he went to LA his fielding was very close to average for his short time there.

  3. Brian Cartwright says:

    Gentlemen, they are inside-the-park HRs.
    HRs over the fence are given a “0” code for FLD_CD, a total of 78 over the past 6 seasons (although there is one HR on a LD to SS).
    I did notice the discrepancy on Braun. Because I compare the fielders to everyone else who has played the same position in the same ballparks, someone like Braun who is in their first year can be skewed by the player they replaced.
    Braun was -18.1 hits in Miller Park, -7.7 hits on the road. Geoff Jenkins was +13.3 runs in LF, -1.3 runs in RF over the 6 years – good, but not great in LF.
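
Brian's point that over-the-fence home runs carry FLD_CD 0 translates into a simple filter. This is a minimal sketch: the record layout (a dict with hypothetical "EVENT" and "FLD_CD" keys) is a simplification for illustration, not the actual Retrosheet column set.

```python
from collections import Counter


def inside_park_hr_by_fielder(events):
    """Count home runs that were charged to a fielder.

    Over-the-fence home runs carry FLD_CD 0, so keeping only nonzero
    FLD_CD leaves the inside-the-park home runs a fielder can affect.
    """
    return Counter(e["FLD_CD"] for e in events
                   if e["EVENT"] == "HR" and e["FLD_CD"] != 0)


# Hypothetical, simplified records; real Retrosheet rows have many more fields.
events = [
    {"EVENT": "HR", "FLD_CD": 0},   # over the fence
    {"EVENT": "HR", "FLD_CD": 8},   # inside the park, retrieved by CF
    {"EVENT": "DO", "FLD_CD": 7},
    {"EVENT": "HR", "FLD_CD": 6},   # a liner retrieved by the SS
]
print(inside_park_hr_by_fielder(events))  # Counter({8: 1, 6: 1})
```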

  4. Peter Jensen says:

    Brian – In calculating what an average fielder does at a position in a particular field, you should not include all the home fielder’s stats to calculate an average value. If I understand Pizza correctly, he ignores all the home fielder’s stats in calculating an average value. If I understand you correctly, you include ALL of the home fielder’s stats. I use all of the visiting players’ stats and a prorated amount of the home fielder’s stats; I think 1/12 was the number I settled on. Doing this could lower the average value for LFs at Miller Park if the previous MIL LFs had all played above average.
    The other major problem that you still have, and have not mentioned, is divvying up hit ball opportunities between a fielder and his adjacent fielder, in this case CF. If Manny lacks range towards center, balls that a normal LF might field, either for outs or hits, may instead show up as balls ultimately fielded by the center fielder. I think this is the main difference between whole-field metrics, including yours, and zone-type metrics.
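
The three baselines discussed here differ only in how much the home fielders count: Pizza Cutter drops them, Brian includes them fully, and Peter prorates them at about 1/12. A minimal sketch, assuming hits and balls in play are already totaled for one park, position and batted-ball type; the function name and the counts are hypothetical.

```python
def baseline_hit_rate(visitors, home, home_weight):
    """Average hits per ball in play at one park / position / batted-ball type.

    visitors, home: (hits, balls_in_play) tuples.
    home_weight: 0.0 ignores the home fielders, 1.0 counts them fully,
    and a fraction such as 1/12 prorates them.
    """
    v_hits, v_balls = visitors
    h_hits, h_balls = home
    hits = v_hits + home_weight * h_hits
    balls = v_balls + home_weight * h_balls
    return hits / balls


visitors, home = (300, 1000), (250, 1000)  # hypothetical counts
for w in (0.0, 1 / 12, 1.0):
    print(f"weight {w:.3f}: {baseline_hit_rate(visitors, home, w):.4f}")
```

With a strong home fielder (here, fewer hits on the same number of balls), full weighting pulls the baseline down and makes him look closer to average, which is the skew Peter describes.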

  5. Brian Cartwright says:

    All good points. This is my first attempt, I’m looking for something to use on minor league GameDay, which might not be perfect, but at least good.
    In the OF, RetroSheet and GameDay label who retrieved all the batted balls. For now, I am content with that.
    In the infield (I am coding this now), I am assigning the split zones by batter handedness, with the ratios determined by the infield grounders hit to the fielders on either side of the gap. For example, with a LHB, 60% of hits to RF are assigned to 2B and 40% to 1B, but with a RHB, 79% go to 2B and 21% to 1B.
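
The split-zone assignment can be sketched as fractional charging: each ground-ball hit through a hole is divided between the two adjacent infielders in proportion to the grounders each fielded, separately for left- and right-handed batters. The 60/40 and 79/21 ratios come from the comment above; the function names and the hit counts are hypothetical.

```python
from collections import defaultdict


def gap_shares(grounders: dict, left: str, right: str) -> dict:
    """Split a hole between two infielders by the grounders each one fielded."""
    total = grounders[left] + grounders[right]
    return {left: grounders[left] / total, right: grounders[right] / total}


def charge_gap_hits(hits_by_hand: dict, shares_by_hand: dict) -> dict:
    """Fractionally charge ground-ball hits through a hole, per batter hand."""
    charged = defaultdict(float)
    for hand, hits in hits_by_hand.items():
        for pos, share in shares_by_hand[hand].items():
            charged[pos] += hits * share
    return dict(charged)


# Ratios for the 1B/2B hole quoted in the comment above.
shares = {"L": {"2B": 0.60, "1B": 0.40}, "R": {"2B": 0.79, "1B": 0.21}}
# Hypothetical: ground-ball singles to RF by left- and right-handed batters.
print(charge_gap_hits({"L": 10, "R": 20}, shares))
```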

  6. Brian Cartwright says:

    Pizza – I had a thought – what is the time span you used for your baseline, the others to whom each player is compared? I used a maximum of all six years, 2003-2008, per ballpark. If you only use one season at a time, it may be more subject to random variations.

  7. Peter Jensen says:

    Brian – I thought you were going to use the Gameday Hit Location information to fine-tune the hit ball opportunities? It is the relative quality of adjacent fielders that can cause an erroneous number of hit ball opportunities for both players in both the infield and the outfield. If you want your fielding metric to be useful as anything more than a rough estimate of a fielder’s ability, you must make a better attempt to account for talent discrepancies. Relying on who fields the ball in the outfield or a fixed percentage of infield ground balls can lead to +/- errors of 5 to 10 a season. If you want to correspond with me privately I will help you where I can.

  8. Brian Cartwright says:

    Peter, you are right on GameDay’s hit locations. I do intend to use the best data available. Before I read your last comment, I went to mlb.com’s site to double check that the hit locations are used all the way down to Rookie level.
    Even with the vectors, there will still be some guessing in the infield split zones, but it shouldn’t be as rough.

  9. Pizza Cutter says:

    Brian, my baseline is just the year in question. Random variations may indeed be poking their heads in. I’m not sure how consistent those numbers are across the years.
    I’m also happy to share my “blame charts” for ground balls that get through the middle.
    Peter, you understand my method correctly. I throw out all of the home fielders. The problem with looking at them is that they take up too much of the sample. A particularly good or bad one can skew the sample.

  10. Colin Wyers says:

    Brian, I have a zone-based system using Gameday hit-location data that I’m working on. If you’re interested, I’ll send you the code I have so far.
    As for home fielders, I think they should be included, just weighted. Essentially, you “cap” the number of innings played at, say, 1/15th of the total, and prorate the stats to that.
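
One reading of the innings cap: if the home fielder played more than the capped share of the innings at that park and position, multiply his stats by the ratio of allowed innings to actual innings; otherwise keep them whole. The resulting weight could then multiply the home fielder's hits and balls in play before they are combined with the visitors'. This is a sketch of that interpretation only; the function name and the innings figures are hypothetical.

```python
def capped_home_weight(home_innings: float, total_innings: float,
                       cap: float = 1 / 15) -> float:
    """Weight that limits the home fielder to `cap` of the innings in the baseline.

    If the home fielder played more than cap * total_innings, his stats are
    prorated down to that many innings; otherwise they count in full.
    """
    allowed = cap * total_innings
    return min(1.0, allowed / home_innings)


# Hypothetical: a regular left fielder with 1,200 of 1,440 innings at his park.
w = capped_home_weight(1200, 1440)
print(f"{w:.3f}")  # 0.080
```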

  11. Peter Jensen says:

    Colin – How have you handled the different home plate locations and distance multipliers for each field prior to 2008 and the changes that occurred during the 2008 off season?

  12. Brian Cartwright says:

    Peter, I was not aware of changes in this post-season (assuming you mean after 2008).
    A while ago Colin had sent me a file with the average location of each gb in the infield. If I remember correctly, Colorado and Baltimore had different home plate locations, but I do not recall detecting any problem with the distance multipliers.
    Parsing GameDay is something I am very interested in, and would appreciate your sharing any insights or info on this.

  13. Peter Jensen says:

    Brian – No, the changes were made after the 2007 season. I have not yet tried to rectify the 2008 season as I just loaded the hit locations into my Retrosheet file this week. Supposedly, MLB was trying to make changes for 2008 to make the fields compatible with one another. There were definitely different multipliers for different fields prior to 2008. MLB wasn’t (isn’t?) using the hit location data for data analysis so they don’t keep track of these things. They just use the hit location data for graphical presentations on Gameday and in the hit location charts. Rectifying the information to make it useful for fielding analysis is pretty much informed guesswork for MLB fields. Rectifying for Minor League fields is a much bigger problem. That’s why I wanted to compare approaches with Colin. I don’t think it is possible to get the errors down to the level of being able to use the information for a zone fielding system. I do think the information may be useful for improving total zone systems, at least at the major league level.

  14. Colin Wyers says:

    I’ve punted on most of those issues so far, Peter. I’m only using 2008 data at the moment.
    Trying to fit MLB Gameday x,y coordinates from Dodger Stadium to an actual diagram of Dodger Stadium has been difficult at best, so I’m sure there are some data quality issues here. (I took my plot and superimposed the Gameday plot over it, and they didn’t line up well at all.)
    I don’t have a proposed solution, which is the largest reason I haven’t actually published the system yet.

  15. Brian Cartwright says:

    Colin, remember that the GameDay coordinates are for Java, so they have a top-left origin.

  16. Colin Wyers says:

    I use negative Y values in my plotter, which should compensate for that, Brian. Orientation isn’t the problem – for lack of a better term, the plots I make seem to be wider than the Gameday plots. That’s not the only discrepancy – I’ll try and post some graphs later tonight.

  17. Peter Jensen says:

    Colin – I am not sure that I know what you mean by wider. The in play areas of the field both are 90 degree angles aren’t they? Do you mean that the outfield angles seem wrong? Or are you talking about the out of play areas?

  19. Colin Wyers says:

    I’d have to check to be sure, and given the data at hand I couldn’t be 100% confident I was getting the right results – there’s no data to denote the official foul ground of each park in x,y coordinates, so I’d have to estimate – but it seems to me that the fair area of the field IS greater than 90 degrees when I plot the x,y coordinates provided for Dodger Stadium.
    Unfortunately, I can’t find the graphs I made that I based this conclusion off of, so I’ll have to dig into the original CSV files and see if I can’t duplicate them later.

  20. Peter Jensen says:

    Colin – Are you superimposing the hit ball locations (MLB x and y) over your LA stadium plot or are you superimposing the MLB hit location stadium diagram over your LA stadium plot?

  21. DanAgonistes says:

    Keep in mind that for the minor leagues the x and y GameDay coordinates are only entered by observation for AAA in 07 and 08 and the Texas League (AA) for 2008. For everything else it is entered offsite and so won’t be usable.

  22. Peter Jensen says:

    Dan – What do you mean by entered offsite? Entered by whom and from what data source?

  23. DanAgonistes says:

    Stringers at the game call in the game events and the data entry is done elsewhere.

  24. Colin Wyers says:

    Peter, I don’t remember which I did last time. I really don’t know what difference it would make either way – I have both plots, so let me know what presentation you’d prefer.
    Using Photoshop, I measured my best estimate of the fair-foul lines on the Dodger Stadium plot and came up with 93 degrees. On the official Gameday plot, I get a perfect 90 degrees.
    I’ve got some other measurements but I have to run, so I’ll hopefully get them up later.

  25. Peter Jensen says:

    Dan – The 2006 and 2007 AA files seem to have a full range of x and y values. Are the stringers reporting them or are the recorders guessing at them?

  26. DanAgonistes says:

    Closer to guessing since they’re the ones doing the data entry based on only verbal communication.

  27. Colin Wyers says:

    Okay, well I’ve licked one potential problem – the coordinates I’m using for home plate (which I took from Baseball Hacks) are simply wrong. If anyone can give me a better set of coordinates for the infield diamond that would be greatly appreciated.
    Once I figured that out I was able to figure out my scaling problem. Based on this one game, to fit the Gameday plot onto my plot I had to use a ratio of 87.8% in the y axis to 100% on the x axis.

  28. Colin Wyers says:

    And again, I’m finding fault on my end – the plotter, not Gameday, is the source of the distortion. Ugh.

  29. Brian Cartwright says:

    Colin, from the data you sent me a while back, I calculated x=120 y=200 for home plate. I have seen 210 reported for HP, and that could be true, as I used the mean xy of fielded balls by that position, which for C could be 10 pixels in front of HP. That would give 240 pixels left to right, probably the same (210+30) top to bottom.
    From what Dan says, it looks like for below AAA, BATTEDBALL_CD and FLD_CD (after parsing from GameDay) are about the best we are going to get for now. The code I developed for this article is based on that, and I believe that is pretty much what Dan’s SFR is based on.
    Dan, I was looking to contact you, but the only email I could find was your BP address on your website, and I wasn’t sure you still checked that. I would like to take this opportunity to convey a very heartfelt thank you for your recent recommendation of me, and I hope we can stay in touch in the future.

  30. Peter Jensen says:

    Brian – The X Y data is recorded directly on a representation of the field that is identical to the representation of the hitting charts in the MLB stats section. The fields are drawn on a 250 by 250 grid. For 2008 home plate is pretty closely aligned to the center of the X axis (x coordinate 125) for almost all fields. There seems to be some field to field variation for the Y coordinate of home plate around an average of 197 to 200, but not nearly as much as there was in prior years. I have not yet tried to figure the distance multiplier as you need to have the home plate position pretty well located for each field before adjusting the multiplier for each field. Minor league fields are a total mystery.

  31. DanAgonistes says:

    Brian, no problem. My email address is linked to this reply but my BP address is forwarded as well.

  32. DanAgonistes says:

    Oops, I see it prepends the “http”. The address is dan.fox@pirates.com.

  33. Colin Wyers says:

    For 2008, using Photoshop’s ruler tool and some quick math, I get 130,-210 for home plate at Dodger Stadium.

  34. Peter Jensen says:

    Colin – A home plate Y of 210 for LAN would have the pitcher fielding more than 60% of his ground ball outs at a greater distance from home plate than the pitching rubber.

  35. Mike Fast says:

    I found home plate to be within a few pixels of x=125.5, y=203. The data seem to bear that out (most of the time). In addition, almost all of the park diagrams in use for Gameday have that pixel location for home plate or something with y between 200 and 206.
    I know there are some errors in the data, such as foul balls that are labeled as being fielded in the middle of the diamond, but basically I found most of the data to fit on a decent coordinate system (90 degrees of fair territory plus/minus one degree or so, home runs at proportional distances). I looked for park/time changes, and I couldn’t identify anything consistent. I did notice that Oakland and Baltimore seemed to have more errors than some other parks, but I couldn’t identify anything like a specific time frame for a coordinate shift.
    I’ve spent most of my time looking at bunts and fly balls, so if there’s something screwy in the ground ball, line drive, or popup data, I may not have seen it.
    So far I’ve only analyzed 2007 and 2008 data, although I have 2005 and 2006 sitting waiting to be parsed into my database. Did anyone happen to get the 2004 data before it vanished from the Gameday site?
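
Putting the coordinate points in this thread together: Gameday's chart has a top-left origin, so y must be flipped, and a home plate estimate near x=125.5, y=203 lets each hit be expressed as a spray angle and a distance in pixels. A minimal sketch, assuming Mike Fast's home plate estimate (other commenters found different y values by park and season); converting pixels to feet would still need the per-park distance multiplier the thread says is unresolved.

```python
import math

# Home plate pixel on the 250 x 250 Gameday chart (Mike Fast's estimate;
# other estimates in this thread differ, especially in y).
HOME_X, HOME_Y = 125.5, 203.0


def spray(x: float, y: float, home_x: float = HOME_X, home_y: float = HOME_Y):
    """Return (angle in degrees, distance in pixels) from home plate.

    Gameday uses a top-left origin, so y grows downward; flipping it makes
    "up the chart" positive. Angle 0 is straightaway center, -45 the left
    field line, +45 the right field line.
    """
    dx = x - home_x
    dy = home_y - y
    return math.degrees(math.atan2(dx, dy)), math.hypot(dx, dy)


for x, y in [(125.5, 103.0), (55.5, 133.0), (195.5, 133.0)]:
    angle, dist = spray(x, y)
    print(f"({x}, {y}) -> {angle:+.1f} deg, {dist:.1f} px")
```

If the chart really has 90 degrees of fair territory, fair balls should fall within about +/-45 degrees, which gives a quick check on a candidate home plate location.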

  36. Peter Jensen says:

    Mike – I do not have the 2004 data. The old hitting chart field diagrams (pre 2008) are no longer available, but some of them were much different than the 2008 field diagrams. The pre 2008 Baltimore data was the worst. The 2008 data is much more consistent, but still needs to be adjusted for each field.

  37. joe arthur says:

    FWIW, I don’t remember hit location data being part of the 2004 files at all – the format was very different, not xml, and the data model was not as extensive – many fewer files. I never took copies of them.
    I made some casual attempts to look at the coordinates a year and two years ago; I thought the x-coordinate for home plate was consistent (literally centered on pixel 125.5, as Mike reports). The y-coordinate varied from park to park, based on balls fielded very near the plate which I reviewed. As Peter has pointed out, the coordinates feed the gameday hit charts generated out of the park diagram and in past years those diagrams varied where they put home plate along the y-axis. I suppose it is possible that there are sometimes discrepancies or misalignments between diagrams used for recording the data and the diagrams used by gameday for displaying it.
    But another concern is (or was) whether the diagrams were truly to scale or distorted. Colin in #27 talks about a scaling factor for the y-dimension, but it seemed clear to me that the infield and outfield were not on a consistent scale for any of the parks I looked at [the # of feet per pixel to cover the known distance home-2nd not agreeing with the # of feet per pixel to cover the “known” distance to the fence in straightaway center.]
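
joe arthur's consistency check can be written as a ratio of two feet-per-pixel scales: one from the home-to-second distance (90 * sqrt(2), about 127.3 ft on a regulation diamond) and one from the listed distance to the center field fence. A diagram drawn to a single scale gives 1.0. This is a sketch; the function name is hypothetical and the pixel distances would have to be measured from each park diagram.

```python
import math

# Home plate to second base on a regulation diamond: 90 * sqrt(2), about 127.3 ft.
HOME_TO_SECOND_FT = 90 * math.sqrt(2)


def scale_ratio(home_to_second_px: float, fence_ft: float, fence_px: float) -> float:
    """Outfield feet-per-pixel divided by infield feet-per-pixel.

    A diagram drawn to one scale gives 1.0; values above 1.0 mean the
    outfield is compressed relative to the infield.
    """
    infield_fpp = HOME_TO_SECOND_FT / home_to_second_px
    outfield_fpp = fence_ft / fence_px
    return outfield_fpp / infield_fpp


# Hypothetical measurements: second base 40 px from home, a 400 ft fence at 150 px.
print(f"{scale_ratio(40.0, 400.0, 150.0):.3f}")
```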
