Off Season To-Do List
November 27, 2008
Yes, here in the U.S., it’s Thanksgiving Day, but you don’t have to live here to give thanks!
After going over many hills and through the woods, eating large quantities of turkey and all the trimmings at my mother-in-law’s house, and then sleeping it off, it’s time to talk about my off-season sabermetric to-do list.
I finally finished programming everything into my batting projections and published the results last week. However, a comprehensive evaluation of a player needs baserunning, defense and pitching in addition to batting, all ultimately expressed in runs so that they can be summed into a single number representing the player’s total contribution. It’s one thing to be able to show how any hitter projects, but without knowledge of speed, arm and defense, it’s hard to make a final judgment.
In yesterday’s Roundtable, we were asked for our World Baseball Classic starting lineups for the U.S. Derek Jeter and Michael Young are the two best hitters at shortstop, but both are among the worst defensively. Jimmy Rollins is not as good with the bat, but his defense is good enough to make him most people’s overall choice as the best U.S.-born shortstop. Another example is on the Pirates’ roster. Brandon Moss plays rf, lf and 1b, and over the past four seasons his translated wOBAs have been .339, .335, .334 and .331. Andrew McCutchen plays cf. His wOBAs over the past three years have been .342, .322 and .323. Moss looks to have a slight edge in batting productivity, but compared to corner outfielders (.347) and first basemen (.357) he’s way below average, while McCutchen is only slightly below the average for center fielders (.330). Add in that BP’s baserunning stats show Moss as dreadfully slow while McCutchen has a reputation for being very fast, and that Moss is regarded as a poor fielder and McCutchen as a good one, and you might conclude that McCutchen should be in cf, McLouth in lf, and Moss in AAA.
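To make the Moss and McCutchen comparison concrete, here is a rough Python sketch that measures each hitter against the average for his position. The wOBAs and positional averages are the ones quoted above; the plain unweighted average and the names `POSITION_AVG` and `gap_vs_position` are my own simplifications for illustration, not how the projections themselves weight seasons.

```python
from statistics import mean

# Translated wOBAs and positional averages quoted in the post.
MOSS_WOBA = [0.339, 0.335, 0.334, 0.331]
MCCUTCHEN_WOBA = [0.342, 0.322, 0.323]
POSITION_AVG = {"corner_of": 0.347, "1b": 0.357, "cf": 0.330}


def gap_vs_position(wobas, position):
    """Unweighted mean wOBA minus the positional average.

    Assumption: every season counts equally. A real projection would
    weight recent seasons more heavily and regress toward the mean.
    """
    return mean(wobas) - POSITION_AVG[position]


moss_gap = gap_vs_position(MOSS_WOBA, "corner_of")
mccutchen_gap = gap_vs_position(MCCUTCHEN_WOBA, "cf")

print(f"Moss vs corner OF: {moss_gap:+.3f}")
print(f"McCutchen vs CF:   {mccutchen_gap:+.3f}")
```

Moss comes out about twelve points of wOBA below the average corner outfielder, while McCutchen is about one point below the average center fielder, which is the positional gap described above before speed and defense are even considered.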
The first question usually asked about pitching analysis is whether it will be DIPS compliant. Yes and no. The problem with DIPS is that it takes an all-or-nothing approach: pitchers get no credit for the number of base hits allowed, and full credit for everything else. My pitching projections will be very similar to the batting projections, and each component will have its own regression factor. I still need to work out the exact values to be used, but BABIP is about 20% pitcher and 80% defense. Therefore, it will be regressed much more heavily than home run, walk and strikeout rates. One problem I will have with pitching is that the available minor league statistics don’t cover all the categories: batters faced, intentional walks and hit batsmen are missing for many or all seasons.
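As a rough illustration of what a per-component regression factor means, the Python sketch below pulls each observed rate toward the league average by a different amount. Only the roughly 20% pitcher share for BABIP comes from the paragraph above; the other shares, the league averages and the pitcher's line are made-up placeholders, and treating the pitcher share directly as the regression weight is a simplification of mine rather than the final method.

```python
# Share of each observed rate credited to the pitcher.
# Only the BABIP share (~0.20) comes from the post; the rest are
# hypothetical placeholders until the real values are worked out.
PITCHER_SHARE = {"babip": 0.20, "hr_rate": 0.50, "bb_rate": 0.60, "k_rate": 0.70}

# Hypothetical league averages, for illustration only.
LEAGUE_MEAN = {"babip": 0.300, "hr_rate": 0.027, "bb_rate": 0.085, "k_rate": 0.170}


def regress(observed, league_mean, share):
    """Keep `share` of the gap between observed and league; regress the rest."""
    return league_mean + share * (observed - league_mean)


def regress_line(observed_rates):
    """Apply each component's own regression factor to a pitcher's rates."""
    return {
        stat: regress(value, LEAGUE_MEAN[stat], PITCHER_SHARE[stat])
        for stat, value in observed_rates.items()
    }


# A made-up pitcher who allowed few hits on balls in play and struck out a lot.
observed = {"babip": 0.260, "hr_rate": 0.020, "bb_rate": 0.070, "k_rate": 0.230}
for stat, value in regress_line(observed).items():
    print(f"{stat}: {observed[stat]:.3f} -> {value:.3f}")
```

With these placeholder numbers, a .260 BABIP is pulled most of the way back to the league's .300, while the strikeout rate keeps most of its distance from average. That is the middle ground between giving pitchers no credit for hits allowed and giving them full credit.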
My fielding and baserunning work will need play-by-play data. Just today, RetroSheet released the 2008 dataset. My formulas will be very similar to what Colin, Pizza Cutter and Dan Fox have done, but I also want to use them on minor league data. GameDay has play-by-play available for all minor league games starting in 2006, which will also solve the missing pitching categories.
Before any of that data can be used, it needs a database to hold it. Right now I can do Retro and major league pfx-centric processing. I am working, on and off and now back on, on a database design that will hold Baseball DataBank, KJOK, RetroSheet and pitch f/x data, and be able to take daily automatic updates from GameDay for both major and minor league games. After the database is constructed, scripts have to be modified to download and parse all of the GameDay files, inserting the values into the database.