The 2008 OPA! Gold (and Lead) Gloves

That running sound of people stampeding toward something isn’t Black Friday shoppers.  (Did you know that Christmas and Chanukah used to be religious holidays?)  It’s all my Sabermetric brethren running to the servers of Retrosheet (Can I take a moment and gush about Retrosheet and how cool they are?) for the newly posted 2008 play-by-play data file.  Last night, I told Mrs. Cutter, “I just got my early Christmas present.”  She just nodded and smiled.

With the release of the new file, I can run the 2008 season through some of my favorite syntax programs.  First on the list is my defensive rating system, OPA! (out probability added above average).  By way of a short introduction, OPA! breaks fielding into its component parts (range, fielding, arm, and catching the throw on ground balls, for example) and eventually sums it all up in a neat little run value.

First, let’s award some Gold (and Lead) Gloves.  These go to the players who saved (or bungled away) the most runs at their position during the 2008 season.  A player who spent more time at the position will have more of a chance to rack up more runs saved (or bungled), but we’ll adjust for that later.  The reason that there’s no catcher award is that I can’t really measure catcher defense with OPA!  Runs saved (or bungled) are in parentheses.


Position | OPA! Gold Glove (AL) | OPA! Gold Glove (NL) | Real Gold Glove (AL) | Real Gold Glove (NL) | OPA! Lead Glove (AL) | OPA! Lead Glove (NL)
Pitcher | Jon Garland (2.54) | Kyle Lohse (2.76) | Mike Mussina (1.06) | Greg Maddux (1.89) | Fausto Carmona (-2.56) | Brandon Webb (-5.13)
First Baseman | Mark Teixeira* (13.93) | Albert Pujols (16.32) | Carlos Pena (2.90) | Adrian Gonzalez (13.28) | Richie Sexson (-14.11) | Mike Jacobs (-18.92)
Second Baseman | Mark Ellis (17.88) | Chase Utley (11.93) | Dustin Pedroia (3.20) | Brandon Phillips (-0.71) | Brian Roberts (-11.94) | Rickie Weeks (-12.38)
Third Baseman | Scott Rolen (14.58) | Chipper Jones (9.89) | Adrian Beltre (9.97) | David Wright (-9.00) | Mark Reynolds (-10.43) | Edwin Encarnacion (-16.45)
Shortstop | Mike Aviles (8.58) | J.J. Hardy (13.03) | Michael Young (1.08) | Jimmy Rollins (-3.77) | Edgar Renteria (-11.68) | Stephen Drew (-13.25)
Outfield | Jacoby Ellsbury (25.83) | Ryan Braun!!! (23.67) | Ichiro Suzuki (6.38) | Carlos Beltran (11.89) | Jason Bay* (-35.79) | Brad Hawpe (-47.75)
Outfield | Franklin Gutierrez (24.96) | Randy Winn (23.11) | Grady Sizemore (-3.50) | Nate McLouth (-11.60) | Jermaine Dye (-25.62) | Adam Dunn (-20.43)
Outfield | Denard Span (16.31) | Cody Ross (19.50) | Torii Hunter (-21.02) | Shane Victorino (4.16) | Torii Hunter (-21.02) | Gregor Blanco (-16.34)


A few notes.  I know that Bay and Teixeira both started out in the NL and made their way to the AL in 2008.  Oddly enough, the next logical candidate for the AL OPA! Gold Glove (someone who did well with the glove and spent time in the AL) was Casey Kotchman, for whom Teixeira was traded.  If Bay is removed from consideration altogether for his lead glove because of his league-hopping, then Nick Swisher (-17.90) would assume his mantle.  (Perhaps that’s the wrong word given that we’re talking about futility in outfield defense?)  If you want a purely AL first baseman to give the OPA! Gold Glove to, then it would go to Daric Barton. 

An aside: that gives Oakland the best first baseman in the AL with Barton, Ellis and his Gold Glove at second, and Jack Hannahan was actually the 2nd best AL (and MLB) third baseman behind Rolen.  Bobby Crosby, their SS, was 3rd best in the AL.  There’s hidden value in defense, eh Billy?

Then, there’s the outfield issue.  Like the actual Gold Glove voters, I considered “outfielder” to be a generic term.  In fact, most of the winners of the OPA! awards spent time at two or even all three of the outfield positions (Ellsbury and Gutierrez played all three).  If you want to look at each outfield position separately, the winners of the Gold and Lead Gloves (listed AL/NL) were:

Gold Gloves
LF – Johnny Damon/ Ryan Braun 
CF – Carlos Gomez/ Cody Ross
RF – Franklin Gutierrez / Jayson Werth

Lead Gloves
LF – Jason Bay*/ Adam Dunn
CF – Nick Swisher/ Lastings Milledge
RF – Jermaine Dye/Brad Hawpe

The depressing thing is that the BBWAA voters didn’t pick any of the “correct” choices at any of the positions.  Adrian Gonzalez at first base was about as good as they got.  And they lazily selected Torii Hunter again (all three AL OF winners were repeats), despite the fact that he won a Lead Glove.

Ryan Braun, after “winning” a Lead Glove last year at third base, won a Gold Glove in left field.  This speaks to how poor left fielders are at fielding in general and also the fact that outfield defense is really prone to a lot of variation.  In other words, just about anyone can have a good year in the outfield.

Chase Utley.  What the heck does that guy gotta do to get noticed?

More 2008 stuff coming soon…

Off Season To-Do List

Yes, here in the U.S., it’s Thanksgiving Day, but you don’t have to live here to give thanks!

After going over many hills and through the woods, eating large quantities of turkey and all the trimmings at my mother-in-law’s house and then sleeping it off, it’s time to talk about my off-season sabermetric to-do list.

I finally finished programming everything into my batting projections, and published the results last week. However, to do a comprehensive evaluation of a player, we need baserunning, defense and pitching in addition to batting, all ultimately expressed in runs so that they can be summed into a single number representing the total contribution. It’s one thing to be able to show how any hitter projects, but without knowledge of speed, arm and defense, it’s hard to make a final judgment.

In yesterday’s Roundtable, we were asked for our World Baseball Classic starting lineups for the U.S. Derek Jeter and Michael Young are the two best hitters at shortstop, but both are among the worst defensively. Jimmy Rollins is not as good with the bat, but his defense is good enough to make him most people’s overall choice as the best U.S.-born shortstop. Another example is on the Pirates’ roster. Brandon Moss plays rf, lf and 1b, and the past four seasons his translated wOBAs have been .339, .335, .334 and .331. Andrew McCutchen plays cf. His wOBAs the past three years have been .342, .322 and .323. Moss looks to have a slight edge in batting productivity, but compared to corner outfielders (.347) and first basemen (.357) he’s way below average, while McCutchen is only slightly below all center fielders (.330). Add in that BP’s baserunning stats show Moss as dreadfully slow while McCutchen has a reputation for being very fast, and that Moss is regarded as a poor fielder to McCutchen’s good, and you might conclude that McCutchen should be in cf, McLouth in lf, and Moss in AAA.
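To put numbers on that comparison, here is a minimal Python sketch. The wOBA figures and positional averages are the ones quoted above. Averaging the listed seasons with equal weight is purely an assumption for illustration (it is not how the projections weight seasons), and the names `positional_avg` and `woba_vs_position` are made up for the example.

```python
# wOBA values quoted in the post; season order and weighting are not modeled.
moss_woba = [0.339, 0.335, 0.334, 0.331]
mccutchen_woba = [0.342, 0.322, 0.323]

# Positional averages quoted in the post.
positional_avg = {"corner_of": 0.347, "1b": 0.357, "cf": 0.330}


def mean(values):
    """Plain arithmetic mean (an assumption; real projections weight seasons)."""
    return sum(values) / len(values)


def woba_vs_position(player_woba, position):
    """Player's average wOBA minus the average at the given position."""
    return mean(player_woba) - positional_avg[position]


if __name__ == "__main__":
    print("Moss vs corner OF:", round(woba_vs_position(moss_woba, "corner_of"), 4))
    print("Moss vs 1B:", round(woba_vs_position(moss_woba, "1b"), 4))
    print("McCutchen vs CF:", round(woba_vs_position(mccutchen_woba, "cf"), 4))
```

Under that equal-weight assumption, Moss comes out roughly 12 points of wOBA below the corner outfield average and McCutchen about 1 point below the center field average, which matches the "way below" versus "slightly below" reading.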

The first question usually asked about pitching analysis is if it will be DIPS compliant. Yes and no. The problem with DIPS is that it has an all or nothing approach. Pitchers get no credit for the number of base hits allowed, and full credit for everything else. My pitching projections will be very similar to the batting, and each component will have its own regression factor. I need to do work on determining the exact values to be used, but BABIP is about 20% pitcher and 80% defense. Therefore, it will be regressed much more heavily than home run, walk and strikeout rates. One problem I will have with pitching is that the available minor league statistics don’t cover all the categories – missing things like batters faced, intentional walks and hit batsmen for many or all seasons.
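As a rough illustration of what a per-component regression factor means, here is a minimal Python sketch. It treats the 20% figure above as a simple shrinkage weight toward the league mean, which is a simplification (a real regression factor would also depend on sample size, and the exact values are still to be determined). The function name and the sample rates are hypothetical.

```python
def regress_to_mean(observed, league_mean, skill_share):
    """Shrink an observed rate toward the league mean.

    skill_share is the fraction of the observed deviation credited to the
    pitcher (an assumed weight, e.g. about 0.20 for BABIP per the text).
    """
    if not 0.0 <= skill_share <= 1.0:
        raise ValueError("skill_share must be between 0 and 1")
    return league_mean + skill_share * (observed - league_mean)


if __name__ == "__main__":
    # Hypothetical pitcher: .270 BABIP allowed in a .300 league.
    print(round(regress_to_mean(0.270, 0.300, 0.20), 3))
```

A component with a larger skill share (strikeout rate, say) would keep more of its observed deviation from the league mean. Setting the share to 0 for BABIP reproduces the all-or-nothing DIPS treatment of hits on balls in play.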

My fielding and baserunning will need play by play. Just today RetroSheet released the 2008 dataset. My formulas will be very similar to what Colin, Pizza Cutter and Dan Fox have done, but I want to also use them on minor league data. GameDay has play by play available for all minor league games starting in 2006, which will also solve the missing pitching categories.

Before any of that data can be used it needs a database to hold it. Right now I can do Retro and major league pfx centric processing. I am working, on and off and now back on, on a database design that will hold Baseball DataBank, KJOK, RetroSheet and pitch f/x data, and be able to have daily automatic updates from GameDay of both major and minor league games. After the database is constructed, scripts have to be modified to download and parse all of the GameDay files, inserting the values into the database.

What am I thankful for?

The 2008 Retrosheet event files.
It’s probably going to take me until the weekend to get everything set – you know, I’m sure, how hectic the Thanksgiving part of the week can be. But after that, I hope to be slicin’ and dicin’ the 2008 data for some super fantastic fun-time statsapalooza!
[Super fantastic fun-time statsapalooza only available in participating blogs. Some restrictions apply. See your blog for complete details. Offer not valid where prohibited.]

World Famous StatSpeak Roundtable: November 26

Today, the roundtable shares some turkey with the usual StatSpeakers and Victor Wang, of The Hardball Times.  Victor is the recipient of SABR’s Jack Kavanagh Memorial Award, given each year for the best baseball research by someone who isn’t old enough to vote.  Victor chimes in on his thoughts about Chase Utley, the Seattle Mariners embracing the dark side, the World Baseball Classic, the world in general, and women!

And since it’s Thanksgiving (in the U.S.), on behalf of Colin, Brian, and Eric, we are thankful to all of you out there who read StatSpeak.  The two most powerful words in the English language are “Thank you.”  We hope that your Thanksgiving Day is filled with those two words in abundance.

Question #1: The Seattle Mariners recently announced that they would be implementing an entire department for statistical analysis.  Given that Jack Zduriencik is considered as more of an “old school scout” type guy, how much influence do you think the statistical department will actually have with the Mariners?  Is this a sign we can finally end the silly stats vs scouts debate?

Victor Wang: Honestly, I am not sure how much Jack Zduriencik is going to use his new statistical department. I do think that this will be one of the more interesting storylines of the off season and definitely something to keep an eye on in the future. I think this is a great sign for Mariners fans that their new GM is willing to keep an open mind and integrate statistical analysis and a sign that people are understanding that the best front office combination involves both scouting and stats. It might take a while but it definitely looks like the Mariners rebuilding process is off to a good start. The AL West looks like it’s going to be very competitive in the future, with Oakland and Texas having arguably the top two farm systems in all of baseball and now Seattle getting its act together.

Brian Cartwright: If I were running their department, first I’d set up the relational database that has the names, vitals, and baseball stats for every professional and notable amateur player on the planet, along with all the queries I could think of. I’d want the boss to be able to sit down and ask of it “Show me all the good fielding second basemen with power”. Step two, attach all the traditional scouting reports to that player as well. Scouts use numbers for different categories, put those into tables and weight them like Marcels. Build the system so that the boss can access all the scouting data that he’s familiar with, plus all the stat data in a friendly and convenient form. Stats and scouts each have their place, find a good way to blend them.

Colin Wyers: There doesn’t need to be a tension between “stats” and scouts, and I don’t think there’s much of one anymore in the professional baseball community.

It’s important to note what modern baseball analysis does, at least when it’s done properly – it takes and systematically analyzes populations of players in order to find essential truths about baseball. For a variety of reasons (most having to do with accessibility) that’s been done with the official statistics and box scores of the game. But it doesn’t have to be. A “sabermetric” analyst working for a team could do an awful lot with raw scouting reports, and I like to think that real teams do this.

I think the real tension is between analysts and narrativists. Somebody like Murray Chass or (sadly) Tom Boswell isn’t arguing from a tools perspective. The argument isn’t even over statistics, per se – VORP and RBI are both, last I checked, numbers. Baseball teams use advanced metrics and scouting, to varying degrees; that war is over, at least in the broader strokes. (At least I think so – I don’t get invited to a lot of MLB front offices, so I don’t really know first-hand.)

Eric Seidman: If Jack made it a point as much as he did to discuss this part of his evaluative team, they aren’t going to just sit there.  I have no idea to what extent the Mariners fused scouting and stats during the Bavasi era or even the Gillick era, but probably not as much as they will now.  The stats vs. scouts debate will never die, because old-time writers need something to complain about.

Pizza Cutter: Are they hiring?  It would seem foolish that Zduriencik would establish a whole department and then not use it, so they’ll have to have some influence.  Given that the team has shelled out a lot of money on some really awful contracts (Richie Sexson, Adrian Beltre, Carlos Silva), they’d do well to listen.  And the stats vs. scouts thing will never die.  Statisticians who give the human mind credit for how much information it can process are rare.  Scouts who understand how easily fooled the human mind is by all that information are rare as well.

Question #2: Japan’s pro league recently drafted a female pitcher in their amateur draft. How long until MLB does the same?

Victor Wang: I think it’ll be a long time before a female pitcher is going to get drafted because she throws a mid 90s fastball. I think if it is to occur in the near future it will need to be someone, like the Japanese pitcher, who does something like throw a knuckleball or throws submarine. If someone decides they want to do something like that then maybe there’s a shot that it occurs in 5-10 years. However, with softball being an option for girls, I just don’t see something like this happening anytime soon. I would love to be proven wrong though.

Brian Cartwright: 15 or 20 years ago there were a handful of women who got to play winter ball, the old Hawaiian league I believe. If the Pirates can sign two pitchers who have never played baseball, there should be some women out there who qualify. It only takes one daring GM to do it here. But, after looking at 2,000-some batter projections for this year you realize how very few players in the minors are good enough to play in the majors. I’m sure there are probably many women who could do OK in the low minors, but it might take a long time to find one who could succeed at the higher levels.

Colin Wyers: I suspect if you had asked the Japanese equivalent of StatSpeak this question a few weeks ago (do they exist?), they would have shrugged and gone, “Someday? Maybe?” It’s the sort of thing that sneaks up on you. The thing to keep an eye on is the adoption of amateur women’s baseball (not softball) players.

Eric Seidman: Never.  I hate how short and terse that sounds, but this will not happen in my lifetime.

Pizza Cutter:  Let’s see.  If I have a daughter, 18 years.  There have been girls who have played in the Little League World Series, and eventually, one of them will get a college scholarship, and maybe just maybe get drafted.  It’s hard to put a time frame on it, and I’m not one to say that it’s very likely given that girls are pushed to softball, but as a function of pure probability, it’s bound to happen at some point. 

Question #3: This week the Pirates signed two pitchers from India and a shortstop from South Africa to minor league contracts. Is it worthwhile yet to scout the “developing” baseball world? 

Victor Wang: While you never want to close yourself off from a pool of talent, I’m not sure if it’s yet worthwhile to scout the developing baseball world. It seems like it might take some time for legitimate talent to be produced. Using the NBA as an example, it took a while for Europe to start consistently developing basketball once basketball became popular over there. And I doubt that baseball will ever become as popular as basketball overseas.

I think the biggest benefit right now in scouting developing worlds is the relationships that could be built. If a team could sign say a baseball version of Yao Ming for cheap, the payoff would be tremendous, especially with prices for prospects in the Dominican and Venezuela skyrocketing.

Brian Cartwright: Like my previous answer, sure we can find guys who can play in the minors, but I think it takes an established baseball tradition, playing the game every day since you were 10, to refine the baseball instincts. Taiwan and Korea have a lot of people, and they have played baseball for a long time. In 1983 I did statistics for the World Friendship Games, in which both competed and did well. There are a surprising number of Taiwanese players in the minors today. But, as seen in the WBC, teams like Holland, Italy and South Africa struggle to be competitive, let alone India. It probably is worth the risk to sign a guy who looks good; our minors are already filled with guys not going anywhere. In doing so now you can establish contacts that 10 or 20 years from now might be vital when a true prospect comes along.

Colin Wyers: There is a risk involved. It’s very possible that somewhere like India simply isn’t able to sustain the development of baseball players at a high enough rate to be worth the risk.

But at the same time, if it does pan out, you have a foothold in a talent market before everyone else, getting exclusivity at first and a lot of benefits thereafter. We see right now an unequal amount of access to the talent markets in Latin America and Asia – if there is baseball talent to be had in India, the Pirates could be really helping themselves out here.

Eric Seidman: It is bad to be closed-minded, especially when it comes to scouting.  The two Indian pitchers were on a reality show, and we don’t know if they will pan out, but this world is vast and there is bound to be talent all over the place.  Pat Gillick helped create the Blue Jays teams of the 1970s-80s by launching scouting probes into Latin America, which, back then, probably seemed as odd as scouting into India and South Africa does today.  These players might not pan out, but it never hurts to try.

Pizza Cutter: It would be silly to avoid an untapped pool of talent.  The trick is that these guys will be the rawest of the raw.  They may be physically talented, but likely don’t have “tools” yet, which come from actually playing the game and honing those skills.  Places like India and South Africa don’t have a huge baseball program.  (The two pitchers from India had never held a baseball until earlier this year.)  But then, India and South Africa are both former British colonies with a cricket-playing history and that’s sorta the same thing.  My guess is that slowly, kinda like it was with Venezuela and now it is with Japan, we will start to see more and more of these gentlemen get the call to MLB from those countries.

Question #4: The Phillies took a big blow this week when they heard that Chase Utley could miss the first 2 months of the 2009 season.  They recently optioned Tad Iguchi to AAA, who filled in admirably when Utley went down in 2007.  They also have Eric Bruntlett and could promote prospect Jason Donald.  Would this three-headed monster be sufficient for the 30-40 games Utley could miss, or should the Phillies pursue a better stopgap?

Victor Wang: The length of Utley’s injury will obviously be a crucial factor in the Phillies’ decision. If he only misses 30-40 games I think the Phillies will be fine with Donald or Iguchi. The Phillies could also use this time to “showcase” Donald to other teams as he is blocked off at 2B and SS for the long term and probably won’t have enough bat to stick at 3B. Unless Utley is out for a much longer period, I don’t think the difference between Donald and say a guy like Jeff Kent would be that great to worry about. I would say they should focus more on getting a LF and another starter before investing resources in a stopgap 2B.

Brian Cartwright: Sorry Eric, but this could be ugly without Utley. Iguchi offensively is league average at 2b, but Bruntlett is brutal. Then again, the Phils won the Series even with Utley having little power after the beginning of June. Iguchi would be an acceptable place holder until Utley is healthy.

Colin Wyers: They can go ahead and kick the tires on the handful of utility players on the free agent market – Craig Counsell would be a decent fit and would be a guy that understands that he’ll eventually be returning to the bench. David Eckstein could be another guy in that mold, depending on how he sees himself at this point. Ray Durham might work. They’re going to have to wait until a lot closer to spring training, though – at this point nobody wants to sign on to caddy for Utley when there’s still a possibility of going to a team where they have the chance to start.

Eric Seidman: Iguchi and Jason Donald should be fine in terms of being slightly above replacement level while Utley is out.  For all we know, he may only miss 10 games, so it is tough to determine the next course of action until more is known about his expected recovery time.  If complications arise and 2 months turns into 5 months, then a guy like Ray Durham would be solid to have, but I cannot see him missing more than one month, and they have been fine when he has missed time like that before.

Pizza Cutter: Maybe absence will make the heart grow fonder and people will actually realize what a valuable player Chase Utley is.  Iguchi and Bruntlett are, at best, replacement level players.  But then again, that’s the whole point behind the concept of “replacement level.”  They’re spare parts, and the stuff that’s available for free out there is going to be just as rank.  Basically you’re talking about signing a utility infielder to replace the Utley infielder for a few games, and then have him morph back into a utility role when Utley comes back.  They’re not going to trade for an actual 2B as a stopgap, because he’d then have to ride the bench and you’ve traded something away for something else.  This is a bad situation.  I say play Iguchi and hope.  It’s probably about the best you can get at this point.

Question #5: The World Baseball Classic is coming up.  Which nine players should take the field for the USA in the first game of the tournament?

Victor Wang:

Joe Mauer, C

Grady Sizemore, CF

Mark Teixeira, 1B

Alex Rodriguez, 3B

Josh Hamilton, RF

Matt Holliday, LF

Dustin Pedroia, 2B

Jimmy Rollins, SS

C.C. Sabathia, P

Mauer may not be a prototypical lead off hitter, but he would have some of the best on base skills on the team. The toughest choices for me were between Lance Berkman and Mark Teixeira and Ian Kinsler and Dustin Pedroia. I gave the edge to Teixeira because of superior defense and Pedroia wins the tiebreaker with his scrappiness.

Brian Cartwright:

c Matt Wieters

1b Mark Teixeira

2b Dustin Pedroia (in place of the injured Chase Utley)

ss Derek Jeter

3b David Wright

lf Matt Holliday

cf Grady Sizemore

rf Josh Hamilton

dh Milton Bradley

sp Tim Lincecum

Colin Wyers: Off the top of my head (or as close to as possible – I had to look up a few things, mostly to do with citizenship):

C Joe Mauer

1B Mark Teixeira

2B Dustin Pedroia

SS Jimmy Rollins

3B Alex Rodriguez

LF Matt Holliday

CF Grady Sizemore

RF J.D. Drew

I think you can win a few baseball games with that team.

Eric Seidman: Mauer, Tex, Kinsler, Rollins, A-Rod, Giles, Holliday, Sizemore, Adam Eaton on the mound.

Pizza Cutter:

c – J. Mauer
1b – M. Teixeira
2b – B. Roberts
3b – A. Rodriguez
ss – J. Rollins
lf – M. Holliday
cf – G. Sizemore
rf – J. Hamilton

sp – C. Sabathia

Catching, the US has Joe Mauer to call on, even over my personal favorite Brian McCann.  At first, it’s a little bit more crowded with Berkman, Teixeira, Howard, Fielder, and even a guy like Kevin Youkilis.  Tex is probably the most complete player of the bunch, although I’d take a guy like Youk along for the ride.  Now that Chase Utley is hurt, I guess second will fall to Dustin Pedroia, but I’d personally give the call to the ever-underappreciated Brian Roberts.  Short should go to Jimmy Rollins (although I suspect that Jeter will get it), and third will go to A-Rod (if he wants to play… if not, David Wright will do nicely).  I’d personally like to see an outfield of Sizemore, Holliday, and… wow, there really aren’t a lot of outstanding American outfielders.  Josh Hamilton (one year?)  Adam Dunn (StatSpeak drinking game players, take a shot!)  Brad Hawpe?  On the mound, if you look at the leaders from last year in FIP among starters, the top eight are Americans.  My Game One starter would be C.C. Sabathia.

Neyer discusses wOBA and some Sabermetric philosophy

From the “stuff we’re reading” file: Rob Neyer over at ESPN has an extended post about embracing wOBA (a creation of Tom Tango) and a few other issues worth commenting on.  Neyer’s one of the more Saber-savvy gentlemen over at ESPN and he’s worked extensively with Bill James, and it’s cool to see these concepts creeping into the mainstream. (h/t: The Book).

But don’t stop reading there.  Rob goes on to discuss a couple of other issues that are worth a little discussion.  The first, and more important, concerns Baseball Prospectus and the fact that while they have been the industry gold standard for a while in terms of player metrics, their flagship stuff like PECOTA, VORP, and WARP is proprietary.  They do have the right to keep whatever they so desire under their hat (and I, for one, pay my annual subscription dues to read it), but Neyer brings up an interesting hypothetical:

But that science-vs.-enterprise dynamic can be tricky. The methodology behind BP’s metrics is not, to my knowledge, peer-reviewed. If one or two people make a big mistake, would anyone else know? Now, let’s jump ahead and say that two or three years down the line, the big mistake was discovered internally. Would BP announce to the world that all those numbers over the previous three years had been wrong? Or would the guys running the show decide that the loss of credibility (and potentially, revenues) isn’t balanced by the loss of integrity?

One of the charming things about Sabermetrics is that it’s grown up as mostly a field of amateur hobbyists who have too much free time.  If I screw up on something here at StatSpeak, I know.  (It’s the only time I ever get comments!)  But that’s the beauty of peer review, and I love that sort of dialogue for its own sake.  The luxury that I have is that I can screw up and it means nothing more to me than a red face over having blundered.  For what it’s worth, I do trust that the people at BP are smart, diligent people who double-check their work and each other’s work, but they are humans and humans make mistakes.

Rob points out that “Science works best under blue skies, with little thought of green.”  And here’s a philosophical point that is becoming very real in Sabermetrics.  Sabermetrics has started to see that green influence creep into it.  BP started out as a bunch of people throwing around ideas on Usenet.  Now, they’ve got a business model and some of those same folks have gone corporate.

In fairness to BP, they wouldn’t survive if they weren’t putting out something good, and while we may not know the exact machinations of how something like VORP works, we have a general idea through what they have said and the concept makes sense.  Plus their stuff generally passes the “smell test” and things like PECOTA projections have been shown to correlate pretty well with actual performance.  It’s not that I worry about their stuff being bad/wrong/misleading, it’s that I worry that the science part of Sabermetrics might stagnate.  When I developed OPA! (my fielding system), I was writing the syntax that did all the calculating at the same time that I was posting the articles on it.  I got some feedback through that process that I incorporated into the system.  OPA! is a better measure because of it.  Imagine for a moment that PECOTA were open-source.  Someone out there would probably figure out a way to make it better.  It’s not.  But, that’s the trade off that you make in a closed source system.  Nate Silver, who probably has poured hours on end into that system (as well as any of the BP folks who have helped him) certainly would see no return on his investment of time, at least financially.  Would he have created the system to begin with had that incentive not been there?  I don’t know Nate, and I don’t know what his answer would be.  Maybe he would.

This is a grown-up moment for the field.  This is a philosophical turning point and one that, like most philosophical issues, doesn’t have an easy answer, or even a real answer.  Do we want to be scientists in the strict sense or are we OK with the idea that some people will keep a few secrets and charge a few dollars to see them?

Introducing the BaSQL wiki

Picking up where I left off in my SQL tutorial series, I’ve started the BaSQL wiki, which is meant to be a (hopefully, the) central clearinghouse for information on how to commit sabermetrics using a relational database.

The wiki is editable by anyone, but please remember: it’s supposed to be browsable documentation. If instead you have questions that need answering, there is a support group forum that goes along with the wiki. Go there and ask as many questions as you like.

This is only the start; I will be adding more pages as time goes on, and if you feel you have something to contribute, please add some pages (or edit existing ones) yourself.

This doesn’t rule out any further articles, by the way.

Finding the breakout

From the stuff we’re reading file: Michael Lerra over at THT has an article on finding the next breakout pitcher.  He likes Shaun Marcum, Dustin McGowan, and John Danks.  And umm… Paul Maholm.  Actually this is the type of work that I really look forward to in the off-season.  Predictor systems usually predict just about everyone to basically pick up where they left off last year with maybe an adjustment for age or similar players.  And for what it’s worth, usually everyone picks up about where they left off last year with a little adjustment for age.  The trick is to find the guy who will break out.  I’d trade all the forecasting systems in the world for one that could locate the breakout guys.

This is the world I’m bringing a child into: Mrs. Cutter is pregnant, and I have to wonder about a world where just about anything that the human race is capable of doing is available in game/reality show format.  The Pittsburgh Pirates just signed two pitchers who won a reality show in India.  That may or may not be a prize to sign with the Pirates, but…

Ah, so we meet again…

A trivia question: Over the last 25ish years (since 1981), what batter/pitcher combo has faced each other the most?  As you might expect, these are two gentlemen who played more than 20 years each (and both premiered in the same year), both spent their entire careers in the same league, but were never teammates.  The names are at the end, but if this is any hint, they faced each other 154 times over their careers.

So after the 35th time, who really had the advantage?  Is it the pitcher who now “knows how to get the batter out?”  After all, he’s had the experience to see what a batter will swing at and what he won’t.  Then again, maybe the batter has the advantage.  He’s had the experience to see what the pitcher throws and can figure out his patterns.  Indeed, there’s always talk that when a player is traded to/signs in a new league, he will have a period of adjustment, owing to the fact that he likely hasn’t faced many of the batters/pitchers that he will now be facing.  I’ve actually heard it all four ways, that a batter will benefit from/suffer for his first foray into a new league (because the pitchers haven’t seen him/he hasn’t seen the pitchers before) and that a pitcher will benefit from/suffer for his foray into a new league (same logic).  What gives?

Well, let’s look at what really happens.  I took all the Retrosheet play-by-play files from 1980 to 2007 and put them into one big file.  (My computer currently hates me.)  I sorted them into chronological order and then numbered the different confrontations between batter and pitcher.  I dumped everyone who appeared in the 1980 season from the data set.  Johnny Bench faced Tom Seaver in 1980, but certainly, that wasn’t the first time that they’d seen each other (although my data set would have considered them to be just introduced).  In order to maintain the integrity of the sample, they had to go.
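For anyone who wants to try this at home, here is a minimal sketch of the numbering step in Python.  It is not the code used for this study; the field names (`seq`, `year`, `batter`, `pitcher`) are made up for illustration, and `seq` is assumed to be any value that sorts plate appearances chronologically.

```python
from collections import defaultdict

def number_meetings(plate_appearances, burn_in_year=1980):
    """Number each batter/pitcher confrontation in chronological order.

    plate_appearances: list of dicts with hypothetical keys
    'seq', 'year', 'batter', 'pitcher'.
    Anyone who appeared in burn_in_year is dropped entirely, since
    their "first" meeting in the data may not be a true first meeting.
    """
    # Everyone (batter or pitcher) seen in the burn-in season.
    veterans = set()
    for pa in plate_appearances:
        if pa["year"] == burn_in_year:
            veterans.add(pa["batter"])
            veterans.add(pa["pitcher"])

    counts = defaultdict(int)
    numbered = []
    for pa in sorted(plate_appearances, key=lambda p: p["seq"]):
        if pa["batter"] in veterans or pa["pitcher"] in veterans:
            continue
        pair = (pa["batter"], pa["pitcher"])
        counts[pair] += 1
        # Copy the row and attach the meeting number for this pair.
        numbered.append({**pa, "meeting": counts[pair]})
    return numbered
```

Each surviving row comes back with a `meeting` field: 1 for the first time the pair met, 2 for the second, and so on.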

Then I coded for whether the plate appearance ended in the batter being on base (even if that meant an ROE).  My first thought was to run a simple OBP broken down by the number of times that the two had faced each other.  But a player who had been around long enough to face a pitcher 20 times was probably a different class of hitter than the guy who only got marginally introduced to a couple of pitchers.  Same logic goes for pitchers who stick around.  So, I had to calculate what the expected OBP of the plate appearances in question might be.  I calculated both the player’s yearly OBP and the pitcher’s OBP given up (plus the league OBP for the year).  To make sure I wasn’t getting any .500 OBPs from someone going 1-for-2, the pitcher and batter had to have logged 250 PAs in the year in question.  This had the nice side effect of getting rid of pitchers hitting.
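The 250 PA cutoff is easy to sketch in Python.  This is an illustration rather than the code behind the study, and the field names (`year`, `batter`, `pitcher`) are hypothetical.

```python
from collections import Counter

def qualified_matchups(plate_appearances, min_pa=250):
    """Keep only plate appearances where both the batter and the pitcher
    logged at least min_pa plate appearances in that season.

    plate_appearances: a list (it is scanned more than once) of dicts
    with hypothetical keys 'year', 'batter', 'pitcher'.
    """
    # Season totals: PAs taken by each batter, PAs faced by each pitcher.
    batter_pa = Counter((pa["year"], pa["batter"]) for pa in plate_appearances)
    pitcher_pa = Counter((pa["year"], pa["pitcher"]) for pa in plate_appearances)

    return [
        pa for pa in plate_appearances
        if batter_pa[(pa["year"], pa["batter"])] >= min_pa
        and pitcher_pa[(pa["year"], pa["pitcher"])] >= min_pa
    ]
```

Because the totals are kept per season, a player can qualify in one year and not in another.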

You can calculate what the expected OBP of a particular batter/pitcher matchup is by converting OBPs into odds ratios (OBP / (1 - OBP)) and then using the formula:

(batter OR / lg OR) * (pitcher OR / lg OR) = (expected OR / lg OR) 

Once you have the expectation, you can turn it back into an OBP rather easily (OR / (OR + 1)). 
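Here is the same arithmetic as a small Python sketch.  The function names and the sample numbers are mine, chosen only for illustration.

```python
def to_odds(obp):
    """Convert an on-base percentage into an odds ratio: OBP / (1 - OBP)."""
    return obp / (1.0 - obp)

def to_obp(odds):
    """Convert an odds ratio back into an on-base percentage: OR / (OR + 1)."""
    return odds / (odds + 1.0)

def expected_obp(batter_obp, pitcher_obp, league_obp):
    """Expected OBP for a matchup, from the formula above:
    (batter OR / lg OR) * (pitcher OR / lg OR) = (expected OR / lg OR)
    """
    lg = to_odds(league_obp)
    # Solve the formula for the expected odds ratio.
    expected_odds = (to_odds(batter_obp) / lg) * (to_odds(pitcher_obp) / lg) * lg
    return to_obp(expected_odds)

# A .400 OBP hitter against a pitcher who allows a .300 OBP,
# in a league with a .330 OBP (made-up numbers).
print(round(expected_obp(0.400, 0.300, 0.330), 3))  # about 0.367
```

A quick sanity check on the formula: a league-average batter facing a league-average pitcher comes out with exactly the league OBP.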

Then, it was simply a matter of watching what happened when I compared what would have been expected to what actually happened.  I fumbled around with some binary logit models to see what happened, and they generally showed that as a pitcher and batter faced each other more often, the advantage slowly worked its way in the batter’s favor, but I think that the graph shows the effect a little better.  On this graph, numbers above zero mean that the pitcher has the edge.  Below, the batter has the edge.
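The numbers behind a graph like this can be sketched as follows.  This is a plain average of expected minus actual, not the logit models, and the input layout (meeting number, expected OBP, whether the batter reached) is an assumption made for illustration.

```python
from collections import defaultdict

def pitcher_edge_by_meeting(rows):
    """Average (expected OBP - actual OBP) per meeting number, in points
    of OBP (thousandths).  Positive means the pitcher has the edge,
    negative means the batter does.

    rows: iterable of (meeting_number, expected_obp, reached_base) tuples,
    a hypothetical layout.
    """
    # meeting number -> [sum of expected OBP, times on base, plate appearances]
    totals = defaultdict(lambda: [0.0, 0, 0])
    for meeting, expected, reached in rows:
        t = totals[meeting]
        t[0] += expected
        t[1] += 1 if reached else 0
        t[2] += 1

    return {
        meeting: 1000.0 * (exp_sum - on_base) / n
        for meeting, (exp_sum, on_base, n) in sorted(totals.items())
    }
```

Plotting the returned values against meeting number gives a curve with the same sign convention as the graph.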


[Figure: pitcher batter learning.JPG]

In the first meeting between batter and pitcher, the pitcher had a 7-point advantage in OBP.  By the time of the second meeting, that advantage was almost entirely gone (down to 1.5 points), and then by the third meeting, the outcome was most likely to be even-up to expectations.  Following that, you can see that the graph jumps around a little, but the general trend-line is downward until about 35 PAs.  After that, the graph just gets really unstable.  My interpretation is that this means we have something of a real effect, although not a very coherent one, and the fluctuations may have to do with selective sampling and a decreasing number of pitcher-batter pairs that have met 30-something times.

There’s certainly a trend line to be had, and it certainly looks like it points toward the batter having the edge as he faces a pitcher more often, and by meeting #35, the magnitude is 13 points’ worth of OBP.  At first, the pitcher has the element of surprise, but the pitcher must strategize on how to remove the batter from the batter’s box with a new strategy each time, while the batter himself must simply react to what’s thrown at him.  At first, the batter has nothing to go on, but if he can learn the pattern (and it looks like he does) he can react better.

So for a short period of time, an exotic pitcher does have the advantage.  But not for long.  That advantage wears off pretty much the second time through the lineup.

Trivia answer: Greg Maddux has faced Barry Bonds* 154 times over their careers.  Second place on the list, incidentally, also belongs to Greg Maddux, this time paired with Craig Biggio (140).

World Famous StatSpeak Roundtable: November 21

This week, the roundtable is pleased to welcome Mr. Jayson Stark, who writes about baseball for a small, Connecticut-based media organization called E-espian.  Jayson’s written a few books here and there, and even wrote the foreword to Eric’s book.  (Eric’s cool like that.  He knows everyone.)  He joins us to discuss Nick Swisher, Dice-K vs. Jon Lester, free agents, and our early favorites for Rookie of the Year.  Why is the Roundtable on Friday?  Because sometimes schedules explode.  Look for next week’s roundtable at its usual Wednesday time.

Question #1: Why did Kenny Williams sell low on Nick Swisher? Did Kenny Williams sell low on Nick Swisher?

Brian Cartwright: I think Williams was disappointed with what he got, and was more willing to dump Swisher now for what he could get than wait for Swisher’s value to go up. Swisher did have a terrible year – his wOBAs for his years in the majors were .365, .341, .377 and .368, all above average for outfield, but then .317 in 2008. Williams is trying to put a winning team on the field, and didn’t think Swisher would be part of it.

Colin Wyers: I’ll be honest. As a Cubs fan I tend to read a lot of the Chicago sporting press and I continually wonder how the White Sox hover near contention at pretty much any time. About the nicest thing I can say about Williams without lying is that he’s smarter than Hawk Harrelson, which is kind of like being wetter than sand.

The White Sox apparently decided that Swisher wasn’t a center fielder, and given a choice of Quentin, Dye, Konerko and Swisher at the corner spots it’s pretty clear who Ozzie Guillen will play. And the White Sox have always been ones to put on airs about being a small-ball team despite having their greatest success with the longball, so some room needed to be made for some speedy, ineffectual hitting in the Podsednik mold. Hence the Swisher deal.

Eric Seidman: Statistically, the hot topic involving Swisher is that he was unlucky in 2008, and it is hard to debate that his actual BABIP was MUCH lower than where it was expected to be.  There is no real way he will repeat what he did last year, and his true talent level is projected at around an .800 OPS with 23 HR.  That really isn’t tremendous for a first baseman, but Swisher is still a nice piece for a team.  Williams apparently clashed with Swisher’s attitude problems, and that, coupled with his disappointing 2008 season, was enough of a reason for him.  I don’t agree with it, but who knows, maybe these three prospects will help the White Sox in the long run.

Pizza Cutter: Kenny Williams got three magic beans for Swisher.  They might grow into beanstalks to the sky, but probably not.  (Jack at least got six.)  I took a look at Swisher’s FanGraphs page and I’m really rather confused.  To look at Swisher’s peripherals, it looks like his numbers are kinda vibrating around a talent level.  But take a look at his swing numbers.  Swing diagnostics are usually pretty stable from year to year, but Swisher’s aren’t.  In 2006, he had a sudden upsurge in swinging (and a downward slump in making contact).  In 2007, he started swinging less at pitches in the zone, but then made contact with a lot of pitches outside the zone.  Sounds to me like he re-engineered his approach and what he was looking for at the plate.  Then in 2008, he started swinging at even more pitches out of the strike zone (and making contact with many of them), while swinging at fewer pitches in the strike zone.  It seems like Swisher is consciously trying a new approach.  It might be that Swisher has become a tinkerer.

He also was victimized by a low BABIP, despite raising his line drive rate.  Kenny Williams probably bought high on Swisher and sold low.  His 2008 batting average was abysmal (.219), but his OBP was still a comparatively healthy .332.  I wonder if Kenny Williams just got scared by .219.  Maybe Swisher’s newfound approach doesn’t jibe with what the White Sox are trying to teach.  In any case, Swisher is a decent player who is willing to stand either in the outfield or near first base.  The White Sox have holes in both places…

Question #2: Who are your favorite rookies for 2009?

Jayson Stark: Can’t say I’ve thought about this a lot. I might even name a couple of guys who aren’t technically rookies. But here’s a list of five: Matt Wieters (Orioles), David Price (Rays), Max Scherzer (Dbacks), Tommy Hanson (Braves), Cameron Maybin (Marlins).  And here’s a sleeper: Bobby Parnell (Mets). Lit up the gun in the Arizona Fall League.

Brian Cartwright: The Cardinals selected 3b Brett Wallace from Arizona State with the 13th pick in the 2008 draft. He hit over .400 his last two seasons in college, then .337 in the minors. I project him at 303/369/503, the best of any minor league player this year. There’s debate whether he can stay at 3b with his stocky body, but he can be as good a hitter as Troy Glaus and ten years younger. I expect him to be in the bigs before the end of the year. Matt Wieters would be my runner up. He’s going to be a very good hitter as well, I project 294/373/487.

Colin Wyers: If you’ll forgive me a homer choice just this once, I’ll say Jeff Samardzija. The guy’s numbers seemed to keep getting better as he was promoted (one could say rushed) through the minors. Damnedest thing. He’s still very raw and could use work on his breaking stuff (okay, a lot of work), but it’s electrifying to watch him throw the ball.

Eric Seidman: For a legit rookie, Matt Wieters of the Orioles and Lou Marson of the Phillies.  As a non-rookie about to partake in his first full season, my boy Max Scherzer of the Diamondbacks is primed for a very solid campaign.

Pizza Cutter: I could take the easy way out and say David Price.  You do have to appreciate lefties like J.A. Happ (if only for the retro-sounding name) and David Purcey for the 9+ K/9 IP at AAA and sub .700 OPS allowed at AAA.  Should be interesting to see what happens with them over the 2009 season.

Question #3: Since Daisuke received plenty of Cy Young Votes and Jon Lester received NONE, PLEASE make a case for me why Daisuke had a better season.

Jayson Stark: Hey, Dice-K wasn’t exactly Adam Eaton, you know. These voters can be a traditional group, and he did go 18-3, with one of those losses on the last day of the season in a 4-inning tuneup. He had the better strikeout rate. The Red Sox had a better record when he pitched. He made nine starts in which he gave up three hits or fewer. He allowed a lower opponent batting average. And he had a better ERA and ERA-plus. Do I think Dice-K clearly had a better year than Lester? No. But the Cy Young voting is different than MVP voting because the ballot only gives you a chance to vote for three pitchers. And clearly, some of these voters base too much of their vote on W-L record. But if there were more slots on the ballot, Lester would have gotten plenty of votes himself. And deservedly.

Brian Cartwright: Well, of course, Daisuke is much more famous. Everyone knew his name before he ever threw a pitch in North America, then he goes 18-3. Destiny fulfilled. This was only Lester’s first full season in the majors, so most of the BBWAA probably don’t know who he is yet. According to Baseball Prospectus’ pitching runs above average and above replacement, the two were virtually identical, Daisuke 26 & 75, Lester 25 & 79.

Colin Wyers: According to tRA and FIP, Josh Beckett outpitched both of them. I dunno, they don’t ask me to vote.

I just want to take this opportunity to note that this is way too much attention paid to balloting whose results are released on the LIME GREEN PAGE OF DOOM.

Eric Seidman: Only legit way to make a case for Daisuke over Lester involves ERA, but Lester’s advantage in just about every other category, including FIP, is more significant.  It is understandable for him to receive votes based on wins, but Lester went 16-6 as well, so it isn’t as if Daisuke’s 18-3 is THAT much better.  Lester likely would have finished fourth on most ballots, but he should have been ahead of Daisuke as he was clearly the Red Sox ace this season.

Pizza Cutter: Dice-K is Japanese.  He also had a lot of wins.  He’s also short.  It’s important to be short when going for a major award.  Just ask that gritty, plucky, 5’9″ Red Sox guy Dustin Pedroia.  For what it’s worth, Dice-K’s ERA was lower than Lester’s (but Lester had the lower FIP).  They were almost even up in WPA, although Lester beat Dice-K in WPA/LI.  Dice-K struck more gentlemen out, but Lester had the better K/BB ratio.  In other words, Dice-K won all the stats that make you look good if you ignore context.  And he’s Japanese.

Question #4: Which under-the-radar free agent would best indicate that the team signing him has a clue as to what they’re doing?

Jayson Stark: Great question. It might be easier to pick the ones who indicate teams have NO idea what they’re doing (Oliver Perez, for instance). But to answer your actual question, I think it’s smart to look at those little moves, because in the end, they do as much to help good teams win as the big splashes. (Does the team that “wins” the offseason EVER win the World Series?) I guess in terms of bats, I’d go with Raul Ibanez. Total pro who almost slugged .500 in a pitchers’ park last year. And if I had to pick an arm, I’m going to pick a name you probably won’t hear anywhere else – Russ Springer. Only reliever in baseball to rip off three straight seasons of 70 games or more while allowing fewer than seven hits per nine innings.

Brian Cartwright: Ha! That’s assuming that once you get past Manny and Teix that there’s anyone on the list I’d want to sign. Honestly, I think I’d have better luck with the minor league free agents. Last year Pittsburgh picked up Doug Mientkiewicz, and I think they’d do well to sign him again. He hits about league average, has a good glove at first, filled in at 3b and rf, and is a good ph. Not many dollars, won’t play every day, but will contribute.

Colin Wyers: Barry Bonds.

Or if for some reason that doesn’t work out for you, I’ll say Jason Varitek. “But OMGZROFLCOPTER, he batted .220 with only 42 RBIs!” I can hear you say. “Get him to the glue factory!”

First of all, we don’t put people in glue factories. Second, Tek was below-average this year but probably a win or so above replacement level. And unless we overweight one year of performance, we should expect him to improve a little next season.

Eric Seidman: Well, Jeremy Affeldt was a mighty nice signing on the Giants’ part given his vast improvements this past season.  The Phillies signing Doug Brocail to a small deal would show promise as well, especially with Tom Gordon’s departure.  Additionally, re-signing Scott Eyre was a good move for the Phillies.  Joe Beimel would also be a nice lefty specialist for the several teams currently serving as his suitors.

Pizza Cutter: I was going to say Jeremy Affeldt, but then he signed with the Giants.  I’ll still say Jeremy Affeldt, so apparently, the Giants have a clue.  Affeldt strikes a lot of guys out, and while he’s left armed, he’s not horrible against righties.  He’s the best non-closer reliever on the market (or was) so teams (er, team) won’t have to pay him “closer” money.  I could also see the team that passes up Manny Ramirez and instead invests in Pat Burrell being a good candidate for “has a clue.”

2009 Batter Projections

Not a lot to write today.

I have finished the work on my batter projections. There are 2447 projections currently available. Click here for an xls version or here for a csv version. I do need to finish filling in primary defensive position for many of the minor league players. I will repost the files when that is completed.

  • It includes all batters appearing in 2008 at Class A or higher.
  • For those batters, all levels that they played on in their career are included, including college.
  • Stats are park neutral. To find how a batter would do for a given team, that team’s factors would need to be applied.
  • Also neutralized for level, including college, US minor leagues, and Japan.
  • Projection is then regressed to the highest level the batter appeared at.
  • Also added an age adjustment.

All these factors are empirically derived from comparing sets of batting data, described in detail in earlier posts.

Now I have 10 years of projections and actual performances to peruse, looking for any patterns, and analyzing the error distribution of actual data compared to the projections. This will allow me to give reliability scores and curves, like PECOTA’s percentile breakdowns.

Probably this weekend I will post a list of the 2008 batters who most over or under achieved compared to the last projections, and are therefore the most likely to regress back towards the middle.





