December 23, 2008
So now it’s not enough for us to have the old, hackneyed stats versus scouts debate (hint: it’s almost over, and everybody won) – now, we can have internal debates between academic and hobbyist sabermetricians.
Of course, it’s possible that the academics don’t want to be considered sabermetricians. (Although the name of his website may have confused a few on this point.) The (or at least a) premise of the debate (you can read more here and here) is that neither community is paying enough attention to what the other is doing.
To the extent that I have a set of cheap aphorisms that guide my life, one of the big ones is “tend to yourself, and let others tend to themselves.” So I can’t really make academics start reading the hobbyist stuff, but I can look over the academic stuff and see what there is for the taking.
Fair warning: I’m a liberal arts major who didn’t finish college, who works in television production. I’m mostly self-taught when it comes to the programming and statistics stuff, and my stats textbook comes from the book rack of the Salvation Army. So take what I say with whatever grains of salt you like.
Basically what I’m going to do here is sift through whatever academic papers about baseball I can find and pick out a few of interest to sabermetricians. It should be noted that a lot of academic research on baseball is on topics that don’t intersect with sabermetrics, looking at baseball from a media effects or historical perspective. These are actually the ones I’m more familiar with. Unfortunately I can’t find my course notes from History of Sport in the US, but I recall some fascinating papers on the history of baseball. (One I believe was called “Hands Were All Out Playing,” but for the life of me I can’t find it.) That’s not the sort of paper I’m going to be dealing with here, though. Others are strictly medical or economic, without intersecting with any of the areas I consider sabermetric. I won’t be examining those, either.
One other point – I’m only going to be looking at articles that are, well, free. I’m not paying $20 per day – per day! – just to read one article.
I am only skimming these to prepare a reading list; there’s nothing to say I won’t come back later and take a more in-depth look.
- Improving Major League Baseball Park Factor Estimates
- This seems like an interesting paper, although a lot of the interesting math doesn’t seem to be spelled out. (The dummy indicators for offensive and defensive strength are something I’d like to explore in more detail.) I dislike the use of Basic Runs Created, of course. (Is anyone surprised?)
- BAYESBALL: A BAYESIAN HIERARCHICAL MODEL FOR EVALUATING FIELDING IN MAJOR LEAGUE BASEBALL
- This is a fantastic-looking paper about Shane Jensen’s SAFE fielding evaluation model.
- Slugging Percentage in Differing Baseball Counts
- Seems interesting. The real reason I bring it up? They actually did their own scorekeeping for the purposes of the study, and used… well, they say "Data is collected from 1260 MLB games played between March 20, 2008 and April 20, 2008." Retrosheet lists 282 games in that time period, so…
- Did Steroid Use Enhance the Performance of the Mitchell Batters? The Effect of Alleged Performance Enhancing Drug Use on Offensive Performance from 1995 to 2007
- JC Bradbury doesn’t seem fond of this study. Neither am I – RC27? Not adjusted for league context or park factors? Ugh.
- A Markov Chain Approach To Baseball
- As a run estimation maven, this is right up my alley. The math seems a bit intimidating to me, however.
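For anyone else who finds the Markov chain idea intimidating, here is a minimal sketch of the general technique. It is not the paper’s model, just my own toy version. It treats an inning as a chain over the 24 base-out states, with the third out as the absorbing state, and solves for the expected runs scored from each state. The event probabilities are made-up illustrative numbers, and the baserunning rules are a deliberately crude assumption: runners advance exactly as many bases as the batter, walks only move forced runners, and outs never advance anyone (no double plays, sac flies, steals or errors).

```python
from itertools import product

# Made-up per-plate-appearance event probabilities (illustrative only; they sum to 1).
EVENTS = {
    "out": 0.68,
    "walk": 0.09,
    "single": 0.15,
    "double": 0.05,
    "triple": 0.005,
    "home_run": 0.025,
}

# A state is (outs, (runner_on_first, runner_on_second, runner_on_third)).
STATES = [(outs, bases) for outs in range(3) for bases in product((False, True), repeat=3)]
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


def advance(bases, n):
    """Batter takes n bases (4 = home run); every runner also moves up exactly n bases."""
    runs = 0
    new = [False, False, False]
    for i, occupied in enumerate(bases):  # i = 0, 1, 2 for first, second, third
        if occupied:
            if i + n >= 3:  # moved past third base: the runner scores
                runs += 1
            else:
                new[i + n] = True
    if n == 4:
        runs += 1  # batter scores on a home run
    else:
        new[n - 1] = True
    return tuple(new), runs


def walk(bases):
    """Batter to first; runners move up only when forced."""
    first, second, third = bases
    runs = 0
    if first:
        if second:
            if third:
                runs = 1  # bases loaded: the runner on third is forced home
            third = True
        second = True
    first = True
    return (first, second, third), runs


def step(state, event):
    """Return (next_state, runs_scored); next_state is None once the third out is made."""
    outs, bases = state
    if event == "out":
        outs += 1  # crude assumption: no runner advancement, no double plays
        return (None if outs == 3 else (outs, bases)), 0
    if event == "walk":
        new_bases, runs = walk(bases)
    else:
        new_bases, runs = advance(bases, HIT_BASES[event])
    return (outs, new_bases), runs


def run_expectancy(events, tol=1e-12):
    """Expected runs for the rest of the inning from each base-out state.

    Solves E[s] = sum_e p(e) * (runs(s, e) + E[next(s, e)]), with E = 0 after
    three outs, by fixed-point iteration (converges whenever p(out) > 0).
    """
    expected = {s: 0.0 for s in STATES}
    while True:
        updated = {}
        for s in STATES:
            total = 0.0
            for event, p in events.items():
                nxt, runs = step(s, event)
                total += p * (runs + (expected[nxt] if nxt is not None else 0.0))
            updated[s] = total
        change = max(abs(updated[s] - expected[s]) for s in STATES)
        expected = updated
        if change < tol:
            return expected


if __name__ == "__main__":
    table = run_expectancy(EVENTS)
    for outs in range(3):
        print(f"{outs} out(s), bases empty: {table[(outs, (False, False, False))]:.3f}")
```

Fixed-point iteration is used instead of a matrix solver only to keep the sketch dependency-free; a real implementation would estimate the transition probabilities from play-by-play data rather than from a handful of event rates.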
This isn’t meant to be comprehensive, just a starting point. Things I’ve learned while writing this list:
- Academic papers are expensive. Very expensive. I mean wow-factor expensive – for the price of some papers you could buy the whole THT annual. [Full disclosure – I’m published in the Hardball Times annual.]
- Even ignoring expense, academic papers are pretty inaccessible. I served in the Marine Corps for five years doing public affairs work, so I have a decent handle on the difference between information you’re trying to get out and information you’re authorized to use lethal force to keep out of unauthorized hands. Academic research on the Internet really feels like the latter category.
- Runs Created appears to be the run estimator of choice among academics. Please shoot me now.
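For reference, here is the basic version of Bill James’s Runs Created, (H + BB) × TB / (AB + BB), along with RC27 and the kind of league and park context adjustment I was grumbling about in the Mitchell study. This is a sketch, not any paper’s method, and the function names are my own. Outs are approximated as AB − H (fuller versions also count things like caught stealing, sacrifices and double plays), the park factor is the simplest home/road runs-per-game ratio, and halving the park effect, since a hitter plays roughly half his games on the road, is my own simplifying assumption.

```python
def basic_runs_created(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def rc27(h, bb, tb, ab):
    """Runs Created per 27 outs, approximating outs as AB - H (a simplification)."""
    outs = ab - h
    return basic_runs_created(h, bb, tb, ab) / outs * 27


def basic_park_factor(home_rs, home_ra, home_g, road_rs, road_ra, road_g):
    """Total runs per game in home games divided by total runs per game in road games."""
    return ((home_rs + home_ra) / home_g) / ((road_rs + road_ra) / road_g)


def context_adjusted_rc27(player_rc27, league_rc27, park_factor):
    """Player RC27 as a percentage of league average after a half-strength park adjustment.

    Halving the park effect is an assumption: a hitter plays about half his games at home.
    """
    effective_pf = (park_factor + 1) / 2
    return 100 * player_rc27 / (league_rc27 * effective_pf)


if __name__ == "__main__":
    # Hypothetical season line and team totals, not real data.
    h, bb, tb, ab = 150, 60, 250, 550
    pf = basic_park_factor(400, 380, 81, 360, 340, 81)
    raw = rc27(h, bb, tb, ab)
    print(f"RC: {basic_runs_created(h, bb, tb, ab):.1f}  RC27: {raw:.2f}  PF: {pf:.3f}")
    print(f"Adjusted (league RC27 assumed 4.80): {context_adjusted_rc27(raw, 4.80, pf):.0f}")
```

The point of the last function is that a raw RC27 of, say, 6.0 means something different in a high-scoring league and a hitters’ park than it does in a pitchers’ park in a low-scoring year.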
Mentioning how dryly they all seem to be written may be belaboring the obvious, but I can see why the casual, yet interested, baseball fan would rather read Joe Sheehan or Rob Neyer. I can see why I would, too.
What I’ve come away with, if anything, is that it’s possible that there’s some very good sabermetric research going on in academia. There’s also some stunningly mediocre stuff – the Mitchell Report study is laughably unusable, and the slugging percentage study looks like a casual afternoon’s work once you have a proper Retrosheet database.
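To show what I mean by an afternoon’s work, here is the arithmetic once the data are in hand. This sketch assumes you have already pulled each at-bat out of play-by-play data (a Retrosheet database, say) as a hypothetical (balls, strikes, total bases) record, with the count taken before the at-bat’s final pitch; that extraction step is the real work and isn’t shown, and the paper may well define “count” differently. Walks, hit batsmen and sacrifices aren’t at-bats, so they’re excluded, just as in ordinary slugging percentage.

```python
from collections import defaultdict


def slugging_by_count(at_bats):
    """Return {(balls, strikes): SLG}, where SLG = total bases / at-bats ending in that count.

    `at_bats` is an iterable of hypothetical (balls, strikes, total_bases) tuples,
    with the count recorded before the final pitch of the at-bat.
    """
    total_bases = defaultdict(int)
    at_bat_counts = defaultdict(int)
    for balls, strikes, tb in at_bats:
        if not (0 <= balls <= 3 and 0 <= strikes <= 2 and 0 <= tb <= 4):
            raise ValueError(f"impossible record: {(balls, strikes, tb)}")
        key = (balls, strikes)
        at_bat_counts[key] += 1
        total_bases[key] += tb
    return {key: total_bases[key] / at_bat_counts[key] for key in sorted(at_bat_counts)}


if __name__ == "__main__":
    # Made-up records: two at-bats ending 0-0, four ending 3-2.
    sample = [(0, 0, 1), (0, 0, 0), (3, 2, 4), (3, 2, 0), (3, 2, 0), (3, 2, 0)]
    for (balls, strikes), slg in slugging_by_count(sample).items():
        print(f"{balls}-{strikes}: {slg:.3f}")
```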
One thing to bear in mind is that in academic contexts, peer review likely means "other economists/physicists/etc." That’s fine, that’s great, that’s wonderful. But having the baseball stuff peer-reviewed by someone who knows baseball wouldn’t be the worst idea, either.
I would also argue that – as it stands now – it is much easier for an academic to become well-versed and even involved in the sabermetrics community than it is for sabermetricians to interact with academics studying baseball. Academic inquiry seems to happen behind walls; there are walled sabermetricians as well, but a surprising amount of research is happening out in the open. Maybe that’s not quite as good as peer review, but peer review isn’t a cure-all itself.