General Theories on the Study of Baseball
March 29, 2007
Greetings from the last man to join the “party” here at Statistically Speaking. I apologize for the delay in my arrival, but you can all be certain I will be active enough to make your head spin once I get fully settled into the MVN blog circuit. I have a rare gift for somehow being loud even though sound is impossible in a communication medium limited to text and color, so that may or may not be a good thing for some of the readership (heh). I was so loud that although I’d never written an article for the Hardball Times, my work in sabermetrics was on their radar over there and they recommended me for this “job.” I was as surprised as anyone.
I’m going to start with a small warning. I (grudgingly) play a little roto, but my posts will not be particularly geared toward a specifically “fantasy baseball” mindset, as it does not hold my interest to talk about projecting RBI totals, error counts, holds, or any of the other woefully inadequate measures of performance used in even the most enlightened of leagues. I am interested in learning about this object of our unique obsession purely from the perspective of a dispassionate observer (whenever possible), assimilating all of the available data and drawing what conclusions he can defend.
I’m going to try to focus my posts here on those elements of my research – and the research of other big names in the field who have caught my eye – that can be fully explained without the use of higher-level math (anything beyond the normal scope of a high school math program). If some methodology requires an advanced mathematical topic, I’ll say so explicitly in the title of that entry.
For those of you who aren’t familiar with me, my name (as implied in my user name) is Matthew Souders, although I generally go by SABR Matt online. I’ve been following baseball since 1992 (I was 10 years old when I saw my first big league game and I was hooked immediately). I lived in Seattle at that time, and the Mariners just happened to be developing into an interesting team to root for at that moment in their history, packed with exciting young players like Randy Johnson, Ken Griffey Jr, Jay Buhner, Edgar Martinez, Tino Martinez, Chris Bosio and Bret Boone and inheriting the mean streak of their new manager (Lou Piniella). I’ve been a rabid Ms fan ever since, which makes for some interesting conversations with Sean, who is unfortunately an Angels fan (seek help, Sean – seriously, you folks rooting for the Angels must have some kind of strange psychological problem).
I got interested in statistics thanks to looking at baseball cards – my first big league game was a give-away day, a pack of 25 Mariner baseball cards in a nice little collector’s book, which led me to a short-lived card-collecting phase. By the time the 1993 season was over, I was copying statistics from the internet (yes, I was online way back then!) and simulating seasons on a hilariously cheesy (by modern standards) baseball platform for my Super Nintendo (LOL). By the time the Mariners were celebrating in a delirious mass in October of ’95, I had my first baseball encyclopedia and was reading voraciously about the history of baseball. By the time I headed off to college (in 2000), I was building really simple player evaluation methods (I laugh now when I look back at the first analysis I ran).
This is now my seventh season as a sabermetrician; each season I learn a little more, and each season I get a little further from accomplishing my goals (the more you learn about baseball, the more you realize you don’t know anything)! After that long preamble (sorry, I tend to be a bit wordy when making introductions), I want to talk about my approach to problem solving as it applies to baseball, and about what I view as some serious deficiencies in other approaches.
The Scientific Method
The study of baseball is not unlike the study of meteorology (my second love – I am finally close to earning a degree in meteorology and moving on to grad school) in that both require a scientific approach and both deal with principles of chaos and randomness. I’ve used Monte Carlo simulations, for example, in the analysis of wind speed fluctuations over time AND to generate Win Probability Added statistics for baseball games in the play-by-play era.
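To give a flavor of the Monte Carlo approach, here’s a minimal sketch in Python that estimates a team’s chance of winning from a lead and innings remaining. To be clear, the run-scoring model and every parameter in it are invented for illustration – this is nothing like a real WPA engine, which would be fit to actual play-by-play data:

```python
import random

def simulate_runs(innings, p_score=0.27, avg=0.55):
    """Crude per-inning run model (illustrative only): an inning produces
    runs with probability p_score, drawn from a rounded exponential."""
    total = 0
    for _ in range(innings):
        if random.random() < p_score:
            total += max(1, round(random.expovariate(1 / avg)))
    return total

def win_probability(lead, innings_left, trials=20000):
    """Estimate the chance of holding a given lead with `innings_left`
    to play by simulating many random game continuations."""
    wins = 0
    for _ in range(trials):
        diff = lead + simulate_runs(innings_left) - simulate_runs(innings_left)
        if diff > 0:
            wins += 1
        elif diff == 0:
            wins += 0.5  # split ties instead of modeling extra innings
    return wins / trials

print(win_probability(2, 3))  # a 2-run lead with 3 innings to play
```

The same machinery, run from every base/out/score state, is what lets you attach a win probability swing to each individual play.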
If someone describes himself as a sabermetrician and does not have a primary career path linked to the sciences, you should immediately have doubts about his or her ability to think in the manner of a scientist, and therefore about his or her relevance. There is exactly one exception to this point, and his name is Bill James. He stands out as an exception only because he has an unusual gift for intuition regarding baseball that leads him to scientifically incorrect methods that produce bizarrely accurate results.
The scientific approach starts with an observation, which means you can’t be a sabermetrician if you don’t spend a LOT of time looking at the data. Baseball’s statistical record is really fascinating if you take the time to study it and look for patterns, but a lot of people are tragically short on patience and skip this step, choosing instead to focus only on the most recent seasons and the data that’s right in front of them. Doing this can lead to erroneous conclusions (the most famous of which was Voros McCracken’s initially bold statement that pitchers had ZERO control over batted balls, which he has since realized was fatally inaccurate).
When I come up with an idea about how to answer a question that’s been nagging me, the first thing I always do is define specifically what my problem is.
Do baserunning habits change based on the run scoring environment?
Then I propose an answer.
While the frequency of positive and negative events changes, baserunning aggression does not, because an equilibrium balancing risk and reward has been found through years of professional play, and the overall abilities of baseball players have not changed much over the years.
Then I define exactly what kind of data I need to examine to test my hypothesis.
In this case, a simple test was devised using play-by-play era information about the likelihood that a lone runner on first (with no one out) would advance to third on a single. The simplicity of the starting base/out state eliminates interactions between multiple runners who might interfere with each other, and removes the number of outs (another variable that can impact baserunning), allowing me to focus on aggression and only aggression.
Then I examine the data and make a decision about my first guess.
This small study revealed that I was incorrect about baserunning aggression being independent of run scoring environments. In fact, there is an abnormally strong correlation (roughly 0.94!) between the ratio of RS/G to the all-time average RS/G and the ratio of the odds that a runner will take third base on a single to the all-time average odds of that result. This suggests that baserunners take more risks when a single run might decide a game. Interestingly, however, the odds of being thrown out attempting to take third were only slightly higher in lower run scoring environments, which suggests that the pressure of a low run-scoring environment forces the development of good baserunners and improves that aspect of the game.
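For the curious, the shape of that test can be sketched in a few lines of Python. The season figures below are invented for demonstration (the real study used actual play-by-play data), but they mimic the pattern of the finding, with first-to-third rates rising as scoring falls:

```python
# Hypothetical data: year -> (runs scored per game,
#   fraction of lone runners on 1st who take 3rd on a single).
# These numbers are made up to illustrate the method.
seasons = {
    1968: (3.42, 0.33),
    1977: (4.47, 0.29),
    1987: (4.72, 0.28),
    1992: (4.12, 0.30),
    2000: (5.14, 0.27),
}

avg_rsg = sum(r for r, _ in seasons.values()) / len(seasons)
avg_adv = sum(a for _, a in seasons.values()) / len(seasons)

xs = [r / avg_rsg for r, _ in seasons.values()]  # run-environment ratios
ys = [a / avg_adv for _, a in seasons.values()]  # aggression ratios

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(pearson(xs, ys))
```

With these invented numbers the coefficient comes out strongly negative, reflecting aggression rising as scoring falls; the strength of the relationship is what matters for the test.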
Correlations, Confidence, and Lines of Best Fit
Since sabermetrics is inherently a statistical pursuit, almost everything done in the field tends to come down to statistical tests of significance and statistical methods for defining empirical relationships in the data. I believe there is a severe over-reliance on correlation in baseball analysis today. A rather frustrating (to me) example of correlation run amok is the new notion that the rate at which pitchers allow HR is determined entirely by the number of outfield flies they surrender. Did we learn nothing from McCracken’s past overconfidence? Don’t get me wrong: I’m a strong believer in the proper application of DIPS theory to the analysis of both pitchers and team defenses, and my own methods rely heavily on DIPS to separate pitching from fielding. It is no more accurate, however, to claim that pitchers have no control over their HR/F rate than it is to say that they have no control over their BABIP.
The reason things like this frustrate me is simple: rather than actually using the data they have, many sabermetricians continue to fall into the trap of believing that baseball analysis can be simplified down to a series of correlations that will explain everything. The problem gets compounded when they stick to their guns in the face of examples that disprove theories, or in the face of straight logic. How can it be true that all outfield flies are essentially created equally? Do we really expect that there aren’t some pitchers with the ability to induce poor contact (that results in a series of easy fly balls), and some pitchers who get hammered whenever they leave a ball up enough to be hit in the air?
I said above that a lot of time must be given to studying the data, but you’ve got to stop and think about what you’re saying when you make assumptions or strongly worded conclusions based on that data. I’ve been guilty all too many times of defending bad ideas in my youth, a practice I’ve worked hard to curtail. I only wish I’d started thinking like a scientist sooner – I might be further along now if I had.
Do you know how many times I find someone noodling with the data, trying to come up with a statistic without the foggiest idea what it is they’re trying to measure? The easiest way to improve someone else’s metric is to ask, “What are you trying to show here?” Most of the time, home-cooked statistics have no direct connection to a real-world event, and no statistic can or should be used unless (a) you are actually measuring something real and tangible and (b) you know exactly what that something is!
OPS is evil. It measures nothing at all. It’s junk science that happens to kinda-sorta give the right general impression while being easy to calculate. The mixed denominators and incorrect proportions in OPS gnaw at me like nothing else. ERA is not a good measure of pitching skill (too many things embedded within it are not under the control of the pitcher), but it IS a good statistic as long as you know what it means. The original DIPS ERA is a bad statistic. It measures nothing. It’s just the sum of three components of a correlation and bears no resemblance to real-world data.
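To make the mixed-denominator complaint concrete, here’s a tiny Python example with an invented batting line. OBP is (roughly) a per-plate-appearance rate while SLG is a per-at-bat rate, so adding them sums two numbers measured on different scales:

```python
# Invented batting line for illustration.
H, BB, HBP, AB, SF, TB = 160, 70, 5, 550, 4, 280

# OBP: times on base per plate appearance (roughly)
OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
# SLG: total bases per at-bat -- a different denominator entirely
SLG = TB / AB
# OPS adds a per-PA rate to a per-AB rate: the units don't match
OPS = OBP + SLG

print(round(OBP, 3), round(SLG, 3), round(OPS, 3))  # → 0.374 0.509 0.883
```

The sum lands in a plausible-looking range, which is exactly why it keeps giving the right general impression despite measuring no single real thing.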
The biggest offender is the traditional park factor. A little unit analysis immediately reveals one of its biggest problems: park factors are unitless! A park factor is the ratio between the runs a team and its opponents score at home and the runs they score on the road. In other words, it’s (R / R). That makes no sense whatsoever. Ballparks do not have a proportional (scalar) impact on run scoring. If a league scored runs at double the rate they are scored today, do we really think Coors Field would be twice as bad (relative to the league) for pitchers? A park factor is attempting to measure the park’s real influence on scoring, and the more you play in any given park, the more it will influence you. A proper analysis of parks should begin not with a ratio factor but with an additive factor carrying units like runs per game.
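Here’s the distinction in code form, with invented run totals. In the traditional factor the runs cancel and the units vanish; the additive version keeps its units of runs per game:

```python
# Invented season totals for one park (both teams combined).
home_runs_scored = 850  # runs scored in this park over 81 home games
road_runs_scored = 780  # runs scored by the same teams in 81 road games
games = 81

# Traditional multiplicative park factor: (R / R), a unitless ratio
ratio_factor = home_runs_scored / road_runs_scored

# Additive alternative: the park's effect expressed in runs per game
additive_factor = (home_runs_scored - road_runs_scored) / games

print(round(ratio_factor, 2))     # → 1.09 (unitless)
print(round(additive_factor, 2))  # → 0.86 (runs per game)
```

Under the additive view, doubling league-wide scoring wouldn’t automatically double a park’s apparent effect, which matches the Coors Field objection above.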
Stop and think before you run around mixing numbers together and rating players. What are you trying to measure and how does your metric accomplish said measurement?
I’ll save further comments for later articles, but hopefully this will give you a sense for what to expect from me in the future.