Sabermetrics… without numbers
April 22, 2009
It seems like a contradiction in terms. How can someone do Sabermetrics without numbers? After all, are we not the great digitizers of the game of baseball? Can one do data analysis without… data? The answer is that yes, Sabermetrics is possible without numbers. It just takes a slightly different understanding of what data are and what they can be.
Most of Sabermetrics, thus far, has focused on a basic model of counting up how often some event happens (perhaps norming it to some league expectation) and either reporting that count for its own sake or checking to see if it correlates with some other event which we have diligently counted up. I have nothing against this approach (in fact, I use it a lot!), so long as data are systematically collected in an unbiased (i.e., scientific) manner. But counting things up and turning everything into numbers isn’t the only way to go about research.
Consider how you make a decision on whether to go to a new restaurant. Let’s assume that you have a group of fairly open-minded eaters. My guess is that you don’t have a metric that takes the average item price on the menu, the distance to the restaurant from your mom’s basement, the cube root of the waiter’s salary, etc. Instead, you harken back to hearing your friend Larry say that he went to the restaurant and he really liked it. In other words, you got a scouting report. Larry’s scouting report might not have been numerical, but since you trust Larry’s judgment on such things, you suggest the new place. It may end up as the best meal you’ve ever had. It may be awful. You’re about to find out… based on a sample size of one and a poorly defined idea of what “good” is. He might give you an “8 out of 10” (it’s a number!) type of rating, but how did he come up with 8? Is that a good measure of how much you’ll actually like the restaurant?
(Would you like a slightly more cynical example? Remember that paper that you wrote back in college that got a C-? It ruined your semester and your GPA and you’ve hated that professor ever since. I’m sure I’m that professor for someone out there. How did s/he come up with a C-? We use qualitative judgments all the time… sometimes, like in the school example, with consequences that can make or break someone’s entire life-course!)
In reality, we operate on non-numeric data a lot in life, especially when dealing with the completely unknown. It’s hard to collect systematic quantitative data about everything! Most data of this sort come in the form of descriptions and words… not regression equations. The thing that most people don’t know is that this sort of data, called qualitative data, can be analyzed. (A phonetic problem quickly arises… the word “qualitative” looks and sounds a lot like “quantitative.” Therefore, for the rest of the article, I will use “qual” and “quant” as short-hand.) Qual data, when systematically collected and analyzed (and yes, there are ways to do that), can lead to some very interesting results.
A common type of qual data analysis is called content analysis. Suppose you had an idea for a research question that you wanted to ask or a topic that you wanted to study. Suppose that it was a question that no one had ever really researched before. (You’re so creative!) You have an idea, but you have no idea where to really start. Solution: you do some reading about whatever’s been said on the subject. You go to the oracle of all knowledge (Google) and type in a few keywords. You read everything that comes up on the subject. Slowly, you begin to understand how other people have conceptualized the issue in the past. This gives you the groundwork for studying the issue further, perhaps with quant data. At least now you know what to count. A while back, I was looking for how to quantify “hard-nosed” players and used a similar method. I looked for what characteristics were most often mentioned for players who were considered “hard-nosed” and found that mostly the term is used to describe players who like to run into things.
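For the curious, here is a minimal sketch of what the counting step of a content analysis could look like in Python. Everything in it is an assumption made for illustration: the snippets are invented (they are not real quotes and not the ones from my “hard-nosed” search), and the codebook of categories and trigger words is hypothetical. In a real study, a human reader would build and refine the codebook while reading, and simple keyword matching would only be a first pass.

```python
import re
from collections import Counter

# Invented snippets standing in for text gathered from a keyword search.
# They are illustrative only, not real quotes about real players.
SNIPPETS = [
    "A hard-nosed catcher who blocks the plate and never avoids a collision.",
    "Hard-nosed outfielder, always diving for balls and crashing into the wall.",
    "He is hard-nosed: slides hard into second and plays through injuries.",
]

# Hypothetical codebook: each category maps to trigger words chosen by the analyst.
CODEBOOK = {
    "collisions": {"collision", "crashing", "blocks", "wall"},
    "hustle": {"diving", "slides", "hustle"},
    "toughness": {"injuries", "pain", "gritty"},
}


def code_snippet(text, codebook):
    """Return the set of categories whose trigger words appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {category for category, triggers in codebook.items() if words & triggers}


def tally_categories(snippets, codebook):
    """Count how many snippets mention each category at least once."""
    counts = Counter()
    for text in snippets:
        counts.update(code_snippet(text, codebook))
    return counts


counts = tally_categories(SNIPPETS, CODEBOOK)
for category, n in counts.most_common():
    print(f"{category}: mentioned in {n} of {len(SNIPPETS)} snippets")
```

Each snippet counts at most once per category, so a single writer who repeats the same word five times doesn’t swamp the tally. The output is only as good as the codebook, which is exactly where the human judgment comes in.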
The other common type of qual analysis is thematic coding. Let’s say that you have qual observations on several pitchers. Was the person who filled out the report generally positive or negative about the pitcher’s fastball? Did he mention anything about the pitcher’s mechanics? Did the pitcher have the potential for a “filthy” slider? Now, let’s wait a few years and see whether those observations predict the data that we can gather later on. And let’s not stop at one report. Let’s take all the scouting reports filed by that scout, and let’s look at all the pitchers whom he rated, both the studs and the duds. Did his predictions actually pan out?
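To make that a little more concrete, here is a rough sketch of the last step, assuming the hard part is already done: a human coder has read each report and recorded one theme (was the scout positive about the fastball?), and a few years later we have recorded one outcome per pitcher. The scouts, pitchers, field names, and values below are all made up for illustration, and “success” is left undefined on purpose, since picking the outcome measure is its own research decision.

```python
from collections import defaultdict

# Made-up coded reports. "fastball_positive" is the theme assigned by a human coder;
# "later_success" is a hypothetical outcome measured years after the report.
REPORTS = [
    {"scout": "Scout A", "pitcher": "P1", "fastball_positive": True, "later_success": True},
    {"scout": "Scout A", "pitcher": "P2", "fastball_positive": True, "later_success": False},
    {"scout": "Scout A", "pitcher": "P3", "fastball_positive": False, "later_success": False},
    {"scout": "Scout B", "pitcher": "P4", "fastball_positive": True, "later_success": False},
    {"scout": "Scout B", "pitcher": "P5", "fastball_positive": False, "later_success": True},
]


def agreement_by_scout(reports, theme, outcome):
    """For each scout, return the share of reports where the coded theme matched the outcome."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for report in reports:
        totals[report["scout"]] += 1
        if report[theme] == report[outcome]:
            hits[report["scout"]] += 1
    return {scout: hits[scout] / totals[scout] for scout in totals}


rates = agreement_by_scout(REPORTS, "fastball_positive", "later_success")
for scout, rate in rates.items():
    print(f"{scout}: theme matched outcome in {rate:.0%} of reports")
```

Note that this counts both the studs the scout liked and the duds he didn’t, which is the point of looking at every report he filed. With real data you would also want far more than a handful of reports per scout, and a comparison against how often you’d be right just by guessing.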
The details of how this is done would take a much longer piece (actually, a course in qual data analysis), but in theory, it is just an engineering problem to actually conduct this type of study. It would provide a systematic look at where scouting reports are valid and where they are not. If Sabermetrics claims to be a science, then it must allow for the possibility that these scouting reports are powerfully predictive. (And it must equally allow for the possibility that scouting as a profession is functionally useless.) The fact that scouting reports are non-numerical in nature does not mean that they can’t be analyzed. It just means that they need to be analyzed in a different way.