General Theories on the Study of Baseball

Greetings from the last man to join the “party” here at Statistically Speaking. I apologize for the delay in my arrival, but you all can be certain I will be active enough to make your head spin once I get fully settled into the MVN blog circuit. I have a rare gift for somehow being loud even though sound is impossible in a communication medium that is limited to text and color, so that may or may not be a good thing for some of the readership (heh). I was so loud that although I’d never written an article for the Hardball Times, my works in Sabermetrics were on their radar screen over there and they recommended me for this “job.” I was as surprised as anyone.
I’m going to start with a small warning. I (grudgingly) play a little roto, but my posts will not be particularly geared toward a specifically “Fantasy Baseball” mindset as it does not hold my interest to talk about projecting RBI totals, or E counts or Holds or any of the other woefully inadequate measures of performance used in even the most enlightened of leagues. I am interested in learning about this object of our unique obsession purely from the perspective of a dispassionate observer (whenever possible) assimilating all of the available data and drawing what conclusions he can defend.
I’m going to try to focus my posts here on those elements of my research – and the research of other big names in the field who have caught my eye – that can be fully explained without the use of higher-level math (anything beyond the normal scope of a high school math program), but if some methodology is used that requires an advanced mathematical topic, I’ll say so explicitly in the title of that entry.
For those of you who aren’t familiar with me, my name (as implied in my user name) is Matthew Souders, although I generally go by SABR Matt online. I’ve been following baseball since 1992 (I was 10 years old when I saw my first big league game and I was hooked immediately). I lived in Seattle at that time and the Mariners just happened to be developing into an interesting team to root for at that moment in their history, packing exciting young players like Randy Johnson, Ken Griffey Jr, Jay Buhner, Edgar Martinez, Tino Martinez, Chris Bosio and Bret Boone and inheriting the mean streak of their new manager (Lou Piniella). I’ve been a rabid Ms fan ever since, which makes for some interesting conversations with Sean, who is unfortunately an Angels fan (seek help Sean – seriously, you folks rooting for the Angels must have some kind of strange psychological problem).
I got interested in statistics thanks to looking at baseball cards – my first big league game was a give-away day, a pack of 25 Mariner baseball cards in a nice little collector’s book, which led me to a short-lived card collecting phase. By the time the 1993 season was over, I was copying statistics from the internet (yes, I was online way back then!) and simulating seasons on a hilariously cheesy (by modern standards) baseball platform for my Super Nintendo (LOL). By the time the Mariners were celebrating in a delirious mass in October of ’95, I had my first baseball encyclopedia and was reading voraciously about the history of baseball. By the time I headed off to college (in 2000), I was building really simple player evaluation methods (I laugh at the first analysis I ran now when I look back on it).
This is now my seventh season as a sabermetrician. Each season I learn a little more and each season I get a little further from accomplishing my goals (the more you learn about baseball, the more you realize you don’t know anything)! After that long preamble (sorry, I tend to be a bit wordy when making introductions), I want to talk about my approach to problem solving as it applies to baseball and about what I view as some serious deficiencies with some other approaches.
The Scientific Method
The study of baseball is not unlike the study of meteorology (my second love – I am finally close to earning a degree in Meteorology and moving on to grad school) in that both require a scientific approach and deal with principles of chaos and random selection. I’ve used Monte Carlo simulations, for example, in the analysis of wind speed fluctuations over time AND to generate Win Probability Added statistics for baseball games in the play by play era.
If someone describes himself as a Sabermetrician and that someone does not have a primary career path linked to the sciences, you should immediately have doubts about his or her ability to think in the manner of a scientist and therefore about his or her relevance. There is exactly one exception to this point, and his name is Bill James. He only stands out as an exception because he has an unusual gift for intuition regarding baseball that leads him to scientifically incorrect methods that produce bizarrely accurate results.
The scientific approach starts with an observation, which means you can’t be a sabermetrician if you don’t spend a LOT of time looking at the data. It’s a really fascinating history if you take the time to study it and look for patterns, but a lot of people are tragically short on patience and skip this step, choosing instead to focus only on the most recent seasons and the data that’s right in front of them. Doing this can lead to erroneous conclusions (the most famous of which was the initially bold statement by Voros McCracken that pitchers had ZERO control over batted balls, which he has since realized was fatally inaccurate).
When I come up with an idea about how to answer a question that’s been nagging me, the first thing I always do is define specifically what my problem is.
Do baserunning habits change based on the run scoring environment?
Then I propose an answer.
While the frequency of positive and negative events changes, baserunning aggression does not change because an equilibrium has been found through years of professional play that balances risk and reward and the overall abilities of baseball players have not changed much over the years.
Then I define exactly what kind of data I need to examine to test my hypothesis.
In this case, a simple test was devised using play by play era information about the likelihood that a lone runner on first (with no one out) would advance to third on a single. The simplicity of the starting base/out state eliminates the interactions between multiple runners who might interfere with each other, and removes the number of outs, which is another variable that can impact baserunning, allowing me to focus on aggression and only aggression.
Then I examine the data and make a decision about my first guess.
This small study revealed that I was incorrect about baserunning aggression being independent of run scoring environments. In fact, there is an abnormally strong correlation (roughly 0.94!) between the ratio of RS/G to the all-time average RS/G and the ratio of the odds that a runner will take third base on a single to the all-time average odds of that result. This suggests that baserunners take more risks when a single run might decide a game. Interestingly, however, the odds of being thrown out attempting third base were only slightly higher in lower run scoring environments, which suggests that the pressure of a low run-scoring environment forces the development of players who are good baserunners and improves that aspect of the game.
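For readers who want to try this kind of test themselves, here is a minimal sketch in Python of the comparison described above. It is not the code behind the study. The function names are made up for illustration, the season values are hypothetical placeholders rather than real play-by-play data, and the series average stands in for the all-time average.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def ratio_to_average(values):
    """Express each value as a ratio to the average of the whole series."""
    avg = sum(values) / len(values)
    return [v / avg for v in values]

# HYPOTHETICAL placeholder values, one per season. NOT real data.
runs_per_game = [3.9, 4.2, 4.5, 4.8, 5.1]
first_to_third_rate = [0.30, 0.29, 0.27, 0.26, 0.25]

r = pearson_r(ratio_to_average(runs_per_game),
              ratio_to_average(first_to_third_rate))
print(f"r = {r:.3f}")
```

The sign and size of r that these placeholders produce mean nothing; only real play-by-play data can say anything about baserunning. Note also that dividing a series by a constant does not change a Pearson correlation, so the ratio step affects how the numbers read, not the value of r.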
Correlations, Confidence, and Lines of Best Fit
Since sabermetrics is inherently a statistical pursuit, almost everything done in the field tends to come down to statistical tests of significance and statistical methods for defining empirical relationships in the data. I believe there is a severe over-reliance on correlation in baseball today. A rather frustrating (to me) example of correlation run amok is this new concept that the rate at which pitchers allow HR is defined entirely by the number of outfield flies they surrender. We didn’t learn our lesson from the past overconfidence of McCracken? Don’t get me wrong. I’m a strong believer in the proper application of DIPS theory to the analysis of both pitchers and team defenses. My own methods rely heavily on DIPS to separate pitching from fielding. It is no more accurate, however, to claim that pitchers have no control over their HR/F rate than it is to say that they have no control over their BABIP.
The reason things like this frustrate me is simple: rather than actually using the data they have, many sabermetricians continue to fall into the trap of believing that baseball analysis can be simplified down to a series of correlations that will explain everything. The problem gets compounded when they stick to their guns in the face of examples that disprove theories, or in the face of straight logic. How can it be true that all outfield flies are essentially created equally? Do we really expect that there aren’t some pitchers with the ability to induce poor contact (that results in a series of easy fly balls), and some pitchers who get hammered whenever they leave a ball up enough to be hit in the air?
I said above that a lot of time must be given to studying the data, but you’ve got to stop and think about what you’re saying when you make assumptions or strongly worded conclusions based on that data. I’ve been guilty all too many times of defending bad ideas in my youth, a practice I’ve worked hard to curtail. I only wish I’d started thinking like a scientist sooner – I might be further along now if I had.
Creating Metrics
Do you know how many times I find someone noodling with the data and trying to come up with a statistic without the foggiest idea what it is they’re trying to measure? The easiest way to improve someone else’s metric is to ask them “What are you trying to show here?” Most of the time, home cooked statistics have no direct connection with a real world event, and no statistic can or should be used unless (a) you are actually measuring something real and tangible and (b) you know exactly what that something tangible is!
OPS is evil. It measures nothing at all. It’s junk science that happens to kinda-sorta give the right general impression while being easy to calculate. The mixed denominators and incorrect proportions in OPS gnaw at me like nothing else. ERA is not a good measure of pitching skill (too many things embedded within that are not under the control of the pitcher) but it IS a good statistic as long as you know what it means. The original DIPS ERA is a bad statistic. It measures nothing. It’s just the sum of three components of a correlation and bears no resemblance to real world data.
The biggest offender is the traditional park factor. A little unit analysis reveals immediately one of the biggest problems with park factors. They are unitless! A park factor is the ratio between the runs a team and its opponents score at home and the runs they score on the road. In other words it’s (R / R). That makes no sense whatsoever. Ballparks do not have a proportional (scalar) impact on run scoring. If a league scored runs at double the rate they are scored today, do we really think Coors Field would be twice as bad (relative to the league) for pitchers? A park factor is attempting to measure the park’s real influence on scoring. The more you play at any given park, the more it will influence you. A proper analysis of parks should begin not with a ratio factor but an additive factor with units like runs per game.
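To make the “mixed denominators” complaint concrete, here is a small Python sketch using the standard definitions of OBP and SLG. The stat line is a hypothetical hitter invented for illustration, not a real player.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: times on base per (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slg(singles, doubles, triples, hr, ab):
    """Slugging percentage: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab

def ops(on_base, slugging):
    """OPS simply adds the two rates, even though their denominators differ."""
    return on_base + slugging

# HYPOTHETICAL stat line, not a real player.
ab, bb, hbp, sf = 500, 60, 5, 5
singles, doubles, triples, hr = 100, 30, 5, 15
hits = singles + doubles + triples + hr

on_base = obp(hits, bb, hbp, ab, sf)            # denominator: 570 opportunities
slugging = slg(singles, doubles, triples, hr, ab)  # denominator: 500 at-bats
print(f"OBP {on_base:.3f} + SLG {slugging:.3f} = OPS {ops(on_base, slugging):.3f}")
```

OBP is a rate per plate-appearance-like opportunity and tops out at 1.000, while SLG is bases per at-bat and tops out at 4.000, so the sum has no single unit and weights a point of each equally.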
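The unit argument can be sketched in a few lines of Python. The function names and run totals below are hypothetical, chosen only to illustrate the point, and the second half assumes (as this article argues, without proving it here) that a park adds a fixed number of runs per game.

```python
def combined_runs_per_game(runs_scored, runs_allowed, games):
    """Runs scored by a team and its opponents, per game played."""
    return (runs_scored + runs_allowed) / games

def ratio_park_factor(home_rpg, road_rpg):
    """Traditional park factor: home R/G over road R/G (unitless)."""
    return home_rpg / road_rpg

def additive_park_factor(home_rpg, road_rpg):
    """Additive park factor: extra runs per game at home (units: R/G)."""
    return home_rpg - road_rpg

# HYPOTHETICAL totals for illustration only, not real team data.
road_rpg = combined_runs_per_game(365, 364, 81)   # 9.0 R/G on the road
home_rpg = combined_runs_per_game(450, 441, 81)   # 11.0 R/G at home

print(f"ratio factor:    {ratio_park_factor(home_rpg, road_rpg):.3f}")
print(f"additive factor: {additive_park_factor(home_rpg, road_rpg):+.1f} R/G")

# Thought experiment: league scoring doubles.
# ASSUMPTION: the park keeps adding the same number of runs per game.
doubled_road = 2 * road_rpg
doubled_home = doubled_road + additive_park_factor(home_rpg, road_rpg)
print(f"ratio factor after doubling: "
      f"{ratio_park_factor(doubled_home, doubled_road):.3f}")
```

Under that assumption the additive factor stays put while the ratio factor shrinks as the run environment grows, which is the sense in which a single unitless number cannot describe the park on its own.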
Stop and think before you run around mixing numbers together and rating players. What are you trying to measure and how does your metric accomplish said measurement?
I’ll save further comments for later articles, but hopefully this will give you a sense for what to expect from me in the future.

20 Responses to General Theories on the Study of Baseball

  1. Matt Souders says:

    The park factor thing is a MAJOR hot button with me…multiplicative factors give you very incorrect impressions when looking at smaller samples than the teams themselves (for example…individual players). Does Coors Field (in its heyday) really inflate the expectations for Todd Helton by 20%? Of course not. It inflates the expectations by X number of runs per out (the measure of time in baseball).
    My other big issue with park factors is that they derive from too little information. How strong was the offensive competition faced by a team in each of the parks they visited? Did their line-up take advantage of specific features in their home park to distort the RS statistics?
    One of the major things I’ve done in my research is develop a linear system of equations that gives a much better idea of how contexts (leagues, parks, team abilities…) overlap and intermingle to explain run scoring, which I’ll go more into in later articles. That requires knowledge of Matrix theory, so I didn’t want to start down that road this early.

  2. John Beamer says:

    Matt
    Good to see you blogging at MVN. I look forward to checking back in and reading your stuff on a regular basis.
    Quick point on your OPS is evil mantra. I don’t disagree with the points that you make — the mixing of denominators and the fact that it undervalues the walk. But I think you are being slightly harsh in your conclusion. OPS is the best “simple” metric to assess offensive production out there. It is easy to calculate from any scoresheet in your head. It also correlates to run scoring as well as almost any other metric — to a very good approximation at least.
    I think we should be encouraging others to use both OBP and SLG and also OPS for that matter. I understand your scientific grounding argument, and it is valid … but if it works, it works, and it should be used when utmost precision isn’t necessarily required …

  3. Matt Souders says:

    *sigh*
    “If it works, it works!” The excuse by millions of people nationwide to not attempt to learn anything new. I understand that most people don’t find statistics as interesting as I do (I’m definitely a geek, I’m cool with that), but it wears on me when baseball fans start using OPS+ like a weapon in debate about the merits of players…LOL

  4. John Beamer says:

    Fair enough — I guess we agree to disagree on this! We shouldn’t use OPS for in-depth intricate analysis. But if we want to know the strength of two players then OPS is a useful, simple metric to use. It gives you 95% of the answer!! Any more advanced stat will give you a similar answer most of the time 🙂
    What gets to me is when people quote stats without understanding their limitations. If you understand what OPS does or doesn’t do then use it on the right occasion.

  5. Matt Souders says:

    yeah…I’m not actually disagreeing with you…I’ve just noticed a trend in baseball discussions toward using OPS and OPS+ to “decide” a debate because they are easy to find at B-R.com and whatnot. Nonetheless, your point is taken. 🙂

  6. John Beamer says:

    As you say the danger of more information is greater misuse! It is our job to try to stop it!!!
    Good luck.
    ps. will you be posting about your PCA system over here?

  7. Matt Souders says:

    Hey John…yes, I will talk some about the old PCA and a great deal about my methods for improving PCA with PBP information and the like. I thought it best to start general and get into the nitty gritty with time. 🙂

  8. John Beamer says:

    Well, as I said, I look forward to it.

  9. Sean Smith says:

    Additive vs Multiplicative factors –
    I sense the beginning of a great debate, and one where I don’t think a lot of ground has been covered. I’ve got some ideas on where to start with this, and I’ll flesh them out once I get the NL projections up.
    Short answer – I think there is place for both. Also – the question of to add or to multiply is as (maybe more) important to MLE’s as it is to Park Factors.

  10. Pizza Cutter says:

    My wife is a cancer researcher. She laughs every time I try to tell her that I’m a “scientist.”

  11. John Beamer says:

    Matt
    Either here or in a later post I’d be interested to hear your beef about the HR/OF relationship ….

  12. Matt Souders says:

    Long story short on that…just as there are pitchers who defy DIPS to some extent…I have found evidence that there are some pitchers who defy the HR/F theory. Barry Zito for example tends to induce a lot of weak flyballs which prevents his extreme flyballishness from resulting in high HR counts. Pedro Martinez’ HR/F was frequently down around 8%…not the typical 11. And on the other end…there are groundball pitchers who give up consistently more HR than their flyball counts project because if it’s up enough to get hit in the air…it’s a bad pitch (Felix Hernandez, for example).

  13. John Beamer says:

    Well I look forward to reading more from you on that for sure …

  14. Sean Smith says:

    I wasn’t aware that anyone claims HR%/FB is not an ability. There’s an article on batted balls in this year’s Hardball Times, and the y-t-y correlation for that is stronger (0.17) than anything we ever saw for DIPS.
    On Felix, I think it’s too early to say he has any consistent tendencies in this area.

  15. Matt Souders says:

    Sean….you evidently haven’t seen Felix pitch…it’s still happening this spring (the high HR/F thing)…he’s too aggressive and catching too much of the plate, and the result is a lot of HARD ground balls when the pitch is down and a lot of 450 foot moon shots when it’s not.

  16. Matt Souders says:

    And as to whether people claim HR/F is not a skill…if they don’t then xFIP is a complete waste of everyone’s time, because one of the assumptions of xFIP is that HR/F is not a skill.

  17. Sean Smith says:

    Felix is pitching in the Cactus league. Every fly ball there goes for a homer.
    Maybe he allowed too much hard contact last year but he sure didn’t in his 2005 debut. I’d have to see it continue before I think it’s a real part of his skillset.

  18. Matt Souders says:

    In 2005, he was not booked yet. In 2006, the American League adjusted to Felix by sitting dead red in every count at all times and just waiting for that fastball…he throws it right over the plate a LOT and it gets WHACKED a LOT (though generally, the whacking is on the ground because he’s good at keeping the ball down). When he starts pitching more with the curve and change-up and even slider, the fastball sitting and hard contact will stop.

  19. Matt Souders says:

    Hey John…nice link. It would seem that HR/F, similar to BABIP, is a skill in that it can be influenced in very small ways (2-4 percentage point variations for HR/F)…I see the HR/F thing as being the modern version of the BABIP arguments, and this goes right to one of my major philosophies about baseball analysis…
    Document what you see…not what you think the trend is. If you want to know what impact Jarrod Washburn has on BABIP, check his BABIP relative to the rest of the Mariners. If you want to know if Zito is longball prone, check his HR rates NOT his flyball rates. You get the idea.
