A few cultural musings on the walk

Last night, I was washing the dishes and musing on baseball, specifically about the walk.  Much has been written on the hidden beauty of the base on balls among Sabermetric types, but the poor walk can’t seem to get its due.  It doesn’t even count as an at-bat.  Originally, the thinking was that, as the batter, you were a mere passive bystander in the walk, and so you should not be credited for it… or even have those 2 minutes of your life acknowledged in the official stats.  Then came Moneyball.  If there is one critique of Moneyball that I heard over and over again, it was a semi-derisive, “Why are you guys so obsessed with walks?”  How did the walk go from the red-headed stepchild of batting outcomes to an outcome over which people had philosophical discussions?

Oddly enough, I don’t think that fans are responding to specific walks.  Go to a game where your team needs a base runner.  If the leadoff guy draws a walk, those in attendance will cheer, because… well, he just did something good.  There’s no denying that a walk is a positive outcome for the batter.  Why the hate for those who are particularly good at drawing them?

It occurred to me that we are actually bred from our earliest baseball days to eschew the walk.  At first, you probably played backyard baseball where no one kept the count.  You might have had rules about striking out, but no one ever walked.  The point was to smack the ball, run around, and pretend that you were one of the local heroes.  Then in Little League or rec center ball or whatever, there was finally an umpire to keep the count, but you also only had the field for an hour or so before the next group of 9-year-olds and their parents came by.  Your coach told you to swing, partly because the idea was to teach you hand-eye coordination, but also because swinging moved the game along.  A walk takes at least four pitches (usually more).  Chances are that on one of your three swings, you’d at least do something useful.  Actually aiming for a walk was something that was kinda selfish and probably subtly, if not openly, discouraged.  You’re taking away valuable game time.

Then there’s the mandatory male training in American culture that it’s not OK to wait for something to come to you.  Swinging and missing is a bummer, but there’s a certain honor to having tried.  At least you went out and did something.  Think about it for a moment, though.  If you really called the strike zone, rec league pitchers aren’t all that accurate.  There are probably some pitchers against whom a team could pile up runs simply by standing there and waiting for ball four.  It would likely work, but after a few innings, the ridicule from the stands for using this unorthodox strategy would be unbearable.  Sports are supposed to be won with brute force, not brain cells.

So, is it any wonder that by the time a baseball fan comes of age, he’s predisposed against the walk?  Maybe that’s why it’s still considered something of a shameful outcome.  We’ve been told our whole baseball lives to have a “good eye”, but the game, as we actually played it, either de-emphasized the walk or subtly disparaged it.

Checking the Leaderboards

I always like these early-season small-sample-size-be-damned articles about the early oddities in baseball. Kind of like an early preview of The Oddibe Awards from StatSpeak alum Eric Seidman. A few days ago on THT, Craig Brown took a fun look at some of the odd things going on in the baseball world. Here’s my favorite one:

Speaking of strand rate, Brett Myers is stranding 92 percent of all base runners, yet owns a 5.03 ERA.

That takes some effort.

Myers has thrown 19.2 innings and has allowed only 18 hits and six walks. That’s not bad at all (fantasy players will recognize a 1.22 WHIP), but here’s the problem: Of his 18 hits, seven have left the yard. With a career home run rate of 1.3 HR/9, Myers has always been prone to the long ball, but seven home runs in just under 20 innings of work gives him a rate of 3.2 HR/9.

His strand rate is high because, of the 10 runs he’s allowed, nine have been the direct result of a home run. Five solo home runs and two two-run home runs have accounted for the damage. He’s also allowed six doubles. Pretty simple math tells us that means he’s surrendered only five singles. Myers has always allowed extra-base hits by the bucketful, but this is kind of crazy.

What’s troubling is that it could be worse: Myers’ FIP is at 6.84. If four more runners were on base for any of his home runs, his ERA would be outpacing his FIP. He’s walking a fine line between disaster and epic disaster.

That should make Phillies fans real happy.
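
The rate stats in the quote can be checked by hand. Below is a small Python sketch, assuming the usual box-score convention that “19.2” innings means 19 innings plus 2 outs (19 2/3 innings); the helper names are my own, not from any library.

```python
# Checking the rate stats quoted for Brett Myers.
# Assumption: box-score innings notation, so "19.2" = 19 innings + 2 outs.

def innings_to_outs(ip: str) -> int:
    """Convert box-score innings notation to a count of outs."""
    whole, _, extra_outs = ip.partition(".")
    return int(whole) * 3 + int(extra_outs or 0)

def per_nine(count: int, outs: int) -> float:
    """Scale a counting stat to a per-9-innings rate (27 outs)."""
    return count * 27 / outs

outs = innings_to_outs("19.2")              # 59 outs, i.e. 19 2/3 innings
hits, walks, homers, doubles = 18, 6, 7, 6

whip = (hits + walks) / (outs / 3)          # (H + BB) per inning pitched
hr_per_9 = per_nine(homers, outs)
singles = hits - homers - doubles           # assumes no triples

print(f"WHIP {whip:.2f}, HR/9 {hr_per_9:.1f}, singles {singles}")
# prints: WHIP 1.22, HR/9 3.2, singles 5
```

Working in outs rather than decimal innings avoids the classic mistake of treating “19.2” as 19.2 innings.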

What Could A-Rod's Future Hold?

I was searching for something to write about Alex Rodriguez to open this article. I promise, I tried really hard to find something good. But when you think about it, what about A-Rod’s life do we not already know? There’s nothing new and interesting to write about with him, unless you really enjoy reading about Madonna, choking, or steroids from 6 years ago. Newspapers will write literally anything about him, regardless of its relevance. According to Roto Authority, he bleached his hair today for the second time this month*. But in my tireless search for new and exciting information about A-Rod, I found a tidbit that I guarantee nobody here knew: His other nickname is…wait for it… wait for it…. The Cooler (?!). Apparently this is because he makes his teams win less often, which contradicts everything I know about baseball. It’s also the worst nickname I’ve ever heard.

*Note: not actually true, but it wouldn’t surprise me if this was actually reported.

What does this have to do with the future of Alex Rodriguez? Well, nothing so far, but I’m getting there. What I’m doing here today is looking through A-Rod’s top comparable players through age 32, as defined by Baseball Reference. I don’t really care how many home runs he’s going to hit, or what his batting average will be. We should only really care about overall production, and the best way to measure that is with Sean Smith’s historical Wins Above Replacement database.

If you look at A-Rod’s comps, you’ll notice something interesting. Six of the ten players listed started their careers between 1951 and 1956. Retrosheet, the foundation of Sean’s database, starts its records in 1953, right in the middle of that range. Another A-Rod conspiracy? You be the judge.

Those six players (plus one more from more recent times) are what I’ll be basing this on, so let’s meet them. The list is quite impressive:

Hank Aaron, Ken Griffey Jr., Mickey Mantle, Frank Robinson, Eddie Mathews, Willie Mays, and Al Kaline.

All of those, if you didn’t already know, are either current or future Hall of Famers. So it would seem that A-Rod is in good company, and we have nothing to worry about, right? Maybe. The first thing to note is that Alex is signed through his age 41 season, essentially ensuring that he won’t be retiring before then. PECOTA or CHONE will do a much better job than I can of telling you how A-Rod will do over the next five years (CHONE covers age 38, but I didn’t realize that at the time of writing this). But when those five years are over, he’ll still have another four years left on his deal. Those age 38 to 41 seasons are what I’m going to try to look at.

Sabermetrics… without numbers

It seems like a contradiction in terms.  How can someone do Sabermetrics without numbers?  After all, are we not the great digitizers of the game of baseball?  Can one do data analysis without… data?  The answer is that yes, Sabermetrics is possible without numbers.  It just takes a slightly different understanding of what data are and what they can be.

Most of Sabermetrics, thus far, has focused on a basic model of counting up the frequency of some event happening (perhaps norming it to some league expectation) and either counting it for its own sake or checking to see if it correlates with some other event which we have diligently counted up.  I have nothing against this approach (in fact, I use it a lot!), so long as data are systematically collected in an unbiased (i.e., scientific) manner.  But counting things up and turning everything into numbers isn’t the only way to go about research.

Consider how you make a decision on whether to go to a new restaurant.  Let’s assume that you have a group of fairly open-minded eaters.  My guess is that you don’t have a metric that takes the average item price on the menu, the distance to the restaurant from your mom’s basement, the cube root of the waiter’s salary, etc.  Instead, you harken back to hearing your friend Larry say that he went to the restaurant and he really liked it.  In other words, you got a scouting report.  Larry’s scouting report might not have been numerical, but since you trust Larry’s judgment on such things, you suggest the new place.  It may end up as the best meal you’ve ever had.  It may be awful.  You’re about to find out… based on a sample size of one and a poorly defined idea of what “good” is.  He might give you an “8 out of 10” (it’s a number!) type of rating, but how did he come up with 8?  Is that a good measure of how much you’ll actually like the restaurant?

(Would you like a slightly more cynical example?  Remember that paper that you wrote back in college that got a C-?  It ruined your semester and your GPA and you’ve hated that professor ever since.  I’m sure I’m that professor for someone out there.  How did s/he come up with a C-?  We use qualitative judgments all the time… sometimes, like in the school example, with consequences that can make or break someone’s entire life-course!)

In reality, we operate on non-numeric data a lot in life, especially when dealing with the completely unknown.  It’s hard to collect systematic quantitative data about everything!  Most data of this sort comes in the form of descriptions and words… not regression equations.  The thing that most people don’t know is that this sort of data, called qualitative data, can be analyzed.  (A phonetic problem quickly arises… the word “qualitative” looks and sounds a lot like “quantitative.”  Therefore, for the rest of the article, I will use “qual” and “quant” as short-hand.)  Qual data, when systematically collected and analyzed (and yes, there are ways to do that) can lead to some very interesting results.

A common type of qual data analysis is called content analysis.  Suppose you had an idea for a research question that you wanted to ask or a topic that you wanted to study.  Suppose that it was a question that no one had ever really researched before.  (You’re so creative!)  You have an idea, but you have no idea where to really start.  Solution: you do some reading about whatever’s been said on the subject.  You go to the oracle of all knowledge (Google) and type in a few keywords.  You read everything that comes up on the subject.  Slowly, you begin to understand how other people have conceptualized the issue in the past.  This provides you a groundwork for studying the issue further, perhaps with quant data.  At least now you know what to count.  A while back, I was looking for how to quantify “hard-nosed” players and used a similar method.  I looked for what characteristics were most-often mentioned for players who were considered “hard-nosed” and found that mostly the term is used to describe players who like to run into things.
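
Real content analysis means reading each passage and coding it by hand, but the final counting step can be sketched. A crude, hypothetical Python illustration: simple keyword matching stands in for human coding, and the descriptions and trait phrases below are invented for the example.

```python
from collections import Counter

def tally_mentions(descriptions, traits):
    """Count how many descriptions mention each trait (case-insensitive substring match)."""
    counts = Counter()
    for text in descriptions:
        lowered = text.lower()
        for trait in traits:
            if trait.lower() in lowered:
                counts[trait] += 1
    return counts

# Hypothetical snippets describing players called "hard-nosed".
descriptions = [
    "Runs into walls chasing fly balls.",
    "Slides hard and runs into the catcher rather than give up.",
    "Plays through nagging injuries.",
]
traits = ["runs into", "slides hard", "plays through", "hustles"]

for trait, n in tally_mentions(descriptions, traits).most_common():
    print(trait, n)
```

The most-mentioned traits are the candidates for what to count later with quant data.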

The other common type of qual analysis is thematic coding.  Let’s say that you have qual observations on several pitchers.  Was the person who filled out the report generally positive or negative about the pitcher’s fastball?  Did he mention anything about his mechanics?  Did this guy have the potential for a “filthy” slider?  Now, let’s wait a few years and see whether those observations predict data that we can gather later on.  But let’s not stop at one report: let’s take all the scouting reports filed by that guy, and look at all the pitchers whom he rated, both the studs and the duds.  Did his predictions actually pan out?
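
Once each report has been coded (say, positive or negative on the fastball), a first pass at whether a scout’s calls panned out can be a simple agreement rate. A hypothetical Python sketch; the pitchers, codes, and outcomes are made up, and raw agreement is only a starting point.

```python
def agreement_rate(coded_reports, outcomes):
    """Share of a scout's calls that matched how the pitcher actually turned out.

    coded_reports: {pitcher: True if the report was coded positive}
    outcomes:      {pitcher: True if the pitcher later panned out}
    Pitchers without a known outcome are skipped.
    """
    judged = [p for p in coded_reports if p in outcomes]
    if not judged:
        raise ValueError("no pitchers with both a report and an outcome")
    matches = sum(coded_reports[p] == outcomes[p] for p in judged)
    return matches / len(judged)

# Hypothetical codes for one scout's fastball comments, and later results.
reports = {"Pitcher A": True, "Pitcher B": False, "Pitcher C": True, "Pitcher D": False}
results = {"Pitcher A": True, "Pitcher B": False, "Pitcher C": False, "Pitcher D": False}
print(agreement_rate(reports, results))  # 0.75
```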

The details of how this is done would take a much longer piece (actually, a course in qual data analysis), but in theory, it is just an engineering problem to actually conduct this type of study.  It would provide a systematic look at where scouting reports are valid and where they are not.  If Sabermetrics claims to be a science, then it must allow for the possibility that these scouting reports are powerfully predictive.  (And it must also allow for the possibility that scouting as a profession is functionally useless.)  The fact that scouting reports are non-numerical in nature does not mean that they can’t be analyzed.  It just means that they need to be analyzed in a different way.

Testing the Projection Systems' Strengths and Weaknesses

There are several prominent projection systems for predicting how players will do in the coming baseball season.  Depending on the season and the test used, each has some claim to superiority.  One system that makes no such claim is Marcel the Monkey, developed by Tom Tango, which intentionally uses a simple method: a weighted average of historical performance, regressed to the mean and adjusted for age.  Tango explains that it should serve as the baseline that any projection system worth looking at ought to beat.
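
To make the description of Marcel concrete, here is a rough Python sketch of a Marcel-style rate projection. It is not Tango’s code: the 5/4/3 season weights, the 1200 PA of league-average regression, and the age factor (up 0.006 per year under 29, down 0.003 per year over 29) are the values I recall from his published description, so treat them as assumptions.

```python
# Sketch of a Marcel-style rate projection (not Tango's actual code).
# ASSUMED parameters: 5/4/3 season weights, 1200 PA of league-average
# regression, age factor +0.006/yr under 29 and -0.003/yr over 29.

WEIGHTS = (5, 4, 3)       # last season, two seasons ago, three seasons ago
REGRESSION_PA = 1200

def age_factor(age: int) -> float:
    """Multiplier applied to the regressed rate; exactly 1.0 at age 29."""
    if age < 29:
        return 1 + (29 - age) * 0.006
    return 1 - (age - 29) * 0.003

def marcel_rate(events, pas, league_rate, age):
    """Project a per-PA rate from up to three seasons, most recent first."""
    weighted_events = sum(w * e for w, e in zip(WEIGHTS, events))
    weighted_pa = sum(w * p for w, p in zip(WEIGHTS, pas))
    # Regression to the mean: blend in REGRESSION_PA of league-average performance.
    regressed = (weighted_events + REGRESSION_PA * league_rate) / (weighted_pa + REGRESSION_PA)
    return regressed * age_factor(age)

# Hypothetical hitter: walks and PA for the last three seasons.
print(round(marcel_rate([60, 50, 40], [600, 550, 500], league_rate=0.08, age=29), 4))  # 0.0906
```

A player with fewer than three seasons simply contributes fewer terms, so the league-average part carries more of the projection.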

Each of these systems is built differently, and each is bound to have its strengths and weaknesses.  While many claim that one is better than another, it is very difficult to tell.  My suspicion is that since the systems use different methodologies, each will excel at projecting different groups of hitters and different statistics.  In this article, I will test how each system does with many different subgroups, and we will see that there are certain areas in which each system excels over the others.  PECOTA, for instance, struggles at projecting BABIP for speedy players.  ZiPS struggles at projecting BABIP overall.  Oliver tends to underestimate walks and strikeouts, but overestimate home runs.  CHONE does a bit better overall, but for the majority of younger players, PECOTA appears to do a bit better.  I will explain each of these results in detail later on.

I gathered projections for 526 players who had at least 300 plate appearances in either 2007 or 2008 and who were projected by all five of the systems I tested.  Many players were not projected by one system or another, and these were excluded from the sample.  Obviously, this eliminated a lot of useful information, since the main way that projection systems differ is in how they project young players, and many young players were not projected by Marcel (and many other players were missed by some of the other systems, too).  However, there is still useful information contained in these tests.
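
One way to run the subgroup comparison described above, sketched in Python. The article does not say which error measure it uses; root-mean-square error is my assumption, and the systems, players, and numbers below are hypothetical. Like the study, it throws out any player that some system did not project.

```python
import math
from collections import defaultdict

def rmse_by_group(projections, actuals, group_of):
    """RMSE for each (group, system), using only players every system projected.

    projections: {system: {player: projected value}}
    actuals:     {player: actual value}
    group_of:    function mapping a player to a subgroup label
    """
    shared = set(actuals)
    for proj in projections.values():
        shared &= set(proj)              # drop anyone a system didn't project
    squared = defaultdict(list)
    for system, proj in projections.items():
        for player in shared:
            error = proj[player] - actuals[player]
            squared[(group_of(player), system)].append(error ** 2)
    return {key: math.sqrt(sum(errs) / len(errs)) for key, errs in squared.items()}

# Hypothetical home run projections for three players from two systems.
projections = {"SystemA": {"p1": 20, "p2": 30, "p3": 10},
               "SystemB": {"p1": 24, "p2": 27}}
actuals = {"p1": 22, "p2": 30, "p3": 12}
ages = {"p1": 24, "p2": 31, "p3": 28}
result = rmse_by_group(projections, actuals,
                       lambda p: "young" if ages[p] < 27 else "veteran")
print(result)
```

Here p3 drops out because SystemB never projected him, which is exactly the trade-off described above: a fair head-to-head comparison at the cost of sample size.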

So… maybe clutch hitting exists…

A few weeks ago, I looked at whether BABIP eventually stabilized enough that we could say that after X balls in play, the BABIP that resulted for a pitcher could be considered an accurate reflection of his actual ability over those X BIP.  Turns out that the number was something on the order of 3800 balls in play, which is roughly 7-8 years’ worth of 180-inning seasons.  So, it’s not really useful in figuring out who’s good or bad at preventing hits on balls in play, but it suggests that such a skill, however overwhelmed by noise, is out there.

So what about clutch hitting?  It’s been found that year to year, there is little in the way of consistency in clutch hitting performance, no matter how you measure it.  One year doesn’t tell you much of anything about a player’s clutch abilities.  But, that’s not the same thing as no skill being there.  It might just be overwhelmed by noise.  Maybe we just need a longer view.

I took the PBP data files from 1979 to 2008 and dumped them into one file.  I then calculated win percentages based on the usual inning/out/baserunner/score framework, and then calculated leverage for each situation based on this handy dandy tutorial from Tom Tango.  (Tom’s going to yell at me for using that… he’s since moved to using a Markov model which he says, correctly, is more accurate.  But the values that this method produces are good enough for government work, and I don’t know much about how to do Markov modeling.)
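
The idea behind leverage in Tango’s tutorial boils down to this: how much is win probability expected to swing in this situation, relative to the swing in an average situation? A rough Python sketch; the two-outcome list and the average-swing value are made up for illustration (a real table would enumerate every PA outcome and average over every inning/out/baserunner/score state).

```python
def expected_swing(wp_now, outcomes):
    """Expected absolute change in win probability over the next PA.

    outcomes: (probability, win probability afterward) pairs; probabilities sum to 1.
    """
    return sum(p * abs(wp_after - wp_now) for p, wp_after in outcomes)

def leverage_index(wp_now, outcomes, average_swing):
    """Swing in this situation relative to an average PA (LI = 1.0 is average)."""
    return expected_swing(wp_now, outcomes) / average_swing

# Hypothetical late-inning spot at 50% win probability, lumped into two outcomes:
# 70% chance of an out (WP falls to .45), 30% chance of reaching (WP rises to .62).
outcomes = [(0.7, 0.45), (0.3, 0.62)]
li = leverage_index(0.50, outcomes, average_swing=0.0355)
print(round(li, 2))  # 2.0
```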

Once I had win probability added (WPA) values for each batting event and leverage index (LI) values for each situation, I could measure how much clutch each PA contains using WPA - WPA/LI, which has become the standard operational definition of the term.  The rest is just a matter of split-half reliability.
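
Per plate appearance, the WPA - WPA/LI definition is one line of arithmetic. A small Python sketch; the (WPA, LI) pairs are hypothetical.

```python
def pa_clutch(wpa, li):
    """Clutch in one PA under the WPA - WPA/LI definition."""
    return wpa - wpa / li

def total_clutch(plate_appearances):
    """Sum clutch over an iterable of (WPA, LI) pairs."""
    return sum(pa_clutch(wpa, li) for wpa, li in plate_appearances)

# Hypothetical PAs: a big hit in a high-leverage spot, then an out in a low-leverage spot.
pas = [(0.10, 2.0), (-0.02, 0.5)]
print(round(total_clutch(pas), 3))  # 0.07
```

The second PA shows a quirk of the definition: an out in a low-leverage spot nudges clutch up, because WPA/LI scales that small failure into a bigger context-neutral one. Clutch measures how a hitter did in big spots relative to small ones, not how he did in big spots alone.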

As per usual, I went through each player’s plate appearances over the course of those 30 years and numbered them sequentially.  I then took matching samples (even-numbered vs. odd-numbered) of X plate appearances each.  So, if I was looking at 1000 PA, what I really needed were all players who had at least 2000 PA total, with 1000 in each column.  I calculated how much clutch was present for each player in each of those matching samples and ran a correlation between the two.  If clutch is a repeatable skill, the correlation between the two should creep up as X grows.  In theory, if we got to a trillion PA, the numbers would match perfectly and the correlation would be 1.0.  Of course, no one will ever accumulate that many PA.
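
A minimal Python sketch of the odd/even split-half procedure. The data layout (one list of per-PA clutch values per player, in career order) is a hypothetical choice, as is using a player’s first 2n PA when he has more than that; the correlation is plain Pearson r.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def split_half(clutch_by_player, n):
    """Correlate odd- vs. even-numbered PA clutch totals.

    clutch_by_player: {player: [clutch for PA #1, PA #2, ...]} in career order.
    Only players with at least 2n PA are used; returns (r, number of players).
    """
    odd_totals, even_totals = [], []
    for values in clutch_by_player.values():
        if len(values) < 2 * n:
            continue                          # too few PA for two halves of n
        first = values[:2 * n]                # assumption: first 2n PA only
        odd_totals.append(sum(first[0::2]))   # PA #1, #3, #5, ...
        even_totals.append(sum(first[1::2]))  # PA #2, #4, #6, ...
    return pearson(odd_totals, even_totals), len(odd_totals)

# Hypothetical toy data; player "d" has too few PA for n = 2.
toy = {"a": [1, 1, 2, 2], "b": [0, 0, 1, 1], "c": [5, 5, 0, 0], "d": [1, 2, 3]}
r, players = split_half(toy, n=2)
print(r, players)
```

With real data the two columns won’t line up this neatly; the resulting r for each value of n is the split-half number reported in the table.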

PA per half   Players (N)   Split-half r
1000          869           .174
2000          429           .304
3000          186           .431
4000           74           .489
5000           20           .656

After 5000 PA, I couldn’t go any higher, and that sample of 20 players with at least 10,000 PA to split into two halves of 5000 PA each (for the curious: R. Alomar, Baines, Biggio, Boggs, Bonds*, Steve Finley, Luis Gonzalez, Griffey, Gwynn, Rickey, McGriff, Molitor, Murray, Palmeiro, Raines, Ripken, Sheffield, Oz. Smith, F. Thomas, Vizquel) is a very selective and very small sample.  (Oddly enough, most of these guys are career negative clutch.)  But, with that said, the split-half number is approaching .70, which I use as my own personal cutoff for “stable enough.”  (Reason: at .707, you’ve got an R-square of .50, which means that half of the variance has been accounted for within the player himself.)  A decent guess is that somewhere around 6000 PA, we have enough of a read on a player’s clutch-iness that it actually means something.  That’s about 8-10 years as a full-time starter.
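
The two bits of arithmetic in that paragraph, written out. The 600 to 750 PA per full-time season used for the conversion is my own assumption, not a number from the study.

```latex
r = .707 \;\Rightarrow\; R^2 = (.707)^2 \approx .50
\qquad
\frac{6000\ \text{PA}}{750\ \text{PA/season}} = 8,
\quad
\frac{6000\ \text{PA}}{600\ \text{PA/season}} = 10
```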

So maybe we can start talking about clutch careers in retrospect.  These aren’t the kind of data that I would want to bet on from a technical statistical perspective (methodological, sample-size, and selectivity issues abound!).  From a front office perspective, this is useless.  Who wants a stat that just barely makes it over the “stable” mark after 8-10 years?  But at the sports bar… well, now that’s a different story.

The happiest day of the year

In life, there are four seasons.  The off-season, Spring Training, the regular season, and the post-season.  Today turns the calendar, and right now, your favorite team actually has a shot at winning the World Series (except maybe the Cubs).  It’s Opening Day, the happiest day of the year.

I’m a probability guy when I’m here.  I spend a lot of time thinking about baseball in terms of which outcome is the most likely or which is the one that would be optimal.  Every once in a great while, it’s good to stop and remember that baseball is a little bit more than just that.  As a psychologist, I’m very aware that humans are creatures who like to tell stories, and baseball is one way that, as a culture, we tell stories.  Yes, I can tell you a good bit about which outcomes are most likely, but then again, I once defined the Pythagorean Win Estimator as a theory which shows that even though another team’s players are jumping on top of one another celebrating their World Series win, your favorite team was actually better.  And we can prove it.  Once in a while, it’s probably good for me to look up from my spreadsheets and remember that this is supposed to be a little slice of life, played in increments of 90 feet.  I might know what the projected probabilities are for all the different possible outcomes, but only one of them will actually happen.

This summer, there will be a couple of cities that fall in love with their teams.  Some will have their hearts broken.  Some will remember 2009 as “the year when…”  Opening Day is like Christmas morning just before the presents are opened.  You know that what follows is going to be fun.  You just don’t know quite what’s inside yet.  And it means that once more, baseball is being played in Mudville, despite what happened last year.

Tonight, we rip off the paper to see what’s inside.  With no apologies for indulging in a little public reverie and hopeless romanticism (and here I’m supposed to be all objective and scientific), I quote the last two words of the Star-Spangled Banner, “play ball!”

Why Haven't Sabermetrics Gone Mainstream?

In response to a previous Pizza Cutter post, Tom Tango wrote the following, which was directed at the people on the fringes of baseball analysis: “The world is big enough for all of us. Join us if you want. Just don’t stand in our way.”

While those posts are unrelated to this one, that statement got me thinking. My dad and I are both big Yankee fans, and as such we constantly talk about roster decisions, free agents, trades, etc. I’m obviously a numbers guy, and while my dad will hear my arguments, that’s still not his forte. When I say that player X will help the team a given amount, that number has virtually no meaning to him.

While the kind of people who read this blog probably constantly think about player value in real terms, it’s a topic that doesn’t seem to come up in the minds of most people. A manager will tell you that player X will help the team win and fans collectively think, “He’s probably right.” And the manager, more often than not, is correct. But when he says that the player will help the team win, how many people think to themselves, “How much?”

We are the kind of people who think “How much?” This is no great strength of ours and no great weakness of the general population. I think it is simply an attribute–whether it is positive or negative is not for me to decide.

In thinking about this issue, I decided for myself that one of the main reasons the general population thinks differently than the “sabermetric community” is fantasy baseball. Everyone has a team; some people have 3 or 4. And most of these leagues are the standard 5×5 variety where the stats of importance are RBI, runs, batting average, wins, etc. As Patriot will tell you, these stats have no meaningful units that can be converted to runs (don’t tell me that runs = runs, you know what I mean).

My dad plays fantasy baseball. To him, value is measured in those ten categories, because when he is “playing GM” then those are the only things that matter.  Here is my main point: Outside of increased awareness of projection systems, fantasy baseball is holding back the proliferation of sabermetrics. Why should anyone think about OBP when it has literally zero fantasy value in most leagues?

Maybe Joe Morgan and Tim McCarver are the culprits, I don’t know (Lord knows it sure ain’t this guy). Or it could be the negative sentiment towards Moneyball held by so many close to the game that’s holding us back. What do you guys think?

Edit: Further discussion can be found on FanGraphs and Baseball Think Factory
