A few cultural musings on the walk

Last night, I was washing the dishes and musing on baseball, specifically about the walk.  Much has been written on the hidden beauty of the base on balls among Sabermetric types, but the poor walk can’t seem to get its due.  It doesn’t even count as an at-bat.  Originally, it was considered that, as the batter, you were a mere passive bystander in the walk, and so you should not be credited for it… or even have those 2 minutes of your life acknowledged in the official stats.  Then came Moneyball.  If there is one critique of Moneyball that I heard over and over again, it was a semi-derisive, “Why are you guys so obsessed with walks?”  How did the walk go from the red-headed step-child of batting outcomes to an outcome over which people had philosophical discussions?

Oddly enough, I don’t think that fans are responding to specific walks.  Go to a game where your team needs a base runner.  If the leadoff guy draws a walk, those in attendance will cheer, because… well, he just did something good.  There’s no denying that a walk is a positive outcome for the batter.  Why the hate for those who are particularly good at drawing them?

It occurred to me that we are actually bred from our earliest baseball days to eschew the walk.  At first, you probably played backyard baseball where no one kept the count.  You might have had rules about striking out, but no one ever walked.  The point was to smack the ball, run around, and pretend that you were one of the local heroes.  Then in Little League or rec center ball or whatever, there was finally an umpire to keep the count, but you also only had the field for an hour or so before the next group of 9-year-olds and their parents came by.  Your coach told you to swing, partly because the idea was to teach you hand-eye coordination, but also because swinging moved the game along.  A walk takes at least four pitches (usually more).  Chances are that on one of your three swings, you’d at least do something useful.  Actually aiming for a walk was kinda selfish and probably subtly, if not openly, discouraged.  You’re taking away valuable game time.

Then there’s the mandatory male training in American culture that it’s not OK to wait for something to come to you.  Swinging and missing is a bummer, but there’s a certain honor to having tried.  At least you went out and did something.  Think for a moment though.  Rec league pitchers, if you really called the strike zone, aren’t all that accurate.  There are probably some pitchers against whom a team could pile up runs by simply standing there and waiting for ball four.  It would probably work, but after a few innings, the ridicule from the stands for using this unorthodox strategy would be unbearable.  Sports are supposed to be won with brute force, not brain cells.

So, is it any wonder that by the time a baseball fan comes of age, he’s predisposed against the walk?  Maybe that’s why it’s still considered something of a shameful outcome.  We’ve been told our whole baseball lives to have a “good eye”, but the game is actually played in ways that either de-emphasize the walk or subtly disparage it.

Checking the Leaderboards

I always like these early-season small-sample-size-be-damned articles about the early oddities in baseball. Kind of like an early preview of The Oddibe Awards from StatSpeak alum Eric Seidman. A few days ago on THT, Craig Brown took a fun look at some of the odd things going on in the baseball world. Here’s my favorite one:

Speaking of strand rate, Brett Myers is stranding 92 percent of all base runners, yet owns a 5.03 ERA.


That takes some effort.


Myers has thrown 19.2 innings and has allowed only 18 hits and six
walks. That’s not bad at all (fantasy players will recognize a 1.22
WHIP), but here’s the problem: Of his 18 hits, seven have left the
yard. With a career home run rate of 1.3 HR/9, Myers has always been
prone to the long ball, but seven home runs in just under 20
innings of work gives him a rate of 3.2 HR/9.


His strand rate is low because, of the 10 runs he’s allowed, nine have
been the direct result of a home run. Five solo home runs and two
two-run home runs have accounted for the damage. He’s also allowed six
doubles. Pretty simple math tells us that means he’s surrendered only
five singles. Myers has always allowed extra base hits by the
bucketful, but this is kind of crazy.


What’s troubling is that it could be worse: Myers’ FIP is at 6.84. If
four more runners were on base for any of his home runs, his ERA would
be outpacing his FIP. He’s walking a fine line between disaster and
epic disaster.

That should make Phillies fans real happy.
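
For anyone who wants to check the arithmetic in that quote, here is a minimal sketch. It assumes the usual box-score convention that "19.2 innings" means 19 and 2/3 innings, and that Myers has allowed no triples (which the "five singles" figure implies). The function name is my own, not anything from THT.

```python
def innings_to_float(ip: str) -> float:
    """Convert box-score innings notation ("19.2" = 19 and 2/3) to a number."""
    whole, _, outs = ip.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3

innings = innings_to_float("19.2")
hits, walks, home_runs, doubles = 18, 6, 7, 6

whip = (hits + walks) / innings        # walks plus hits per inning pitched
hr_per_9 = home_runs * 9 / innings     # home runs per nine innings
singles = hits - home_runs - doubles   # assumes no triples

print(f"WHIP: {whip:.2f}")       # WHIP: 1.22
print(f"HR/9: {hr_per_9:.1f}")   # HR/9: 3.2
print(f"Singles: {singles}")     # Singles: 5
```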

What Could A-Rod's Future Hold?

I was searching for something to write about Alex Rodriguez to open this article. I promise, I tried really hard to find something good. But when you think about it, what about A-Rod’s life do we not already know? There’s nothing new and interesting to write about with him, unless you really enjoy reading about Madonna, choking, or steroids from 6 years ago. Newspapers will write literally anything about him, regardless of its relevance. According to Roto Authority, he bleached his hair today for the second time this month*. But in my tireless search for new and exciting information about A-Rod, I found this tidbit, which I think is something that I guarantee nobody here knew: His other nickname is…wait for it… wait for it…. The Cooler (?!). Apparently this is because he makes his teams win less often, which contradicts everything I know about baseball. It’s also the worst nickname I’ve ever heard.

*Note: not actually true, but it wouldn’t surprise me if this was actually reported.

What does this have to do with the future of Alex Rodriguez? Well, nothing so far–but I’m getting there. So what I’m doing here today is looking through A-Rod’s top comparable players through age 32, as defined by Baseball Reference. I don’t really care how many home runs he’s going to hit, or what his batting average will be. We should only really care about overall production, and the best way to measure that is with Sean Smith’s historical Wins Above Replacement database.

If you look at A-Rod’s comps, you’ll notice something interesting. Six of the ten players listed started their careers between 1951 and 1956. Retrosheet, the foundation of Sean’s database, starts its records in 1953, right in the middle of that range. Another A-Rod conspiracy? You be the judge.

Those six players (plus one more from more recent times) are what I’ll be basing this on, so let’s meet them. The list is quite impressive:

Hank Aaron, Ken Griffey Jr., Mickey Mantle, Frank Robinson, Eddie Mathews, Willie Mays, and Al Kaline.

All of those, if you didn’t already know, are either current or future Hall of Famers. So it would seem that A-Rod is in good company, and we have nothing to worry about, right? Maybe. The first thing to note is that Alex is signed through his age 41 season, essentially ensuring that he won’t be retiring before then. PECOTA or CHONE will do a much better job than I can of telling you how A-Rod will do over the next five years (CHONE covers age 38, but I didn’t realize that at the time of writing this). But when those five years are over, he’ll still have another four years left on his deal. Those age 38 to 41 seasons are what I’m going to try to look at.

Sabermetrics… without numbers

It seems like a contradiction in terms.  How can someone do Sabermetrics without numbers?  After all, are we not the great digitizers of the game of baseball?  Can one do data analysis without… data?  The answer is that yes, Sabermetrics is possible without numbers.  It just takes a little different understanding of what data are and what they can be.

Most of Sabermetrics, thus far, has focused on a basic model of counting up the frequency of some event happening (perhaps norming it to some league expectation) and either counting it for its own sake or checking to see if it correlates with some other event which we have diligently counted up.  I have nothing against this approach (in fact, I use it a lot!), so long as data are systematically collected in an un-biased (i.e. scientific) manner.  But counting things up and turning everything into numbers isn’t the only way to go about research.

Consider how you make a decision on whether to go to a new restaurant.  Let’s assume that you have a group of fairly open-minded eaters.  My guess is that you don’t have a metric that takes the average item price on the menu, the distance to the restaurant from your mom’s basement, the cube root of the waiter’s salary, etc.  Instead, you harken back to hearing your friend Larry say that he went to the restaurant and he really liked it.  In other words, you got a scouting report.  Larry’s scouting report might not have been numerical, but since you trust Larry’s judgment on such things, you suggest the new place.  It may end up as the best meal you’ve ever had.  It may be awful.  You’re about to find out… based on a sample size of one and a poorly defined idea of what “good” is.  He might give you an “8 out of 10” (it’s a number!) type of rating, but how did he come up with 8?  Is that a good measure of how much you’ll actually like the restaurant?

(Would you like a slightly more cynical example?  Remember that paper that you wrote back in college that got a C-?  It ruined your semester and your GPA and you’ve hated that professor ever since.  I’m sure I’m that professor for someone out there.  How did s/he come up with a C-?  We use qualitative judgments all the time… sometimes, like in the school example, with consequences that can make or break someone’s entire life-course!)

In reality, we operate on non-numeric data a lot in life, especially when dealing with the completely unknown.  It’s hard to collect systematic quantitative data about everything!  Most data of this sort comes in the form of descriptions and words… not regression equations.  The thing that most people don’t know is that this sort of data, called qualitative data, can be analyzed.  (A phonetic problem quickly arises… the word “qualitative” looks and sounds a lot like “quantitative.”  Therefore, for the rest of the article, I will use “qual” and “quant” as short-hand.)  Qual data, when systematically collected and analyzed (and yes, there are ways to do that) can lead to some very interesting results.

A common type of qual data analysis is called content analysis.  Suppose you had an idea for a research question that you wanted to ask or a topic that you wanted to study.  Suppose that it was a question that no one had ever really researched before.  (You’re so creative!)  You have an idea, but you have no idea where to really start.  Solution: you do some reading about whatever’s been said on the subject.  You go to the oracle of all knowledge (Google) and type in a few keywords.  You read everything that comes up on the subject.  Slowly, you begin to understand how other people have conceptualized the issue in the past.  This provides you a groundwork for studying the issue further, perhaps with quant data.  At least now you know what to count.  A while back, I was looking for how to quantify “hard-nosed” players and used a similar method.  I looked for what characteristics were most often mentioned for players who were considered “hard-nosed” and found that mostly the term is used to describe players who like to run into things.

The other common type of qual analysis is thematic coding.  Let’s say that you have qual observations on several pitchers.  Was the person who filled out the report generally positive or negative about the pitcher’s fastball?  Did he mention anything about his mechanics?  Did this guy have the potential for a “filthy” slider?  Now, let’s wait a few years and see whether those observations predict the data that we can gather later on.  And let’s not stop at one pitcher: take all the scouting reports filed by that guy, and look at all the pitchers whom he rated, both the studs and the duds.  Did his predictions actually pan out?
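
To make the idea concrete, here is a toy sketch of that workflow. Everything in it is hypothetical: the codebook, the cue words, the pitchers, and the outcomes are all made up for illustration. Real thematic coding is done by trained human coders working from an agreed codebook; simple keyword matching is only a crude stand-in for that step.

```python
# Hypothetical codebook: theme -> (positive cue words, negative cue words).
CODEBOOK = {
    "fastball": ({"explosive", "lively", "plus"}, {"flat", "straight"}),
    "slider": ({"filthy", "sharp"}, {"hanging", "loopy"}),
    "mechanics": ({"clean", "repeatable"}, {"violent", "inconsistent"}),
}

def code_report(text):
    """Tag each theme a report mentions as +1, -1, or 0 (mixed or neutral)."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    codes = {}
    for theme, (positive, negative) in CODEBOOK.items():
        if theme not in words:
            continue  # the scout did not mention this theme
        score = len(words & positive) - len(words & negative)
        codes[theme] = (score > 0) - (score < 0)
    return codes

def hit_rate(coded_reports, outcomes, theme):
    """Share of non-neutral codes on `theme` that matched the later outcome."""
    pairs = [(codes[theme], outcomes[name])
             for name, codes in coded_reports.items()
             if codes.get(theme, 0) != 0]
    if not pairs:
        return None
    return sum(code == outcome for code, outcome in pairs) / len(pairs)

# Made-up reports from one scout, and made-up later outcomes (+1 stud, -1 dud).
reports = {
    "Pitcher A": "Lively fastball, clean mechanics.",
    "Pitcher B": "Flat fastball and a hanging slider.",
    "Pitcher C": "Plus fastball, filthy slider.",
}
outcomes = {"Pitcher A": 1, "Pitcher B": -1, "Pitcher C": -1}

coded = {name: code_report(text) for name, text in reports.items()}
print(hit_rate(coded, outcomes, "fastball"))
```

In this made-up sample, the scout's fastball read matched the later outcome for two of the three pitchers. The same tally could be run per scout and per theme to see where the reports carry signal.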

The details of how this is done would take a much longer piece (actually, a course in qual data analysis), but in theory, it is just an engineering problem to actually conduct this type of study.  It would provide a systematic look at where scouting reports are valid and where they are not.  If Sabermetrics claims to be a science, then it must allow for the possibility that these scouting reports are powerfully predictive.  (And it must allow for the possibility that scouting as a profession is functionally useless.)  The fact that scouting reports are non-numerical in nature does not mean that they can’t be analyzed.  It just means that they need to be analyzed in a different way.

Testing the Projection Systems' Strengths and Weaknesses

There are several prominent projection systems for predicting how players will do in the coming baseball season.  Depending on the season and the test used, each has some claim of superiority.  One system that does not claim superiority is Marcel the Monkey, developed by Tom Tango, which intentionally uses a simple method combining a weighted average of historical performance, regression to the mean, and age factors.  Tango explains that it should be the standard that any projection system worth looking at should be able to beat.
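
As a rough illustration of those three ingredients, here is a minimal sketch of a Marcel-style rate projection. This is not Tango's actual specification: the season weights, the amount of league-average playing time blended in, the peak age, and the per-year age steps are all illustrative assumptions, and the function name is my own.

```python
def marcel_style_rate(seasons, league_rate, age,
                      weights=(5, 4, 3), regress_pa=1200,
                      peak_age=29, young_step=0.006, old_step=0.003):
    """Project a per-PA rate from up to three seasons, most recent first.

    seasons: list of (events, plate_appearances) tuples.
    Every default constant here is an illustrative assumption.
    """
    # 1. Weighted average of historical performance (recent years count more).
    weighted_events = sum(w * ev for w, (ev, pa) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (ev, pa) in zip(weights, seasons))

    # 2. Regression to the mean: blend in league-average plate appearances.
    regressed = ((weighted_events + league_rate * regress_pa)
                 / (weighted_pa + regress_pa))

    # 3. Age factor: small boost before the assumed peak, small penalty after.
    step = young_step if age < peak_age else old_step
    return regressed * (1 + (peak_age - age) * step)

# Hypothetical hitter: home runs and plate appearances over three seasons.
history = [(30, 600), (25, 550), (20, 500)]
print(round(marcel_style_rate(history, league_rate=0.03, age=29), 4))  # 0.0438
```

One property worth noticing: a player with no history at all projects to exactly the league rate (before the age factor), and the more plate appearances a player has, the less the league average matters.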

Each of these systems is done differently, and each is bound to have its strengths and weaknesses.  While many can claim that one is better than the other, it is very difficult to tell.  My suspicion is that since different systems have different methodologies, they will each excel at projecting different groups of hitters and different statistics.  In this article, I will test how each system does with many different subgroups, and we will see that there are certain areas where each system excels over the others.  PECOTA, for instance, struggles at projecting BABIP for speedy players.  ZIPS struggles at projecting BABIP overall.  OLIVER tends to underestimate walks and strikeouts, but overestimates home runs.  CHONE does a bit better overall, but PECOTA appears to do a bit better for the majority of younger players.  I will explain each of these results in detail later on.

I gathered projections on 526 different players who got at least 300 plate appearances in either 2007 or 2008 and who were projected by all five of the projection systems I tested.  Players who were not projected by one system or another were excluded from the sample.  Obviously, this eliminated a lot of useful information, since the main way that projection systems differ is in how they project young players, and many young players were not projected by MARCEL (many other players were missing from some of the other systems too).  However, there is still useful information contained in these tests.
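
The kind of subgroup test described above can be sketched as follows. This is only an outline under stated assumptions, not the code behind the article: the system names, subgroup labels, and BABIP numbers are invented, and bias plus root-mean-square error are just one reasonable pair of yardsticks.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical rows: (system, subgroup, projected BABIP, actual BABIP).
rows = [
    ("SystemA", "speedy", 0.310, 0.330),
    ("SystemA", "speedy", 0.300, 0.320),
    ("SystemA", "other", 0.295, 0.300),
    ("SystemB", "speedy", 0.325, 0.330),
    ("SystemB", "speedy", 0.322, 0.320),
    ("SystemB", "other", 0.310, 0.300),
]

def score_by_subgroup(rows):
    """Return {(system, subgroup): (bias, rmse)} from projection errors."""
    errors = defaultdict(list)
    for system, subgroup, projected, actual in rows:
        errors[(system, subgroup)].append(projected - actual)
    scores = {}
    for key, errs in errors.items():
        bias = sum(errs) / len(errs)  # negative means the system underestimates
        rmse = sqrt(sum(e * e for e in errs) / len(errs))
        scores[key] = (bias, rmse)
    return scores

scores = score_by_subgroup(rows)
for (system, subgroup), (bias, rmse) in sorted(scores.items()):
    print(f"{system:8s} {subgroup:7s} bias={bias:+.3f} rmse={rmse:.3f}")
```

Bias shows whether a system leans high or low for a group, while RMSE shows how far off it is regardless of direction. A system can have almost no bias and still miss badly on individual players.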
