LoHud Guest Post

As some of you may know, I’m a big Yankee fan, despite Pizza’s objections. As a Yankee fan, it is only natural that I’m a regular reader of the LoHud Yankees Blog, which happens to be the most widely read team-specific blog on the planet. For the past two years, the blog has featured guest posts from other bloggers during January to fill the time when the baseball world is debating pointless things such as that annoying Joe Torre book. Or, in the case of baseball nerds, whether Jeff Francoeur can possibly be worth $12 million.

Today, it was my turn to take a shot at wowing the masses with spectacular prose and sheer intelligence. The jury is still out on whether I accomplished either one. My post is about evaluating trades in a fair and objective manner. It’s nothing that regular StatSpeak readers don’t already know, but I figured I might as well throw a link up. Here it is, check it out.

Innovation Profile: Oakland Athletics

The easiest yet perhaps the most interesting profile to make, I’m starting with them for simplicity’s sake.

“How’d [Oakland] do it?  What was their secret?  How did the second poorest team in baseball, opposing ever greater mountains of cash, stand even the faintest chance of success, much less the ability to win more regular season games than all but one of the other twenty-nine teams?” (Moneyball: The Art of Winning an Unfair Game, Lewis, pg XII)

With these questions, Michael Lewis set out to discover what allowed one of the poorest franchises in the sport to become one of the most successful.  Certainly it was an intriguing question; the team was not stocked with the typical great players yet still managed to win 102 games without flashy numbers.  Lewis, in his preface, notes that such a team “in any ordinary industry would have long since acquired most other baseball teams, and built an empire” (Moneyball, Lewis, pg XIII).  Empire, monopoly, or dynasty: keywords that I started to discuss last time.  Most of you probably know the answers to the questions Lewis posed, as we write for a well-educated baseball audience.

Here’s a thought: Michael Lewis has a master’s degree in economics.

Competitive Advantage

We harken back to the principles of Creative Destruction.  Billy Beane, GM of the Oakland Athletics, created a monopoly out of a small payroll in a tough division.  It’s easy to suggest that to create this monopoly, Beane had to have come up with an innovation.  In fact, he came up with several.

Detractors of the book Moneyball claim that it glorifies on-base percentage over all else, and it’s certainly easy to see why.  Beane’s use of OBP brought it into the mainstream school of thought.  But the book is not about OBP; it’s about developing new measures of player worth in order to find underrated players.  So Beane innovated by using more advanced player evaluations.  He knew that there was no advantage in signing players that the market already knew were good, so he avoided doing so.

He also used player evaluation to discover that most players play to their greatest potential when they are aged 27-30.  Again this is hardly surprising, but he concluded that there was less worth in paying free agents with declining value.  He was able to determine the time when his assets were at their maximum worth and used this to his advantage when trading.  Buying low and selling high is the basic rule of thumb when trading anything, and Beane follows it to a T.

Beane’s A’s were the first to change their drafting dogma to the idea that past statistical performance can predict future statistical performance.  This gave them a better handle on what to expect out of their players.  Also guiding their drafts was the idea that college players had a higher rate of becoming major league players and were not, in fact, less likely to become superstars.  The fact that college players often signed for less certainly did not hurt.

Ignorance Profile

Core to a systematic approach to innovation is the concept of building an ignorance profile.  When you are attempting to do something that has never been done before, it is vitally important to discover why no one has done it.  It might be due to some market wisdom that is incorrect, or it might be due to a fact that the potential innovator has overlooked.  A lot of people get caught out by not preparing something like this.  I am not suggesting that Billy Beane created an ignorance profile, but this is what it might have looked like.

Why did teams overlook the value of sabermetric evaluation?  Certainly the technology for the kind of sabermetrics we engage in now was not readily available until the late eighties, but on-base percentages are not too difficult to calculate by hand.  Although few general managers are former players, the wisdom of the players pervades the baseball school of thought.  Walks were not sexy while hits were.  For a long time people considered a walk to be a failed plate appearance because it did not produce a hit.  But a huge part of this was due to the prevailing wisdom that walks were entirely under the control of the pitcher and that the hitter was not a factor.  This is clearly and demonstrably false, but it dictated management of the game for a long period of time.

Another large problem with the market is that scouts – who used a kind of subjective analysis that Beane recognized as problematic – would only scout players whom other scouts scouted (did I say scout enough in that sentence?).  But if this is the case, then where is the advantage in scouting at all?  If scouts only discovered players that all other teams discovered, then scouting gave teams no competitive advantage.  So the system of scouting was completely flawed from an economic point of view.

So certainly Beane could be sure that it was the market which was ignorant and not he, so he had a valid innovation in his hands.

Market Adjustment

As with any innovation, these baseball innovations could only give the A’s an advantage for a short length of time.  On-base percentage has now reached the mainstream and even better tools of evaluation have been created.  There is little competitive advantage to evaluating players based on on-base percentage because most teams do this now.  Consider the Boston Red Sox under Theo Epstein, who have a large budget while applying Beane’s small budget principles to great effect.

As much as the book Moneyball helped bring Beane’s brilliance into the limelight, it also revealed to the market the source of Beane’s success.  An innovation is most valuable while it stays secret, so that the innovator can keep using it to his advantage.  What Lewis did was expose Beane’s innovation in a very easy-to-read sort of way.  That’s not to suggest the market would not have adjusted anyway, but it could not have helped.

A very curious factor is the very idea of blogging.  Most bloggers are hobbyists who are not paid and are under no obligation to help certain teams, yet some brilliant analyses are created in this medium; just look back into the archives of this very website.  Bloggers are trending towards better and better player evaluations than I suspect teams are capable of.  But it’s all available for free (or for a pittance) over the Internet and so can only be a competitive advantage for a team insofar as other teams are ignorant of the particular analysis.  So, by writing a sabermetric blog, we’re morphing the market towards one where sabermetrics give no competitive advantage.

Into the Future

Sure, Beane’s initial innovations give less of an advantage to him now, but I think that Beane has demonstrated the traits of an innovator.  He has certainly shown himself to be a superior trader and salesman who knows exactly when his assets are at their maximum value (see: Zito, Barry).  In time he will create a new innovation to destroy the monopolies created by older ones and soon be at the top of the game again.  The competitive pressures demand it of him.

Some rockin' links

From the “stuff we’re reading” file: Derek Carty takes a look at predicting BABIP for batters.  He finds that a regression-based formula put together by Peter Bendix and Chris Dutton rates the best of them all.  This is a good next step.  There are multiple techniques out there for this (and for estimating a bunch of other things), so now we need to start looking at which is the best.  If there’s one place for a follow up, it would be making sure that the errors aren’t systematically distributed (does the formula over/under-shoot in a consistent manner for certain types of players), but one thing at a time I suppose.

While you’re at THT: Chris Jaffe has a list of 50 “closer” entry songs waiting to happen.  I suppose I know what mine would be.

Baseball game Hall of Fame?: Over at Beyond the Boxscore, there’s a delightful discussion of what games belong in the Hall of Fame.  I vote for this one… because I was there…

So how long does it take for BABIP to become reliable?

Seems a simple question.  We know that BABIP (batting average on balls in play) for pitchers has a low correlation from year to year.  As a result, a Sabermetric standard has been that one year in a pitcher’s life tells you little about his actual ability to prevent hits on balls in play, which is true.  In statistical terms: one year is not a sufficient sample to get a good estimate of the parameter, primarily because a pitcher only faces a few hundred balls in play each year.

Suppose though that a pitcher’s season lasted billions of plate appearances.  Eventually, we’d know exactly how good a pitcher was.  If we let him face another billion hitters, he’d come up with the same number again.  That sort of sampling frame produces reliable statistics, but it’s a fantasy.  We have to deal in reality.

But after looking at year-to-year stats, with the low correlation between BABIP at year 1 and BABIP at year 2 (which has held any which way you try to break it), it’s been assumed that pitchers have no control at all over their BABIP, ever.  That’s a big jump, one that I think people make without fully stopping to realize that they’ve made.  (I’ve probably made it myself.)  There’s a difference between a parameter being entirely random and it being unobservable given our limited data and the amount of noise present. 

The assumption goes that everyone is a .300 pitcher once the ball is in play and doesn’t leave the stadium.  After all, if there’s no stability, it must all be random noise.  Right?  It’s just that no one has ever been really comfortable with that thought.  Pitchers don’t differ in their BABIP ability at all?  Pedro Martinez in his heyday was the equivalent of Mike Bacsik in his heyday?  It just doesn’t make sense.  Then there is the curious case of Troy Percival (my personal favorite piece of anecdotal evidence.)  His BABIPs have been consistently below the magic .300 line throughout his entire career, and it’s been a long one.  Could it happen by chance?  Sure, but perhaps something else is afoot.

Maybe the problem is that we need to widen the sampling frame.  Maybe one year doesn’t tell us much about a pitcher’s true talent on BABIP, but what if several years do? 

I took 30 years’ worth of Retrosheet data (1979-2008) and dumped it into a giant file.  I selected all balls in play (not a strikeout, not a walk, not a home run, not a HBP, not one of those weird catcher interference thingies).  As I have been wont to do lately, I started running some split-half reliability analyses.  I split each pitcher’s batters faced into even and odd numbered appearances (so, I’m drawing the first PA into the odd group, then the second into the even group… it balances out the two halves of a player’s performance so that I’m drawing some from year one, some from year two, etc.).
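For readers who want to see the selection and the odd/even split spelled out, here is a minimal sketch in Python.  The outcome labels and the dictionary layout are made up for illustration (real Retrosheet event files use their own coding), and alternating over balls in play rather than over all batters faced is a simplifying assumption.

```python
# Hypothetical outcome labels; real Retrosheet event files use their own codes.
NOT_IN_PLAY = {"strikeout", "walk", "home_run", "hit_by_pitch", "catcher_interference"}


def balls_in_play(events):
    """Keep only the plate appearances that ended with a ball in play."""
    return [e for e in events if e["outcome"] not in NOT_IN_PLAY]


def odd_even_split(bip):
    """Alternate balls in play between two halves: 1st to odd, 2nd to even, and so on."""
    return bip[0::2], bip[1::2]


def babip(bip):
    """Hits divided by balls in play."""
    return sum(1 for e in bip if e["hit"]) / len(bip)


# Tiny made-up example for one pitcher, in chronological order.
events = [
    {"outcome": "single", "hit": True},
    {"outcome": "strikeout", "hit": False},
    {"outcome": "groundout", "hit": False},
    {"outcome": "home_run", "hit": True},
    {"outcome": "flyout", "hit": False},
    {"outcome": "double", "hit": True},
]

bip = balls_in_play(events)            # drops the strikeout and the home run
odd_half, even_half = odd_even_split(bip)
print(babip(odd_half), babip(even_half))
```

Because the two halves are interleaved in time, each one draws from the whole span of a pitcher's career rather than from its early or late portion.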

For each pitcher who had at least 500 balls in play to give, I started by taking a sample of 500 and splitting it into two 250 BIP halves.  I ran a correlation between those two halves for all 1461 pitchers in the sample who fit the criteria.  The correlation was .174.  So, at 250 BIP, BABIP has a split-half reliability of .174.  It’s numbers like that which led to the creation of DIPS theory to begin with.
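The calculation itself is short.  Below is a rough sketch of a split-half reliability computation run on simulated pitchers, not on the Retrosheet data; the .300 average, the spread of "true talent" (`talent_sd`), and the number of pitchers are all assumptions chosen only to make the example run, so the numbers it prints will not match the ones in this post.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def split_half_reliability(pitchers, half_size):
    """pitchers: list of 0/1 hit sequences, each at least 2 * half_size long."""
    odd_rates, even_rates = [], []
    for hits in pitchers:
        sample = hits[: 2 * half_size]
        odd_rates.append(sum(sample[0::2]) / half_size)   # 1st, 3rd, 5th ... BIP
        even_rates.append(sum(sample[1::2]) / half_size)  # 2nd, 4th, 6th ... BIP
    return pearson(odd_rates, even_rates)


def simulate_pitcher(rng, n_bip, talent_sd=0.010):
    """Assumed model: each pitcher has a fixed true BABIP drawn around .300."""
    true_babip = rng.gauss(0.300, talent_sd)
    return [1 if rng.random() < true_babip else 0 for _ in range(n_bip)]


rng = random.Random(0)
pitchers = [simulate_pitcher(rng, 2000) for _ in range(300)]

r_250 = split_half_reliability(pitchers, 250)    # two halves of 250 BIP
r_1000 = split_half_reliability(pitchers, 1000)  # two halves of 1000 BIP
print(round(r_250, 3), round(r_1000, 3))
```

Under a model like this one, a small but real talent spread produces a weak correlation at 250 BIP per half that tends to strengthen as the halves get larger, which is the pattern the analysis here is probing.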

But let’s expand.  Let’s take two samples of 500 BIP.  That bumps things up to .253.  Hmmmm, getting a bit more reliable.  The question becomes when does it hit that “good enough” point.  I’ve argued previously for the use of .70 as the cutoff for reliability. It’s an arbitrary point (I guess in an ideal world, we’d want a reliability of 1.0), but .707 has an R-squared of .50, which means anything north of that accounts for more than 50% of the variance.  Can we get to .70?
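To make the cutoff concrete, this small sketch squares each reliability figure mentioned so far to show the share of variance it accounts for.  The function name is just a label for illustration.

```python
def variance_explained(r):
    """Share of variance accounted for by a correlation of r (R-squared)."""
    return r ** 2


for r in (0.174, 0.253, 0.70, 0.707):
    print(f"r = {r:.3f} -> R^2 = {variance_explained(r):.3f}")
```

A reliability of .174 accounts for about 3% of the variance and .253 for about 6%, which is why neither comes close to the "good enough" line.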

Turns out that the answer is… yes.

At a sample of 3750 balls in play (a 7500 BIP sample, chopped in half… there were 48 pitchers in the last 30 years who had that many BIP to look at… not outstanding, but enough to not discount), the split-half reliability was .696.  At 4000, it reached .742 (across 34 pitchers).  So, it only takes about 3800 BIP before we get a reliable read on a pitcher’s BABIP abilities.  That’s a lot, but it’s not an obscene amount.  In 2008, the average pitcher saw roughly 3 balls in play per inning pitched.  At that rate, a starter who throws 180 innings would see about 540 BIP in a year (rough estimates here).  So, it would take about seven years, at that same 180 IP per year rate, to get to the required number of BIP.  Not easy, but not out of the realm of possibilities.

Now, about those guys who had two matching 4000 BIP samples, there was still some variability in the sample.  Andy Pettitte had BABIPs in his twin samples of .318 and .312.  Charlie Hough had the other extreme at .248 and .266.  So, it looks like there is such a thing as the “ability” to exert some control over what happens to a ball in play.  It just takes a while (but not forever) to reveal itself.

This isn’t a very functionally useful finding for evaluating players or predicting what they will do.  A pitcher is not the same man at the end of seven years that he was at the beginning (either as a pitcher or a human being).  The ability to prevent hits on BIPs may deteriorate over the years and at that point, we’re using data that are 6 and 7 years old to predict what will happen tomorrow.  In a single season, which is really the sampling frame that most fans are concerned about, there will still be a lot of noise around the signal, but the signal is definitely there.  Now if we can just get a better radio to pick it up.
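The back-of-the-envelope arithmetic, written out as a short sketch using the rough figures from the paragraph above (the constant and function names are only illustrative):

```python
BIP_PER_INNING = 3        # rough 2008 average cited above
INNINGS_PER_SEASON = 180  # a full-time starter
BIP_NEEDED = 3800         # approximate point where reliability crosses .70


def seasons_needed(bip_needed, innings_per_season, bip_per_inning):
    """Seasons required to accumulate bip_needed balls in play."""
    bip_per_season = innings_per_season * bip_per_inning
    return bip_needed / bip_per_season


seasons = seasons_needed(BIP_NEEDED, INNINGS_PER_SEASON, BIP_PER_INNING)
print(f"{INNINGS_PER_SEASON * BIP_PER_INNING} BIP per season, {seasons:.1f} seasons")
```

Plugging in a different workload (a reliever's innings, say) shows how quickly the required number of seasons stretches beyond a typical career.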

Those Left Behind

I want to take a few minutes here to look at some of the remaining members of the 2009 free agent class. We’ve been reading for weeks about the depressed economy having an effect on some of the potential buyers this winter, and how some players aren’t happy about it. Bobby Abreu is still unsigned as of this writing, as are Adam Dunn and Manny Ramirez. Those three combined for 97 home runs last season, though they admittedly are not the three best fielders in the world. Abreu, Dunn and Manny are three of the most consistent players in baseball, so it’s easy to figure out what they will each add to a team. Obviously, teams have figured out that value and so far don’t like the asking price.

So what about some of the more volatile members of the current free agent class? I’m not talking about the Milton Bradley throwing a fit kind of volatility here. I mean the guys who are more difficult to value because of their potential to be productive, albeit less so than in the past. This is by no means an all-encompassing list; it’s more of a survey of the field. For right now, I’m going to stick with hitters. Let’s get to it…

Ivan Rodriguez

Depending on who’s doing the talking, an Ivan Rodriguez signing could be described as either a bargain or a rip-off. Despite having an OBP below .300 in two of the past four years, replacement level is so low for catchers that Pudge has remained a very productive member of society. Let’s say he signs a one-year, $5 million deal. If he reverts to his 2007 form, or even his pre-Yankees 2008 form, he’ll be a steal. But if the real Ivan Rodriguez right now is the one who got benched in favor of Francisco Cervelli in September, then his new team just threw that money down the drain. My money is on him being a good buy.

Nomar Garciaparra

Nomar is an interesting case, in that even when he’s healthy you’re not really sure if he’s healthy. He won comeback player of the year in 2006 despite missing 40 games due to injury. Just one season later, in essentially the same amount of playing time, he saw his home run totals drop by 65%. His slugging went from Burrell-esque levels to below the Melky-line. He’s still probably a decent hitter, but not for a first baseman. If some team is in need of a part-time utility infielder and is willing to put up with his limited range, then I can justify acquiring him. Other than that, I’d say he’s not worth keeping on the roster.

Jim Edmonds

I suspect that when Edmonds is up for the Hall of Fame, there will be a lot written about his career in particular, and whether he deserves to be in. Just a wild guess–he gets in the third time around. Side note: Why is it that fielding is rarely considered for awards, but comes up so often in HOF debates? Anyway, Edmonds surprised everybody last season with his 20 home runs in limited playing time (and his .353 wOBA, but I suspect the number of people shocked by that was far smaller). He’s probably no longer a viable center fielder, but the positional adjustment might just cancel out the value added by moving him to a corner spot. He’d have to be just about average at the corners to come out ahead. Offensively, Edmonds should continue to provide good value since he still has a great walk rate and that power bat (yay Moneyball!). If he’s used in the same way he was last season, he’ll be a good addition to some contending team.

Frank Thomas

The Big Hurt caps off this list of seemingly forgotten free agents. He held up pretty well in ’06 and ’07 but went down with a series of quad injuries that limited his 2008 season to just 71 games. I don’t see any team signing him to be the DH, so it looks like Fragile Frank (just made that up, I think) may have to hang it up. He’s not much different from guys like Mike Piazza and Sammy Sosa, who did not play last season after having varying degrees of success in 2007.

So that wraps up this post on some of the potential bargains of the remaining free agent class. Next week, I might look at some pitchers, although that’s less fun so I reserve the right to scrap the idea entirely. 

A philosophical question

Let’s say that there was a team (we’ll call them the Mapleland Bees) that was faced with a choice between two free agents.  (And no, this isn’t one of those things where I reveal that these are two real players at the end and say “surprise!”)

Free Agent A is good at the types of things that people will pay to watch. 

  • He hits homeruns
  • He’s a “basestealing threat,” which means he’s good for 15-20 SB per year
  • He has a reputation for “coming through in the clutch,” dating superstars and models, and being a pretty “face of the franchise.”  The reporters love him because he’s great copy.
  • He hits for a high (.300+) batting average and, because of the guys in front of him, had 120 RBI last year.

The problem is that he’s not so good at some of the “hidden” things in baseball that few people know to even look for and fewer would probably pay for.  He’s awful as a defender, and makes the routine plays look spectacular.  He doesn’t draw walks (and so has a low OBP), strikes out a lot, and while he steals bases, it’s primarily because he just runs a lot and gets lucky sometimes.  In other words, he’s something of a slightly altered version of Derek Jeter.  (I know, I know, there’s nothing wrong with Jeter’s OBP).

Free Agent B is not a homerun hitter, nor much with the batting average, and due to hitting behind an OBP nightmare, he had a mere 65 RBI last year.  He does put up his share of doubles, but isn’t a fun player to build a marketing campaign around (read: ugly) and is happily married to his college sweetheart, has two kids, and doesn’t say much after games.  However, he’s a fantastic defender who makes the tough plays look routine.  He also draws a lot of walks, so while his average looks low, he’s actually getting on base quite a bit.  Plus he doesn’t really strike out so much.

So, he’s what would happen if Mark Ellis and Brian McCann had a baby together.

Here’s the thing.  The Bees are an enlightened team and they see the two players for just what they are.  And after crunching some numbers, they realize that Ellis McCann would actually stand the better chance of making the team better, once everything is considered (hitting, fielding, running, the fact that being pretty has no bearing on on-the-field results).

But wait, not all is that simple.  Jerek Deter, despite being the lesser player of the two, will sell a lot of t-shirts and tickets, and is worth a few million more in the TV contract and a lot more in getting his name mentioned on E!.  Ellis McCann is boring and will only be appreciated by a handful of nerds.  After crunching some more numbers, the owner realizes that he stands to make more money by signing Jerek Deter than Ellis McCann, even after you factor in what is likely to be a big difference in salary… in Deter’s favor.  Deter still comes out ahead on profit even after you figure in the thought that a team that wins more (and Ellis McCann is worth more wins) makes more because everyone loves a winner.

So, whom should the owner of the Mapleland Bees instruct his GM to sign?  He’s a businessman who’s running an entertainment venture, and if there’s nothing else that we learned from the unfortunate ten years that were the peak of Britney Spears’s career, it’s that you have to give the people what they want, no matter how stupid that is.

But… winning…

