The trouble with VORP
August 23, 2008 3 Comments
I’m sure that by now you’ve heard of Value Over Replacement Player, the statistic that’s ruining enjoyment of baseball for people everywhere! You know how it goes – you’re sitting in the old ballpark, drinking an ice-cold Budweiser and eating an all-beef hot dog with all the fixins, and suddenly your favorite player hits a home run – and you prepare to cheer, and something stops you, because you know somewhere that home run has been tabulated by a sabermetrician. Sad, sad times.
But how well do you really know VORP? And remember – as StatSpeak readers, all of you have probably contributed in some small way to VORP’s ruination of baseball, Mom and apple pie. Maybe you’ve referenced a player’s VORP in a team forum or told a friend about it – there you are, aiding and abetting! And VORP is getting to be very, very, very mainstream. In short, when the world judges sabermetrics, it judges it based on things like VORP. So it’s probably best to acquaint yourself with it.
In case you’ve been living under a rock, VORP is the crown jewel of the stat reports at Baseball Prospectus. Keith Woolner, a very smart guy who now works for the Cleveland Indians, came up with VORP in an effort to describe a player’s total value on offense.
Everybody talks about how complicated (and proprietary) VORP is, but really it’s scaldingly simple – you figure out how many runs a player is responsible for, and you subtract the number of runs a typical replacement player would have produced in the same playing time. [For the moment, we’re focusing on hitters, but there is a version of VORP for pitchers as well.]
Of course, now we just have to define how many runs a player is responsible for and what a replacement player is.
Conventionally, sabermetrics has defined a replacement player as the typical player that’s cheaply or freely available if you’re desperately in need of a player at that position – in short, we’re talking journeymen and career minor leaguers. (There is of course a lot of dissent over where to put the replacement level and whether or not a replacement baseline is correct at all. Patriot has a great writeup of these issues, which are well beyond the scope of this article.)
For the purposes of VORP, replacement level is defined as 80% of the production of an average player at that position. (85% for catchers, 75% for first basemen and designated hitters.) We’ll roll with that for now.
Now, how to figure out a player’s contribution to run scoring? Really we could use just about anything we wanted to here. And, partly for purposes of illumination and partly for a bit of circus freak appeal, I’ll use Runs Produced, otherwise known as:
Runs + Runs Batted In – Home Runs
(If you’re curious as to the justification for subtracting home runs, the short version is that you’re double-counting runs that way; a player with a solo home run in his first at-bat has 2 R+RBI, which is rather wrong, if you’re asking me. A more detailed discussion is available here.)
Baseball Reference kindly provides us with average offensive production by position for 2007. Now, let’s take a look at Alex Rodriguez. In 2007 he had 143 runs scored, 156 runs batted in and 54 home runs for a total of 245 runs produced in 708 PAs. Your average third baseman had 151 runs produced in 650 PAs, and our replacement baseline is 121 runs produced in 650 PAs – prorate that out to 132 runs produced in 708 PAs. That gives A-Rod 113 VORP. A-Rod’s actual VORP? 96.6.
Wanna see it again? Ryan Braun had 91 runs scored, 97 RBI and 34 home runs in 492 PA. That works out to a 62 VORP. His actual VORP? 57.2.
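The arithmetic above can be sketched in a few lines of Python. This covers only the Runs Produced stand-in, not anything Baseball Prospectus actually runs; the function names are made up for illustration, and the baseline of 121 runs produced per 650 PA is the rounded third-base replacement figure quoted above.

```python
def runs_produced(runs, rbi, home_runs):
    """Runs Produced: R + RBI - HR."""
    return runs + rbi - home_runs


def crude_vorp(runs, rbi, home_runs, pa, rep_rp_per_650):
    """Runs Produced above a replacement baseline prorated to the player's PA.

    rep_rp_per_650 is the replacement player's Runs Produced per 650 PA
    (121 for third basemen in 2007, per the figures in this article).
    """
    replacement = rep_rp_per_650 * pa / 650
    return runs_produced(runs, rbi, home_runs) - replacement


# 2007 lines quoted in the article, both measured against the 3B baseline.
a_rod = crude_vorp(runs=143, rbi=156, home_runs=54, pa=708, rep_rp_per_650=121)
braun = crude_vorp(runs=91, rbi=97, home_runs=34, pa=492, rep_rp_per_650=121)
print(round(a_rod), round(braun))  # 113 62
```

Swapping in a different position is just a matter of passing that position's replacement figure as `rep_rp_per_650`.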
If you care to continue this exercise, here’s the data you need:
Pos.     PA    R   HR  RBI   RP  RepRP  RepRP/PA
C       650   80   15   76  141    120     0.184
1B      650   90   22   86  154    116     0.178
2B      650   71   13   68  126    101     0.155
3B      650   87   19   83  151    121     0.186
SS      650   69   12   66  123     98     0.151
LF      650   88   20   84  152    122     0.187
CF      650   71   15   68  124     99     0.153
RF      650   88   19   85  154    123     0.190
DH      650   92   22   88  158    119     0.182
Total   650   82   17   78  143    114     0.175
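For anyone who wants to check where the RP and RepRP columns come from, here is a rough Python sketch that rebuilds them from the R, HR and RBI columns and the replacement percentages given earlier (85% for catchers, 75% for first basemen and designated hitters, 80% for everyone else). The names are hypothetical, and rounding halves upward is my assumption about how the table was rounded; it happens to reproduce every RepRP value shown.

```python
# Average 2007 line per 650 PA by position, as (R, HR, RBI), from the table.
AVERAGE_LINES = {
    "C": (80, 15, 76), "1B": (90, 22, 86), "2B": (71, 13, 68),
    "3B": (87, 19, 83), "SS": (69, 12, 66), "LF": (88, 20, 84),
    "CF": (71, 15, 68), "RF": (88, 19, 85), "DH": (92, 22, 88),
    "Total": (82, 17, 78),
}

# Share of average production that defines replacement level.
REPLACEMENT_SHARE = {"C": 0.85, "1B": 0.75, "DH": 0.75}
DEFAULT_SHARE = 0.80


def round_half_up(value):
    """Round a non-negative number to the nearest integer, halves going up."""
    return int(value + 0.5)


def baseline(position):
    """Return (RP, RepRP) per 650 PA for a position."""
    runs, home_runs, rbi = AVERAGE_LINES[position]
    rp = runs + rbi - home_runs
    share = REPLACEMENT_SHARE.get(position, DEFAULT_SHARE)
    return rp, round_half_up(share * rp)


for position in AVERAGE_LINES:
    rp, rep_rp = baseline(position)
    print(f"{position:<6}{rp:>5}{rep_rp:>6}")
```

Python's built-in `round` rounds halves to the nearest even number, which would turn the DH figure of 118.5 into 118 rather than the 119 in the table, hence the small helper.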

So, we’ve just created a reasonable facsimile of VORP using nothing but the basic bubblegum card stats, stats that no reputable sabermetrician would get caught dead using for serious analytical work. So here’s the question – how much more accurate is VORP’s run estimator than the crudest run estimator known to man?
The key to VORP is Marginal Lineup Value, a run estimator developed by David Tate back when USENET was the hip, cool place to develop new baseball statistics. MLV was designed to counteract two flaws with Bill James’s Runs Created formula (the basic version of which is OBP * SLG * AB) and, as a third feature, to express value relative to league average:
1. Runs Created was designed initially to model team run scoring. When you apply it to individuals, you run into problems. Take Albert Pujols, who so far this year has hit .353/.461/.618 in 414 at-bats. How many runs did he create? Not as many as Basic RC would lead you to believe; in reality Pujols doesn’t have the chance to bat himself in, so a model that multiplies his ridiculous OBP by his ridiculous SLG gets to be pretty ridiculous.
2. At the same time, Pujols’s high OBP means more plate appearances for his teammates – the fewer outs you make, the more times you come up to the plate in a game. There’s a real value to that which MLV attempts to capture.
3. It presents a player’s contributions above and beyond a league average player.
Before we go any further, I want to make a disclaimer. I am in no way affiliated with Baseball Prospectus and I have no insider information. I don’t own any of their annuals, either – I’m unaware of them publishing a revised formula for MLV in any of their annuals but I can’t rule it out. The only sources I have for the MLV formula are the rec.sports.baseball newsgroups (which are, to be quite frank, a pain to sift through) and Woolner’s old Stathead website. (The only formula I’ve found on BP’s website itself to address MLV is a quick-and-dirty version using OPS.)
So some conjecture is involved here. If anyone affiliated with BP wants to set me straight, I’d be thrilled. If you hear anything from Baseball Prospectus that contradicts what I’ve written here, believe them and not me.
The original version of MLV is, at its core, the basic version of Runs Created, which I’ll restate:
OBP * SLG * AB
Or, put another way:
OBP * Total Bases
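To put a number on it, here is a quick Python sketch of Basic RC in both forms, run on the Pujols line quoted earlier (.461 OBP, .618 SLG, 414 at-bats). The function names are mine, and total bases are backed out of SLG * AB rather than taken from a box score, so treat the result as approximate.

```python
def basic_rc(obp, slg, at_bats):
    """Basic Runs Created: OBP * SLG * AB."""
    return obp * slg * at_bats


def basic_rc_from_total_bases(obp, total_bases):
    """The same estimate written as OBP * Total Bases (TB = SLG * AB)."""
    return obp * total_bases


pujols_rc = basic_rc(obp=0.461, slg=0.618, at_bats=414)
pujols_tb = 0.618 * 414  # total bases implied by SLG and AB
print(round(pujols_rc, 1))                                     # 117.9
print(round(basic_rc_from_total_bases(0.461, pujols_tb), 1))   # 117.9
```

That figure is the individual estimate the first complaint above is about: it comes from multiplying one hitter’s own OBP by his own SLG, as if he were batting himself in.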
The question is, does Baseball Prospectus still use basic Runs Created? I’m not sure. Several sources – some of them on BP’s own site – allude to VORP including the value of stolen bases. Then again, Woolner’s references to this seem to say that he’s using linear weights for this purpose. * And not even very good values for it, either – he’s using the formula of:
.3 * (SB – 2*CS)
Those values are adapted from Pete Palmer’s Batting Runs, and were chosen to give stolen bases a ‘bonus’ for their added leverage. Palmer himself later recanted on that notion. A further note: Stolen Base Runs is not adjusted by position – SBR is added on after the positional adjustment.
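As a small illustration, the stolen base formula above looks like this in Python. The function name and the sample totals are hypothetical, not taken from any real player’s line.

```python
def stolen_base_runs(stolen_bases, caught_stealing):
    """Stolen Base Runs as quoted above: 0.3 * (SB - 2 * CS)."""
    return 0.3 * (stolen_bases - 2 * caught_stealing)


# Hypothetical totals: 30 steals against 5 times caught.
print(round(stolen_base_runs(30, 5), 1))   # 6.0
# Two steals per caught stealing nets out to nothing.
print(round(stolen_base_runs(20, 10), 1))  # 0.0
```

Under these weights a runner needs to succeed on two of every three attempts just to break even.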
* This would be really, really ironic. You see, Baseball Prospectus has two run estimators. One is Equivalent Runs, which is a linear equation in everything but stolen bases. The other is MLV, which is apparently a dynamic run estimator except for stolen bases. I think this is rather amusing; your mileage may vary.
So, how close is what Woolner has written on USENET and Stathead to what is actually in use today? There are some differences between Woolner’s published VORP figures on Stathead and the figures available on Baseball Prospectus. However:
1. There are more similarities than differences. (Seasons earlier than 2001 show many more differences in VORP, but that seems to be due more to the data set available to Woolner prior to 2001 than to anything else. He added HBP and SF to his data that season.)
2. The PA% columns are notably different from Baseball Prospectus to Stathead, which could be one source of difference. The number of stolen bases and caught stealing reported seems to differ as well.
3. Stathead used STATS, Inc. ballpark factors from their published baseball annuals; BP VORP presumably uses the Davenport Translation park factors.
Based on that, I’m comfortable with the assumption that there are not major changes in the methodology. Given that assumption, then, here is the MLV formula, as presented by Woolner:
MLV_FULL = GAMES * OUTS * (1/9 * (8*L_OBP + P_OBP) / (9 - 8*L_OBP - P_OBP) * ((8*L_SLG*(1 - L_OBP))/(1 - L_AVG) + P_SLG*(1 - P_OBP)/(1 - P_AVG)) - L_OBP*L_SLG/(1 - L_AVG))
Where L_ is for the league average and P_ is for the player in question. (The full explanation is here.) (1 - OBP)/(1 - AVG) is a stand-in for at-bats, in order to make it work with the Basic RC construct.
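Here is a hedged Python sketch of that formula, mostly useful as a sanity check on how the pieces fit together. The function name is mine, the GAMES * OUTS product is passed in as a single `scale` argument because the article doesn’t spell out how those two terms are measured, and the league line of .268/.336/.423 is an assumed round-number example, not an actual league average.

```python
def mlv_full(scale, l_avg, l_obp, l_slg, p_avg, p_obp, p_slg):
    """Marginal Lineup Value as written above.

    scale stands in for the GAMES * OUTS product; l_* are league rates
    and p_* are the player's rates.
    """
    # On-base term for a lineup of eight league-average hitters plus the player.
    lineup_on_base = (8 * l_obp + p_obp) / (9 - 8 * l_obp - p_obp)
    # Advancement term summed over the same nine lineup spots.
    lineup_advance = (8 * l_slg * (1 - l_obp) / (1 - l_avg)
                      + p_slg * (1 - p_obp) / (1 - p_avg))
    # The same rate for a lineup of nine league-average hitters.
    league_rate = l_obp * l_slg / (1 - l_avg)
    return scale * (lineup_on_base * lineup_advance / 9 - league_rate)


LEAGUE = dict(l_avg=0.268, l_obp=0.336, l_slg=0.423)  # assumed, for illustration

average_hitter = mlv_full(1.0, p_avg=0.268, p_obp=0.336, p_slg=0.423, **LEAGUE)
pujols = mlv_full(1.0, p_avg=0.353, p_obp=0.461, p_slg=0.618, **LEAGUE)
print(abs(average_hitter) < 1e-12, pujols > 0)  # True True
```

A league-average hitter comes out at zero because the first term collapses to L_OBP*L_SLG/(1 - L_AVG) when the player’s rates equal the league’s, which is exactly what the formula subtracts at the end.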
Then VORP is calculated as:
Player’s MLV – Replacement Player’s MLV + Stolen Base Runs
Where the replacement player is given the same number of PAs as the player in question.
This is all well and good, except for the fact that Basic RC isn’t the best run estimator in existence – possibly you guessed this from the fact that it’s called ‘Basic RC’ rather than ‘Only RC’ or ‘The Best RC Ever’ or something like that. A full listing of the problems with Basic RC is beyond the scope of this article (once again, Patriot is great for these sorts of things), but I’ll give you the gist:
Walks advance baserunners.
The fact that this is a patently obvious statement should give you some hint as to how fundamental a problem this is with Basic RC. A player who draws a walk only gets credit for not making an out and providing a baserunner; he gets no credit whatsoever for advancing the baserunners ahead of him. (James addressed this, and other issues, in later versions of Runs Created.) Tom Tango does a great job of breaking down what this weakness means for MLV. BP isn’t exactly unaware of the issue – Clay Davenport brags about how Equivalent Runs is a better run estimator than Runs Created. (And – yep, it is. Although I don’t think it’s quite as good as Davenport claims.)
And in fact, BP does calculate VORP using EqR instead of MLV, on the DT sections of the site, just called RARP instead of VORP. Compare those values to the published VORP values. Okay, sure, there’s not a lot of difference.
But then why not use Basic RC, instead of MLV? Or simply use R+RBI-HR, like I started off doing? There are better run estimators in the world, and VORP isn’t reliant on any run estimator in particular. (And if you are really attached to MLV, you could adapt it to work with Tech1 RC, or BaseRuns, or even linear weights if you were so inclined.)
So, like Tango, and Rob Neyer, I’d like to see Baseball Prospectus address the issue. I don’t think it’s likely; Derek Jacques recently addressed BP’s policy on blog entries such as this one:
Sometimes, we’re criticized at Baseball Prospectus for not responding to outside critiques. It’s a conscious decision, made at the management level – we’d rather talk about baseball than about ourselves or about our colleagues in the world of baseball analysis. While we don’t respond to every broadside in every blog, we also by and large don’t spend much time critiquing others’ work.
Which is probably also why they publish both VORP and RARP.
So if nothing else, gentle reader, exercise some caution going forward. If two players are close in VORP, think twice before declaring the one with the higher value the better player.
Weird, your link to “average offensive production by position for 2007” at BBRef has Barry Bonds’ playerid in it. 🙂
Anyway, +1 for wanting to see VORP “fixed.” I also would like to note that I think this “fix” is only step one of many improvements VORP could use.
Good article.
Agreed, good article; I was shocked to learn of the simplicity of VORP. No wonder it makes little sense sometimes. Just thinking about this for a second, there are so many EASY ways we could improve upon VORP. I don’t have the computing power at home, but if I had time to mess around and put something together, I’m sure I could come up with something. BP’s “math” guys certainly are weak in general. If anyone wants to try and throw something together, I’d love to work on a better stat. I’m a pure math guy and I don’t know any SQL-type stuff, but I’ve got some ideas on the conceptual side for sure. It would involve a MUCH more detailed analysis of “replacement player” than what they do, but seriously, how fucking hard would that be… not very. Just look around at what the AAAA guys at a given position would be expected to give you and BOOM, we have replacement level. Especially significant is the fact that they don’t even consider platoon effects… well, it’s a whole lot easier to find production in a platoon (i.e. Cubs’ Edmonds/Johnson) than for a whole season. Fucking basic. God, I am mad at VORP right now and how I actually thought it was useful. Other things involve taking into account the different marginal values of OBP/SLG based on the rest of your team’s lineup, and I can basically keep thinking of things as long as there is time in the day, but I need to stop raging so much and actually start doing something about it, so perhaps I will.