# The Visual WPA Project

Across the statistical spectrum a major debate has raged for quite some time: the statistical analysts vs. the scouts. Each side thinks the other is wrong and bases its decisions on faulty methods. Though a small portion of each side embraces the other, the large majority does not. For this very reason WPA (Win Probability Added) has taken some heat from those opposed to statistical analysis in baseball.
Essentially, WPA tracks the contributions of an individual to the win or loss of his team. It sums the changes in Win Expectancy across the plays a player is involved in to determine how much he helped or hindered his team’s chances of winning.
To find the Win Expectancy of any game state, look in the Toolshed section of The Book or visit Christopher Shea’s Win Expectancy Finder online. For a great article on WPA, read Studes’ “The One About Win Probability.”
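
The bookkeeping behind WPA can be sketched in a few lines of Python. This is only an illustration: the function name `accumulate_wpa`, the player labels, and the Win Expectancy values are all made up for the example rather than pulled from The Book or the Win Expectancy Finder, and the sketch tracks only the batting side of each play.

```python
from collections import defaultdict


def accumulate_wpa(plays):
    """Sum Win Expectancy changes per player.

    Each play is (player, we_before, we_after), with both values given
    from the point of view of the player's own team, as fractions in [0, 1].
    """
    totals = defaultdict(float)
    for player, we_before, we_after in plays:
        totals[player] += we_after - we_before
    return dict(totals)


# Hypothetical values, not taken from a real game or table.
plays = [
    ("Hitter A", 0.50, 0.56),
    ("Hitter B", 0.56, 0.52),
    ("Hitter A", 0.52, 0.60),
]
totals = accumulate_wpa(plays)
```

With these invented numbers Hitter A finishes at roughly +0.14 and Hitter B at roughly -0.04. In a fuller version each play would also charge the opposite amount to the pitcher, which is the part a visual WPA would want to share with fielders.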
When I described WPA to a friend of mine, one on neither side of the analysis war, he responded with: “Yeah, but they’re human. Certain numbers like that cannot track true effort.” While I disagree that a number cannot accurately track effort, I do feel the current WPA could be improved to track even more of it, or to divvy up the credit in a way that takes these more human qualities into account. His comment made me wonder what would happen were we to combine our intuitive scouting as fans with a statistic like WPA: would the results be so different from what we currently have?
If those opposed to the numbers really feel that the human aspects of the game make such a drastic difference that an anarchic overthrow of WPA would be necessary, then it seems like a good idea to test that theory. In conducting a study like this we would basically be measuring, with a stringent set of criteria, certain game aspects previously deemed immeasurable.
## Logistics
TangoTiger helped me harness this idea when it was discovered I was preparing to write an article he had previously written. He informed me that his thoughts echoed those of my friend: the numbers might be improved by combining percentages with intuitive scouting. The visual WPA would work much like the current statistic, only there would be certain plays or situations to which we could apply our opinion of effort or contribution.
In May 2007, the Phillies were playing the Marlins and Rod Barajas made an absolutely boneheaded play. Hanley Ramirez was rounding third base and Pat Burrell’s throw easily beat him in the race to the plate; by the time Barajas had the ball, Ramirez was still one-fourth of the way from home plate. Despite this, Barajas, for whatever reason, did not block the plate or attempt a tag until Ramirez slid. Ramirez ended up being safe. In terms of WPA, pitcher Brett Myers was debited the full amount, but Barajas clearly deserves some of the blame. This is an example of a situation in which only watching the game reveals who deserves the credit or debit.
Though the Barajas situation falls into the category of separating fielders from pitchers in the contribution department, there are also the ever so frequent non-error errors. We’ve all seen these plays, wherein a fielder should be able to get to a ball but it gets through the infield or drops in the outfield. Errors are not charged on these plays, yet we know the plays should have been made. Why should the pitcher be fully debited for allowing a single when we intuitively understand that the play should have been made?
Other examples where a Visual WPA would benefit us are:

1. Runner on first legging it out to third on a single, or scoring on a double when we intuitively feel he has no shot.
2. Pitcher with a slow windup should be debited on an SB, not the catcher, whereas someone like Roy Oswalt (fast windup) would have more of an effect on a runner stealing.
3. Separating a bad judgment or decision by a 3B Coach from the runner thrown out at a base he perhaps had no legit shot at reaching.
4. We’ve all seen examples of Harry Kalas’s famous line: “Right down the middle for a ball.” If a pitcher strikes a batter out but the ump fails to call the strike, we should debit the umpire a bit because he incorrectly lengthened the inning.

And these are just a few of the examples of situations that would benefit from intuitive scouting.
Separating the contributions between batter/runner and fielder/pitcher has been studied before but I am proposing a use of our own intuition as fans in order to make these separations instead of a concrete set of numbers or measurable criteria. For instance, a runner legging it out to third on a single when we normally think he would have to stop at second base would be left to our intuition. We would be using our knowledge of the runner, the position of the outfielder, the throwing arm of the outfielder, and the importance of the situation in order to make our judgment.
If David Ortiz legs it out to third base in the first inning of a 0-0 game we may be inclined to split the WPA between him and the hitter by giving Ortiz 1/3 of the increase and the hitter 2/3. If the same situation occurs in the bottom of the 9th in a game in which the Red Sox trail 2-1, we may be much more inclined to give Ortiz 2/3 and the hitter 1/3. This allows us to separate contributions based on how we feel and our own scouting abilities. Essentially it lets us determine whether scouting and the more human/gutsy/Eckstein-esque plays really affect what the statistics tell us.
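
As a rough sketch of how such a split might be recorded, the snippet below divides one play’s Win Expectancy change by a set of subjective shares. The helper name `split_wpa` and the 0.06 swing are assumptions made up for illustration; only the 1/3 and 2/3 weights come from the Ortiz example. In a real game the swing itself would also be larger in the 9th inning than in the 1st, and the same figure is reused here just to isolate the effect of the weights.

```python
from fractions import Fraction


def split_wpa(delta, shares):
    """Divide one play's win-probability change among participants.

    `shares` maps each participant to a subjective weight. The weights
    must sum to 1 so that the whole change is handed out.
    """
    if sum(shares.values()) != 1:
        raise ValueError("shares must sum to 1")
    return {name: delta * share for name, share in shares.items()}


# Hypothetical swing of 6 percentage points for the play.
delta = Fraction(6, 100)

# First inning of a 0-0 game: runner gets 1/3, hitter gets 2/3.
first_inning = split_wpa(delta, {"Ortiz": Fraction(1, 3), "hitter": Fraction(2, 3)})

# Bottom of the 9th, trailing 2-1: the weights flip.
ninth_inning = split_wpa(delta, {"Ortiz": Fraction(2, 3), "hitter": Fraction(1, 3)})
```

Using `Fraction` keeps the thirds exact, so the pieces always add back up to the original swing. The same function would cover a pitcher/fielder split on a play like the Barajas one, with the scorer supplying whatever weights feel right.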
## Opinions vs. Measures
One of the big gripes here is that my opinion of a runner legging out an extra base may be completely different from someone else’s. This, however, is the beauty of baseball and how scouting works; scouts will differ in their opinions of the same player. Though statistics are generally fixed, scouting and intuition can shift. It is scary to present a concept like this, one that combines fact and opinion, but the idea is to see whether we will truly produce different results, or at least results different enough to show that the immeasurable human aspects of the game really do make certain numbers less useful.
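
One simple way to see how far two scorers’ opinions diverge is to compare the share each gave the same participant on the same plays. The sketch below is only an assumption about how that comparison might look; the function name `compare_scorers`, the play labels, and every number in it are hypothetical.

```python
from statistics import mean


def compare_scorers(scorer_a, scorer_b):
    """Return per-play gaps between two scorers' shares and the average gap.

    Each argument maps a play label to the share (0 to 1) that the scorer
    gave the runner on that play. Only plays both scorers logged are compared.
    """
    shared = sorted(scorer_a.keys() & scorer_b.keys())
    gaps = {play: abs(scorer_a[play] - scorer_b[play]) for play in shared}
    return gaps, mean(gaps.values())


# Hypothetical runner shares from two fans scoring the same three plays.
fan_one = {"play 1": 0.50, "play 2": 0.25, "play 3": 0.75}
fan_two = {"play 1": 0.50, "play 2": 0.50, "play 3": 0.50}

gaps, average_gap = compare_scorers(fan_one, fan_two)
```

Here the two fans agree on the first play and sit a quarter apart on the other two. A small average gap would suggest intuition is fairly consistent from one fan to the next, while a large one would point to the plays worth a second look.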
## Conclusion: A Call to Arms
I am going to test this out with a three-game series in order to compare the differences, and if anybody would like to help by conducting their own three-game test, please e-mail me. If we can get a bunch of series logged, and if we see there are potentially big differences, there may be good reason to try this out over an extended period of time. Ultimately, if the results are deemed significantly different, then we can say that the game aspects evident only by watching a game truly make a difference. At the very least we will be garnering a more accurate version of an already accurate statistic. I will post my initial results of the three-game series next week.

### 19 Responses to The Visual WPA Project

1. studes says:

As in, at the end, will someone with a WPA of, say, 0.98 through five games of just play by play actually be at 1.52 due to aspects we can only get by watching.
Yes, of course.

2. studes says:

This is exactly what I used to do in my old Game of the Week series at the Hardball Times. Here’s one example:
http://www.hardballtimes.com/main/article/game-in-review-indians-vs-diamondbacks/

3. Pizza Cutter says:

Eric, the solution to the subjectivity problem is inter-rater reliability. Basically, 2 or 3 sets of eyes is better than one, so long as you can get some agreement between them. Eventually, you build it into a system of rules that actually chops up WPA into more bite size chunks and does so more properly. This type of thing is the next frontier in Sabermetrics: mixed methods.

4. This is a fascinating idea, and in some ways it reminds me of the game charting project over on Football Outsiders.
WPA is one of my favorite stats, but it’s one that would benefit greatly from human oversight, especially on unusually good or bad defensive plays. Giving the credit to the pitcher when an outfielder makes a spectacular catch to rob a near-certain XBH is just wrong, but it’s what happens when the data comes straight from the box score.

5. Studes, I had searched for similar examples but couldn’t find any. Thanks for linking to it. What I’m curious of is not necessarily one game but rather a series or a month. As in, at the end, will someone with a WPA of, say, 0.98 through five games of just play by play actually be at 1.52 due to aspects we can only get by watching.
Pizza, I definitely agree which is why I asked for some help in the end. It might be good to test it out on our own as well as a group in order to see if there is even a difference in our intuitions.
Jessica, what might be fun is if you and I do the same series being that I’m a Phillies fan and you’re a Mets fan.

6. studes says:

Eric, three years ago a number of sites blogged and followed specific games using the WPA spreadsheet, before Fangraphs started doing it. I think Jeff Sullivan still does at Lookout Landing, and at least one other site does, too.
I took note of these sites as they were happening. Several are listed here…
http://www.baseballgraphs.com/main/index.php/site/months/2005/05/
..and if you look in the month before and after this page you might find some others.

7. studes says:

One other thought, I started logging Reds’ games as a favor to someone in 2005. I think I got about a month’s worth done. I can look for those somewhere.
Bottom line, a lot of people have put effort into tracking games themselves with WPA. That’s what the spreadsheet was built for. But the practice sort of died with Fangraphs coming up to speed.

8. Studes, if you have those Reds games I would definitely love to look at them. Since I do not believe the current WPA divvies up credit/debit between pitcher and fielder what I would do is twofold: First, just use the intuition as well as scouting and then second, divvy up as well as combine intuition.
Ultimately, if it is determined that significant differences exist it would be a fun experiment to try it out over the course of a season – get a few people for each team or so and track it.

9. Studes, also, if I’m reading you correctly (between the lines), let me just clarify that I’m in no way suggesting I created this idea or anything along those lines but rather think it would be a fun and interesting experiment to try over an extended period of time to really put the statistic to use. It’s great as is but if we did those for months or a season–and we followed Russell’s inter-rater reliability suggestion–we are going to end up with more accurate results that combine both sides of this analysis war.

10. studes says:

Studes, also, if I’m reading you correctly (between the lines), let me just clarify that I’m in no way suggesting I created this idea…
No, no I don’t mean to imply that. I just want you to be aware of the work that’s already been done. You said you had looked on the Internet and not found anything, but there’s actually a lot out there.
We were on our way to organizing something on a mass scale like what you’re suggesting before David started creating his graphs on Fangraphs. When someone is creating a version that’s 90% good enough for free, why put all the extra effort into getting the other 10%? Sort of stopped the movement cold.
Significant differences do exist, if for no other reason than the pitching/fielding split.
The key to doing it right, as Tango probably told you, is the “double entry” approach — maybe even “triple entry.” Those sorts of entries will always require guesswork and ultimately leave some people unsatisfied.

11. I hear you loud and clear. If you do find those Reds games, though, I would definitely like to take a look just to show the differences between what we get from scouting and numbers and what we get from just numbers.
I know what you mean with stopping the movement cold but I am just really fascinated with what we might see were we to do this for a month or even a season – just to know. The 10% extra is a lot of effort, like you said, but I don’t know, part of me wants to know what we could find.

12. Sky says:

Maybe I’m misinterpreting the point here, but couldn’t the same objective be met by using advanced play-by-play data? Or is the issue not having that available? Does David Appelman have the stats available to run baserunning and fielding (something like UZR and PMR) metrics?

13. Sky, the point is to combine the WPA probabilities found on either Shea’s WE Finder, or the Toolshed of “The Book” or Studes’ site with information that is most easily available by watching a game. The issue is that those opposed to statistical analysis tend to feel that there is so much more to gain from watching a game so the objective is to watch a game and combine both our intuitive scouting as fans as well as the probability figures from WPA to properly divvy the WPA up. This would provide a more accurate WPA. I’m not looking for baserunning or fielding metrics but rather an experiment involving WPA. Sorry if there was confusion.

14. studes says:

I wouldn’t use the Win Expectancy Finder. That is based on limited empirical data. Eric, have you seen the WPA spreadsheet on Baseball Graphs? If you haven’t used it to follow a game, you’re missing a fun experience. Seriously! Changed the way I watch baseball.
Sky, you could achieve most of what Eric wants with PBP data that includes zone data — but you’d be missing some of the key plays Eric is also talking about, such as a saved throw by the first baseman, baserunning gaffes (could have gone to third but didn’t), etc.

15. Studes, just saw it. Amazing! I’m definitely using it for my series or multiple game evaluations. Thanks!
Yeah the PBP data would have to be insanely detailed with certain things you can usually only get from watching a game. There are also issues like missed calls by umpires, third base coaches making errors in judgments.
It would be close but if I’m going to do something like this to test differences in accumulative WPA over an extended period of time I would like to be as detailed as possible. The Baseball Graphs spreadsheet is fantastic.

16. donb says:

Wow, nice article. I am doing something similar only on a very small scale. I am tracking Adam Dunn’s Home Run contribution to the bottom line for each Cincinnati win to see if his 13 million a year salary is worth it. Fans argue that his 40 homers/100 RBIs are worth it, despite his high strikeout rate, high walk rate, and bad defense. I am NOT tracking contributions made in any other way because these contributions could be made by any other average player. Dunn lives by the Homer, so my quest is to see just how many games are won by the homer.
So far, my findings can be found on the Yahoo MLB site for Cincinnati’s Message Board under Dunn-Watch. So far, the Reds are 6-4 and my Watch-Board indicates the Reds would still be 6-4 if Dunn were erased from existence. Again, I have limited the variables just to Homer-induced wins for the Reds by Dunn.
Best regards,
Don Begley

17. Don, sounds great! Definitely keep me updated on what you find. My e-mail is on the site or just comment.

18. John Beamer says:

Studes, when all the blogs popped up a couple of years ago with WPA, did they really divvy up credit between pitcher and fielder?
Eric — question for you. For one of your “non-error” errors you could argue that you shouldn’t even give the hitter credit for the play as he should have been out. What would be your intention with that?
The issue is that could have a knock-on effect for all other plays because you aren’t actually scoring the real game at that point. I guess you would keep the hitter wpa as is, but it doesn’t seem entirely fair.

19. John, what I would do, and I’m just waking up so this answer may change, but what I would do is split the credit between pitcher and batter and fully debit the fielder. The pitcher did his job in putting the ball on the ground or so and the play should have been made but the fielder did not come through. If the total increase is 12% I would debit the fielder -12% and give the pitcher and batter each +6%. Or it could be just credit batter and debit fielder fully.
I’ll post results Thursday so I have some time to decide between now and then but my first inclination is to go with that first one.