What run estimator would Batman use? (Part IV)
September 19, 2008 7 Comments
Part I, Part II and Part III are recommended reading.
I’ve spent a lot of time talking about run expectancy – our measure of run potential at any given point in time. All RE charts I’ve seen published thus far are based upon the zero on, zero out state being the average number of runs to score in an inning. I really don’t want to reopen the holy war between RC and LWTS supporters. But I’m going to present an RE table that is different from the standard one, in a couple of ways:
| OUTS | RUN1_RE | RUN2_RE | RUN3_RE |
|------|---------|---------|---------|
| 0    | 0.3777  | 0.5959  | 0.8213  |
| 1    | 0.2513  | 0.4029  | 0.6438  |
| 2    | 0.1219  | 0.2233  | 0.2825  |

This is based upon the exact same dataset as the RE chart I presented last week, just broken down differently. In this case I looked at the odds of a runner scoring from first, second or third based on the number of outs in the inning.
This is an RE table that, like Runs Created, “starts from zero.” I want to emphasize that when it comes to baselines, there is no One True Answer – the correct baseline to use is determined by the question you’re trying to answer.
[As an aside – I’ve noticed a decided tendency in sabermetrics for people to divide up into opposing camps, where you’ll have the Hammer advocates on one side and Screwdriver advocates on the other. And once a sabermetrician has a hammer, everything begins to look like a nail. You can have hammers AND screwdrivers! It’s a wonderful world to be in, actually.]
The benefit of using this RE table is that we can get much more granular with the way that we use it – we can look at how an event affects each player involved separately, rather than all together.
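As a rough illustration of that granularity, here is a minimal Python sketch that treats the table above as a lookup of each runner's chance of scoring. The function names are made up for this example, and summing the per-runner values relies only on the fact that expected runs from the runners on base is the sum of their individual scoring chances.

```python
# Chance that a runner on a given base eventually scores, by outs,
# copied from the table above (columns RUN1_RE, RUN2_RE, RUN3_RE).
RUNNER_RE = {
    0: {1: 0.3777, 2: 0.5959, 3: 0.8213},
    1: {1: 0.2513, 2: 0.4029, 3: 0.6438},
    2: {1: 0.1219, 2: 0.2233, 3: 0.2825},
}

def runners_value(outs, bases):
    """Expected runs from the runners already on base (batter excluded)."""
    return sum(RUNNER_RE[outs][base] for base in bases)

def advance_value(outs, start, end):
    """Change in one runner's scoring chance when he moves from start to end."""
    return RUNNER_RE[outs][end] - RUNNER_RE[outs][start]

# Runners on first and third, one out.
print(round(runners_value(1, [1, 3]), 4))   # 0.8951
# A runner moving from first to second with nobody out.
print(round(advance_value(0, 1, 2), 4))     # 0.2182
```

With the bases empty the function returns zero, which is the “starts from zero” property in miniature.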
Remember back in part I, when I discussed the three aspects of run production? As a refresher, each event contributes or detracts from run production by either providing a baserunner, advancing other baserunners, or using an out. (Most events will do two or even all three of these at a time. And, while we tend to group events as either “safe plays” or “outs,” the underlying reality is a little more messy.)
So what we’re interested in is breaking down each event into its component values, and then seeing what values we come up with.
And so… here it is:
| EVENT                  | COUNT   | RUNNER | ADVANCE | OOB   | OUT   | LWTS   |
|------------------------|---------|--------|---------|-------|-------|--------|
| Out                    | 3819401 | 0.013  | 0.026   | 0.013 | 0.050 | -0.024 |
| Strikeout              | 1161343 | 0.001  | 0.002   | 0.000 | 0.055 | -0.053 |
| Stolen Base            | 114587  | 0.000  | 0.180   | 0.000 | 0.000 | 0.180  |
| Defensive Indifference | 2839    | 0.000  | 0.120   | 0.000 | 0.000 | 0.120  |
| Caught stealing        | 48906   | 0.000  | 0.010   | 0.263 | 0.015 | -0.268 |
| Pickoff                | 24346   | 0.000  | 0.095   | 0.197 | 0.017 | -0.119 |
| Wild Pitch             | 56520   | 0.000  | 0.265   | 0.001 | 0.000 | 0.263  |
| Passed Ball            | 15238   | 0.000  | 0.259   | 0.001 | 0.000 | 0.257  |
| Balk                   | 9624    | 0.000  | 0.253   | 0.000 | 0.000 | 0.253  |
| Other advance          | 2502    | 0.000  | 0.063   | 0.298 | 0.040 | -0.276 |
| Foul Error             | 3284    | 0.000  | 0.000   | 0.000 | 0.000 | 0.000  |
| Walk                   | 607110  | 0.244  | 0.061   | 0.000 | 0.000 | 0.305  |
| Intentional Walk       | 59403   | 0.185  | 0.004   | 0.000 | 0.000 | 0.189  |
| Hit By Pitch           | 49877   | 0.251  | 0.078   | 0.000 | 0.000 | 0.329  |
| Interference           | 918     | 0.254  | 0.109   | 0.000 | 0.000 | 0.364  |
| Error                  | 90717   | 0.288  | 0.205   | 0.002 | 0.001 | 0.490  |
| Fielder’s choice       | 26606   | 0.304  | 0.181   | 0.371 | 0.152 | -0.037 |
| Single                 | 1252776 | 0.260  | 0.207   | 0.003 | 0.002 | 0.461  |
| Double                 | 314183  | 0.415  | 0.332   | 0.002 | 0.001 | 0.745  |
| Triple                 | 44499   | 0.590  | 0.430   | 0.000 | 0.000 | 1.020  |
| Home Run               | 178776  | 1.000  | 0.404   | 0.000 | 0.000 | 1.404  |
| Double play            | 192350  | 0.002  | 0.023   | 0.325 | 0.041 | -0.341 |
| Triple play            | 210     | 0.000  | 0.003   | 1.015 | 0.000 | -1.012 |
| Total                  | 8076015 | 0.114  | 0.083   | 0.018 | 0.034 | 0.145  |

And… we actually have an article, folks! (Let me confess that it took the whole week to cobble this table together, working on and off, and I was afraid I would show up this morning empty-handed, with a set of completely infeasible LWTS. I’m glad that’s not the case.)
A bit of explanation. “Runner” is the change in run expectancy from the batter reaching base – or the average chance of the batter eventually scoring after that event. You’ll note the slim, but still existent, chances of a batter scoring after a strikeout.
The second column is “Advance,” the positive value of the event’s interaction with the baserunners ahead of the batter. The value of the triple bothers me, I’ll be frank – it’s like as not a sampling quirk. Realistically, the advancement value of the triple and the home run should be almost identical.
The third column is “OOB,” the decrease in run expectancy due to outs on base. The fourth column is the effect of making an out on the existing baserunners. There’s a trick going on here – Runner and Advance were both computed using the number of outs prior to the event. Then we calculate the change in RE, after everything else is computed, based upon the change in outs. That lets us separate out the negative contribution of the out from the positive value of the out.
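To make the relationship between the columns concrete, here is a small Python check on a handful of rows. It assumes (this is my reading, not something stated outright) that OOB and OUT are amounts to subtract, so the net value of an event is RUNNER + ADVANCE - OOB - OUT. The numbers are the magnitudes of the table entries, and the tolerance only allows for three-decimal rounding.

```python
# Magnitudes of (RUNNER, ADVANCE, OOB, OUT, LWTS) from the table, for a few events.
ROWS = {
    "Out":             (0.013, 0.026, 0.013, 0.050, 0.024),
    "Caught stealing": (0.000, 0.010, 0.263, 0.015, 0.268),
    "Walk":            (0.244, 0.061, 0.000, 0.000, 0.305),
    "Single":          (0.260, 0.207, 0.003, 0.002, 0.461),
    "Home Run":        (1.000, 0.404, 0.000, 0.000, 1.404),
    "Double play":     (0.002, 0.023, 0.325, 0.041, 0.341),
}

def net_value(runner, advance, oob, out):
    # Assumption: OOB and OUT are decreases, so they are subtracted.
    return runner + advance - oob - out

for event, (runner, advance, oob, out, lwts) in ROWS.items():
    net = net_value(runner, advance, oob, out)
    # The net should match the LWTS magnitude to within rounding.
    assert abs(abs(net) - lwts) < 0.002, event
    print(f"{event:16s} {net:+.3f}")
```

Under that reading the out events come out negative and the safe events positive, which is what a set of linear weights ought to look like.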
A note: the values for the double/triple play have not been corrected, as the table from last week was; these values will not reconcile properly with team/seasonal data. I still haven’t figured out how I want to do that adjustment with this particular set of linear weights.
I will note that these linear weights are much more difficult to compute than the ones in Part III, and I really don’t see the end product as being superior in any noticeable way. So why bother?
Because we now have a full, detailed set of the exact advancement value of each event. Remember our equation for BaseRuns, where B is our advancement factor? Outs are already accounted for in our C factor, and outs on base can be accounted for in the A factor. Then, to get a set of usable B coefficients that are all positive values, we have to look no further than the first two columns of that table. All we need is a tuning factor, which we can derive by calculating the necessary B value for our dataset and dividing that by the value of our proposed B coefficient.
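The tuning step can be sketched in Python as follows, assuming the BaseRuns equation referred to is the standard form R = A*B/(B+C) + D. Solving that for B gives the “necessary B value”; dividing by the untuned B gives the tuning factor. The league totals below are invented purely for illustration and are not taken from the dataset.

```python
def base_runs(a, b, c, d):
    """Standard BaseRuns form: baserunners times score rate, plus home runs."""
    return a * b / (b + c) + d

def required_b(runs, a, c, d):
    """B that makes base_runs(a, B, c, d) equal the actual run total."""
    scored = runs - d              # runs scored by the non-homer baserunners
    return scored * c / (a - scored)

def tuning_factor(runs, a, c, d, raw_b):
    """Ratio of the B the data requires to the untuned B."""
    return required_b(runs, a, c, d) / raw_b

# Hypothetical league totals, for illustration only.
runs, a, c, d = 750.0, 2000.0, 4100.0, 170.0
raw_b = 4200.0   # hypothetical sum of (Runner + Advance) times event counts
factor = tuning_factor(runs, a, c, d, raw_b)
print(round(factor, 3))
print(round(base_runs(a, factor * raw_b, c, d), 6))  # reproduces the 750.0 run total
```

By construction the tuned B reproduces the run total of whatever sample it was fitted on; how well it does on individual teams is a separate question.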
And so, without further ado, our No Negative B Coefficient BaseRuns:
A: 1B + E + 2B + 3B + BB + HBP + IBB – CS – DP
B: .397 * (0.466*1B + .493*E + .748*2B + 1.02*3B + .404*HR + .305*BB + .189*IBB + .329*HBP + .038*SB + .01*CS + .39*O + .002*K + .025*DP)
C: O + K + DP + CS
D: HR
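For anyone who wants to plug numbers in, here is a minimal Python sketch of the four factors, assuming they combine in the standard BaseRuns way, A*B/(B+C) + D. The coefficients are copied exactly as printed above, the function name is made up, and the team line is hypothetical.

```python
# B weights exactly as printed in the formula above.
B_WEIGHTS = {
    "1B": 0.466, "E": 0.493, "2B": 0.748, "3B": 1.02, "HR": 0.404,
    "BB": 0.305, "IBB": 0.189, "HBP": 0.329, "SB": 0.038, "CS": 0.01,
    "O": 0.39, "K": 0.002, "DP": 0.025,
}
B_TUNING = 0.397

def no_negative_b_base_runs(s, weights=B_WEIGHTS, tuning=B_TUNING):
    """Estimate runs from a dict of counting stats keyed like B_WEIGHTS."""
    a = (s["1B"] + s["E"] + s["2B"] + s["3B"] + s["BB"]
         + s["HBP"] + s["IBB"] - s["CS"] - s["DP"])
    b = tuning * sum(w * s[k] for k, w in weights.items())
    c = s["O"] + s["K"] + s["DP"] + s["CS"]
    d = s["HR"]
    return a * b / (b + c) + d

# Hypothetical team-season line, for illustration only.
team = {
    "1B": 950, "2B": 280, "3B": 30, "HR": 160, "BB": 500, "IBB": 40,
    "HBP": 50, "E": 60, "SB": 100, "CS": 40, "O": 2900, "K": 1050, "DP": 130,
}
print(round(no_negative_b_base_runs(team), 1))
```

Passing the weights and tuning factor in as arguments makes it easy to swap in a different set of B coefficients without touching A, C or D.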
I’m not entirely convinced this is right, and quite frankly at 2 AM I’m not convinced that I’m qualified to judge. So, we’ll test. Next week, then – same Bat time, same Bat channel!
Shouldn’t you be using the advancement value of the baserunners only to find the new B coefficients? You are including the advancement value for the batter-runner for all of the batting events except HR, at least as far as I can tell. Also, I don’t understand how the advancement value of a SB is only .038, when you give it a LW of .18. Shouldn’t all of the value of a SB be in advancement?
Your B equation at this point is basically .397*runs for all of the batting events except home runs. I don’t think there’s any way that’s going to work.
It’s entirely possible that I’m missing something really obvious here, of course.
The stolen base is simply a typo on my part. That changes the coefficient to .395.
I just finished testing it – it does work, although not particularly well. (Accuracy is pretty much a dead match for the RC version I tested in Part II.) So apparently I have a lot more work cut out for me between now and next week. (Or, if I can get things ironed out this afternoon, tomorrow.)
As for why I chose to use the advancement value of the batter-runner – there is an advancement value in that, and it’s not being captured by any other part of the BaseRuns equation. I considered doing “advancement above average” for the batter-runner but that brings us right back to the issue of negative B coefficients.
I would argue that A is where the advancement value of the batter-runner is considered, except that it assumes that all on base events result in a ~30% chance of scoring, when we know of course that this will be higher for a triple, etc. By adding it in again to B, you are going to wind up with weights that are not “steep” enough.
In a typical BsR formula the relative B values for the S/D/T/HR are usually somewhere in the neighborhood of 1/3/5/3. In this one the ratio is 1/1.6/2.2/.9. Since we know that the typical BsR formulas work very well, and the rest of your equation is pretty standard (you have A = known/final baserunners and C = all outs), the B weights are going to have to end up being close to the standard ones for it to work.
I realize that I am not offering a suggested solution, and I wish I could.
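The 1/1.6/2.2/.9 ratio quoted above can be reproduced from the B coefficients in the article with a few lines of Python (the tuning factor cancels out, so only the relative weights matter):

```python
# B coefficients for the four hit types, as printed in the article's formula.
weights = {"1B": 0.466, "2B": 0.748, "3B": 1.02, "HR": 0.404}

# Express each weight relative to the single.
ratios = {event: round(w / weights["1B"], 1) for event, w in weights.items()}
print(ratios)  # {'1B': 1.0, '2B': 1.6, '3B': 2.2, 'HR': 0.9}
```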
I was fiddling around with your BsR equation using Patriot’s spreadsheets. I got strange +1 LW values, and needed a “b multiplier” of 2.3. I have to agree with Patriot in that the A factor is where the advancement value of events should be considered. For instance, I considered using .55 or .60 * IBB in the A factor since the IBB only has a 17% chance of scoring.
There’s still a bit of tuning left to do, but these values work:
(1.33 * (0.251*1B + .276*E + .532*2B + .804*3B + .404*HR + .087*BB + .112*HBP – .029*IBB + .180*SB + .01*(CS+PO) + .027*(PA – (1B+2B+3B+HR+BB+HBP+IBB+E) – K – DP) + .002*K + .024*DP)) AS B
The negative coefficient for the IBB still bugs me, but it doesn’t seem to be an issue at all. (Only three events return negative B values in our entire sample, and none of those lead to negative runs. Negative runs are still possible, though, due to negative *A* values.)
I’m getting very similar values for R, AAE and RMSE as I did with the original BsR version I tested; in fact, the original is still just a shade more accurate. I have some ideas on how to fine-tune the values further.
What I did was take the average chance of scoring after reaching base and remove it from the average value of each event – using times reached base, not times the event occurred, as the denominator of the average RE value of the event. Then I weighted that back out to total times the event occurred. The only time the negative cropped up was in the IBB.
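For reference, here is the revised B expression above as a small Python function. It is only a sketch: it reads the .027 term as applying to plate appearances minus times on base, strikeouts and double plays (that is, all other outs), which is an assumption about the expression's intent, and the function name and sample lines are made up.

```python
from collections import Counter  # missing stats default to zero

def revised_b(s):
    """B factor from the revised coefficients; s maps stat names to counts."""
    on_base = (s["1B"] + s["2B"] + s["3B"] + s["HR"]
               + s["BB"] + s["HBP"] + s["IBB"] + s["E"])
    # Assumption: the .027 weight applies to outs other than K and DP.
    other_outs = s["PA"] - on_base - s["K"] - s["DP"]
    return 1.33 * (
        0.251 * s["1B"] + 0.276 * s["E"] + 0.532 * s["2B"] + 0.804 * s["3B"]
        + 0.404 * s["HR"] + 0.087 * s["BB"] + 0.112 * s["HBP"] - 0.029 * s["IBB"]
        + 0.180 * s["SB"] + 0.01 * (s["CS"] + s["PO"])
        + 0.027 * other_outs + 0.002 * s["K"] + 0.024 * s["DP"]
    )

# A lone intentional walk is the one input that drives B below zero.
print(revised_b(Counter(PA=1, IBB=1)) < 0)    # True
# Any other single event keeps B positive, for example a single.
print(revised_b(Counter({"PA": 1, "1B": 1})) > 0)  # True
```

That matches the remark that the IBB is the only place the negative crops up: B goes negative only when intentional walks dominate the line.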
Colin,
How do you compute the odds of the batter scoring? Or is this irrelevant to your absolute RE table?
Colin,
What happened to your first 3 run estimator articles? I do like to reread your articles from time to time.