What run estimator would Batman use? (Part IV)

Part I, Part II and Part III are recommended reading.

I’ve spent a lot of time talking about run expectancy – our measure of run potential at any given point in time. All RE charts I’ve seen published thus far are built so that the zero on, zero out state equals the average number of runs scored in an inning. I really don’t want to reopen the holy war between RC and LWTS supporters. But I’m going to present an RE table that is different from the standard one in a couple of ways:

| OUTS | RUN1_RE | RUN2_RE | RUN3_RE |
|------|---------|---------|---------|
| 0    | 0.3777  | 0.5959  | 0.8213  |
| 1    | 0.2513  | 0.4029  | 0.6438  |
| 2    | 0.1219  | 0.2233  | 0.2825  |

This is based upon the exact same dataset as the RE chart I presented last week, just broken down differently. In this case I looked at the odds of a runner scoring from first, second or third based on the number of outs in the inning.

This is an RE table that, like Runs Created, “starts from zero.” I want to emphasize that when it comes to baselines, there is no One True Answer – the correct baseline to use is determined by the question you’re trying to answer.
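To make the "starts from zero" idea concrete, here is a minimal sketch of how a base-out state could be valued from the per-runner table. It assumes the per-runner values can simply be added together, which is my reading of the table rather than something spelled out in the text; the names `RUNNER_RE` and `state_re` are made up for illustration.

```python
# Per-runner scoring probabilities copied from the table above,
# keyed by (base, outs).
RUNNER_RE = {
    (1, 0): 0.3777, (2, 0): 0.5959, (3, 0): 0.8213,
    (1, 1): 0.2513, (2, 1): 0.4029, (3, 1): 0.6438,
    (1, 2): 0.1219, (2, 2): 0.2233, (3, 2): 0.2825,
}


def state_re(occupied_bases, outs):
    """Run expectancy of a base-out state, counting only runners on base.

    Assumption: per-runner values are additive, and nobody on base is
    worth anything once the third out has been made.
    """
    if outs >= 3:
        return 0.0
    return sum((RUNNER_RE[(base, outs)] for base in occupied_bases), 0.0)


# Bases empty is worth nothing under this table, unlike a standard RE chart.
print(state_re([], 0))
# Runners on first and third with one out.
print(round(state_re([1, 3], 1), 4))
```

Note that the batter contributes nothing until he reaches base, which is exactly what separates this table from the standard one.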

[As an aside – I’ve noticed a decided tendency in sabermetrics for people to divide up into opposing camps, where you’ll have the Hammer advocates on one side and Screwdriver advocates on the other. And once a sabermetrician has a hammer, everything begins to look like a nail. You can have hammers AND screwdrivers! It’s a wonderful world to be in, actually.]

The benefit of using this RE table is that we can get much more granular in how we use it – we can look at how an event affects each player involved separately, rather than all together.

Remember back in part I, when I discussed the three aspects of run production? As a refresher, each event contributes or detracts from run production by either providing a baserunner, advancing other baserunners, or using an out. (Most events will do two or even all three of these at a time. And, while we tend to group events as either “safe plays” or “outs,” the underlying reality is a little more messy.)

So what we’re interested in is breaking down each event into its component values, and then seeing what values we come up with.

And so… here it is:

| EVENT                  | COUNT   | RUNNER | ADVANCE | OOB    | OUT    | LWTS   |
|------------------------|---------|--------|---------|--------|--------|--------|
| Out                    | 3819401 | 0.013  | 0.026   | -0.013 | -0.050 | -0.024 |
| Strikeout              | 1161343 | 0.001  | 0.002   | 0.000  | -0.055 | -0.053 |
| Stolen Base            | 114587  | 0.000  | 0.180   | 0.000  | 0.000  | 0.180  |
| Defensive Indifference | 2839    | 0.000  | 0.120   | 0.000  | 0.000  | 0.120  |
| Caught stealing        | 48906   | 0.000  | 0.010   | -0.263 | -0.015 | -0.268 |
| Pickoff                | 24346   | 0.000  | 0.095   | -0.197 | -0.017 | -0.119 |
| Wild Pitch             | 56520   | 0.000  | 0.265   | -0.001 | 0.000  | 0.263  |
| Passed Ball            | 15238   | 0.000  | 0.259   | -0.001 | 0.000  | 0.257  |
| Balk                   | 9624    | 0.000  | 0.253   | 0.000  | 0.000  | 0.253  |
| Other advance          | 2502    | 0.000  | 0.063   | -0.298 | -0.040 | -0.276 |
| Foul Error             | 3284    | 0.000  | 0.000   | 0.000  | 0.000  | 0.000  |
| Walk                   | 607110  | 0.244  | 0.061   | 0.000  | 0.000  | 0.305  |
| Intentional Walk       | 59403   | 0.185  | 0.004   | 0.000  | 0.000  | 0.189  |
| Hit By Pitch           | 49877   | 0.251  | 0.078   | 0.000  | 0.000  | 0.329  |
| Interference           | 918     | 0.254  | 0.109   | 0.000  | 0.000  | 0.364  |
| Error                  | 90717   | 0.288  | 0.205   | -0.002 | -0.001 | 0.490  |
| Fielder’s choice       | 26606   | 0.304  | 0.181   | -0.371 | -0.152 | -0.037 |
| Single                 | 1252776 | 0.260  | 0.207   | -0.003 | -0.002 | 0.461  |
| Double                 | 314183  | 0.415  | 0.332   | -0.002 | -0.001 | 0.745  |
| Triple                 | 44499   | 0.590  | 0.430   | 0.000  | 0.000  | 1.020  |
| Home Run               | 178776  | 1.000  | 0.404   | 0.000  | 0.000  | 1.404  |
| Double play            | 192350  | 0.002  | 0.023   | -0.325 | -0.041 | -0.341 |
| Triple play            | 210     | 0.000  | 0.003   | -1.015 | 0.000  | -1.012 |
| Total                  | 8076015 | 0.114  | 0.083   | -0.018 | -0.034 | 0.145  |

And… we actually have an article, folks! (Let me confess that it took the whole week to cobble this table together, working on and off, and I was afraid I would show up this morning empty-handed, with a set of completely infeasible LWTS. I’m glad that’s not the case.)

A bit of explanation. “Runner” is the change in run expectancy from the batter reaching base – or the average chance of the batter eventually scoring after that event. You’ll note the slim, but still existent, chances of a batter scoring after a strikeout.

The second column is “Advance,” the positive value of the event’s interaction with the baserunners ahead of the batter. The value of the triple bothers me, I’ll be frank – it’s like as not a sampling quirk. Realistically, the advancement value of the triple and the home run should be almost identical.

The third column is “OOB,” the decrease in run expectancy due to outs on base. The fourth column is the effect of making an out on the existing baserunners. There’s a trick going on here – Runner and Advance were both computed using the number of outs prior to the event. Then we calculate the change in RE, after everything else is computed, based upon the change in outs. That lets us separate out the negative contribution of the out from the positive value of the out.
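As a rough illustration of the four components, here is a sketch that splits a single play into Runner, Advance, OOB and Out values using the per-runner table from earlier in the article. This is one reading of the procedure described above, not the actual query behind the big table: Runner, Advance and OOB are priced at the outs before the play, and the Out component re-prices whoever is still on base at the new out count. The play representation, the `SCORED` marker and the function names are all hypothetical.

```python
RUNNER_RE = {
    (1, 0): 0.3777, (2, 0): 0.5959, (3, 0): 0.8213,
    (1, 1): 0.2513, (2, 1): 0.4029, (3, 1): 0.6438,
    (1, 2): 0.1219, (2, 2): 0.2233, (3, 2): 0.2825,
}
SCORED = 4  # hypothetical marker for a runner who crossed the plate


def value(base, outs):
    """Chance that a runner standing on `base` scores with `outs` out."""
    if base is None:        # not on base (put out, or never reached)
        return 0.0
    if base == SCORED:      # already home
        return 1.0
    if outs >= 3:           # stranded by the third out
        return 0.0
    return RUNNER_RE[(base, outs)]


def decompose(outs_before, outs_after, batter_end, runners):
    """Split one play into its component values.

    batter_end: base the batter ends on (1-3), SCORED, or None.
    runners: list of (start_base, end_base); end_base None means the
             runner was put out on the bases.
    """
    runner = value(batter_end, outs_before)
    advance = sum(value(end, outs_before) - value(start, outs_before)
                  for start, end in runners if end is not None)
    oob = -sum(value(start, outs_before)
               for start, end in runners if end is None)
    # Everyone still standing on a base is re-priced at the new out count.
    still_on = [b for b in [batter_end] + [end for _, end in runners]
                if b in (1, 2, 3)]
    out = sum(value(b, outs_after) - value(b, outs_before) for b in still_on)
    return {"runner": runner, "advance": advance, "oob": oob, "out": out}


# Single with one out, runner goes first to third.
print(decompose(1, 1, 1, [(1, 3)]))
# Caught stealing second with nobody out.
print(decompose(0, 1, None, [(1, None)]))
# Strikeout with a runner on second, one out becoming two.
print(decompose(1, 2, None, [(2, 2)]))
```

Averaging these per-play components over every occurrence of an event would give one row of the table, and summing the four components gives the LWTS column.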

A note: the values for the double/triple play have not been corrected, as the table from last week was; these values will not reconcile properly with team/seasonal data. I still haven’t figured out how I want to do that adjustment with this particular set of linear weights.

I will note that these linear weights are much more difficult to compute than the ones in Part III, and I really don’t see the end product as being superior in any noticeable way. So why bother?

Because we now have a full, detailed set of the exact advancement value of each event. Remember our equation for BaseRuns, where B is our advancement factor? Outs are already accounted for in our C factor, and outs on base can be accounted for in the A factor. Then, to get a set of usable B coefficients that are all positive values, we have to look no further than the first two columns of that table. All we need is a tuning factor, which we can derive by calculating the necessary B value for our dataset and dividing that by the value of our proposed B coefficient.
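The tuning factor can be sketched in a few lines. This assumes the usual BaseRuns form, R = A*B/(B+C) + D, and solves it for B to get the "necessary" value; the totals plugged in at the bottom are invented purely to show the mechanics and are not taken from the dataset.

```python
def base_runs(a, b, c, d):
    """BaseRuns in its usual form: baserunners times score rate, plus home runs."""
    return a * b / (b + c) + d


def required_b(runs, a, c, d):
    """B value that makes BaseRuns reproduce `runs` exactly.

    From R = A*B/(B+C) + D, with S = R - D:
        S*(B + C) = A*B  ->  B = S*C / (A - S)
    """
    scored_from_base = runs - d
    return scored_from_base * c / (a - scored_from_base)


def tuning_factor(runs, a, c, d, proposed_b):
    """Multiplier that scales the untuned B sum up or down to the required B."""
    return required_b(runs, a, c, d) / proposed_b


# Hypothetical totals, for illustration only.
runs, a, c, d = 750.0, 2000.0, 4100.0, 170.0
proposed_b = 4000.0  # untuned sum of coefficient * event count
print(round(required_b(runs, a, c, d), 2))
print(round(tuning_factor(runs, a, c, d, proposed_b), 4))
```

Plugging the required B back into `base_runs` returns the original run total, which is a handy sanity check on the algebra.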

And so, without further ado, our No Negative B Coefficient BaseRuns:

A: (1B + E + 2B + 3B + BB + HBP + IBB – CS – DP)

B: .397 * (.466*1B + .493*E + .748*2B + 1.02*3B + .404*HR + .305*BB + .189*IBB + .329*HBP + .038*SB + .01*CS + .39*O + .002*K + .025*DP)

C: O + K + DP + CS

D: HR
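For anyone who wants to try the formula, here is a minimal sketch that plugs a stat line into the four factors exactly as published, combined in the usual BaseRuns form A*B/(B+C) + D. The weights are copied as-is, including the stolen base weight that gets flagged as a typo in the comments. The dictionary keys, the function name and the sample stat line are hypothetical, and "O" is assumed to mean outs other than strikeouts and double plays.

```python
# B weights and multiplier copied from the formula as published.
B_WEIGHTS = {
    "1B": 0.466, "E": 0.493, "2B": 0.748, "3B": 1.02, "HR": 0.404,
    "BB": 0.305, "IBB": 0.189, "HBP": 0.329, "SB": 0.038, "CS": 0.01,
    "O": 0.39, "K": 0.002, "DP": 0.025,
}
B_MULTIPLIER = 0.397


def no_negative_b_base_runs(s):
    """Estimate runs from a dict of event counts keyed like B_WEIGHTS."""
    a = (s["1B"] + s["E"] + s["2B"] + s["3B"] + s["BB"]
         + s["HBP"] + s["IBB"] - s["CS"] - s["DP"])
    b = B_MULTIPLIER * sum(w * s[event] for event, w in B_WEIGHTS.items())
    c = s["O"] + s["K"] + s["DP"] + s["CS"]
    d = s["HR"]
    if b + c == 0:  # empty stat line: avoid dividing by zero
        return float(d)
    return a * b / (b + c) + d


# Invented team-season line, for illustration only.
team = {
    "1B": 1000, "E": 60, "2B": 290, "3B": 30, "HR": 170, "BB": 500,
    "IBB": 40, "HBP": 50, "SB": 100, "CS": 40, "O": 2900, "K": 1000,
    "DP": 130,
}
print(round(no_negative_b_base_runs(team), 1))
```

Every B weight here is non-negative, so B itself can never go below zero for a real stat line, which is the whole point of the exercise.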

I’m not entirely convinced this is right, and quite frankly at 2 AM I’m not convinced that I’m qualified to judge. So, we’ll test. Next week, then – same Bat time, same Bat channel!


7 Responses to What run estimator would Batman use? (Part IV)

  1. Patriot says:

    Shouldn’t you be using the advancement value of the baserunners only to find the new B coefficients? You are including the advancement value for the batter-runner for all of the batting events except HR, at least as far as I can tell. Also, I don’t understand how the advancement value of a SB is only .038, when you give it a LW of .18. Shouldn’t all of the value of a SB be in advancement?
    Your B equation at this point is basically .397*runs for all of the batting events except home runs. I don’t think there’s any way that’s going to work.
    It’s entirely possible that I’m missing something really obvious here, of course.

  2. Colin Wyers says:

    The stolen base is simply a typo on my part. That changes the coefficient to .395.
    I just finished testing it – it does work, although not particularly well. (Accuracy is pretty much a dead match for the RC version I tested in Part II.) So apparently I have a lot more work cut out for me between now and next week. (Or, if I can get things ironed out this afternoon, tomorrow.)
    As for why I chose to use the advancement value of the batter-runner – there is an advancement value in that, and it’s not being captured by any other part of the BaseRuns equation. I considered doing “advancement above average” for the batter-runner but that brings us right back to the issue of negative B coefficients.

  3. Patriot says:

    I would argue that A is where the advancement value of the batter-runner is considered, except that it assumes that all on base events result in a ~30% chance of scoring, when we know of course that this will be higher for a triple, etc. By adding it in again to B, you are going to wind up with weights that are not “steep” enough.
    In a typical BsR formula the relative B values for the S/D/T/HR are usually somewhere in the neighborhood of 1/3/5/3. In this one the ratio is 1/1.6/2.2/.9. Since we know that the typical BsR formulas work very well, and the rest of your equation is pretty standard (you have A = known/final baserunners and C = all outs), the B weights are going to have to end up being close to the standard ones for it to work.
    I realize that I am not offering a suggested solution, and I wish I could.

  4. terpsfan101 says:

    I was fiddling around with your BsR equation using Patriot’s spreadsheets. I got strange +1 LW values, and needed a “b multiplier” of 2.3. I have to agree with Patriot in that the A factor is where the advancement value of events should be considered. For instance, I considered using .55 or .60 * IBB in the A factor since the IBB only has a 17% chance of scoring.

  5. Colin Wyers says:

    There’s still a bit of tuning left to do, but these values work:
    (1.33 * (0.251*1B+ .276*E + .532*2B + .804*3B + .404*HR + .087*BB + .112*HBP – .029 * IBB + .180*SB + .01*(CS+PO) + .027*(PA-(1B+2B+3B+HR+BB+HBP+IBB+E) – K – DP) + .002*K + .024*DP)) AS B
    The negative coefficient for the IBB still bugs me, but it doesn’t seem to be an issue at all. (Only three events return negative B values in our entire sample, and none of those lead to negative runs. Negative runs are still possible, though, due to negative *A* values.)
    I’m getting very similar values for R, AAE and RMSE as I did with the original BsR version I tested; in fact, the original is still just a shade more accurate. I have some ideas on how to fine-tune the values further.
    What I did was take the average chance of scoring after reaching base, and remove it from the average value of each event – using times reached base, not times the event occurred, as the denominator of the average RE value of the event. Then I weighted that back out to total times the event occurred. The only time the negative cropped up was in the IBB.

  6. terpsfan101 says:

    Colin,
    How do you compute the odds of the batter scoring? Or is this irrelevant to your absolute RE table?

  7. terpsfan101 says:

    Colin,
    What happened to your first 3 run-estimator articles? I do like to re-read your articles from time-to-time.
