What run estimator would Batman use? (Part III)

And we continue. Part I and Part II are recommended reading if you're just now joining us. The past two weeks have totaled over 5,000 words; this week will be a much lighter read, because I've been sick most of the week and the database queries required to produce the values you're about to see have essentially taken up most of the time and energy I've had to devote to this.

I presented an RE table last week, courtesy of Tom Tango. Since I'm going to be talking more about RE this week, I should present my own table, since that's what I'm using to derive the rest of the data in this post:

Bases       0 outs    1 out     2 outs
Empty       0.486     0.259     0.098
1B only     0.859     0.512     0.220
2B only     1.106     0.674     0.327
3B only     1.334     0.936     0.374
1B & 2B     1.480     0.905     0.435
1B & 3B     1.743     1.156     0.494
2B & 3B     1.972     1.371     0.591
Loaded      2.324     1.542     0.752

It’s different, but not radically so.

Now let's review how this works, at least as far as deriving linear weights. A single at bat's value is determined by the change in RE from start to end; if I come up to bat with the bases empty and no outs and I hit a single, RE goes from .486 to .859, so I'm responsible for a change in RE of .373.
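Here is that arithmetic as a minimal Python sketch, using the RE table above. The function name and the `runs_scored` argument are my own additions for illustration. Crediting the runs that score on the play, and treating a third out as an RE of zero, are the usual conventions, but they are assumptions here rather than something spelled out in this post.

```python
# Run expectancy by base state, indexed by outs (0, 1, 2), from the table above.
RE = {
    "Empty":   (0.486, 0.259, 0.098),
    "1B only": (0.859, 0.512, 0.220),
    "2B only": (1.106, 0.674, 0.327),
    "3B only": (1.334, 0.936, 0.374),
    "1B & 2B": (1.480, 0.905, 0.435),
    "1B & 3B": (1.743, 1.156, 0.494),
    "2B & 3B": (1.972, 1.371, 0.591),
    "Loaded":  (2.324, 1.542, 0.752),
}


def re_change(start_bases, start_outs, end_bases, end_outs, runs_scored=0):
    """Change in RE across one play, plus any runs that scored on it."""
    start = RE[start_bases][start_outs]
    # Assumption: once the third out is made, no more runs are expected.
    end = 0.0 if end_outs >= 3 else RE[end_bases][end_outs]
    return end - start + runs_scored


# Bases empty, no outs, batter singles.
single = re_change("Empty", 0, "1B only", 0)
print(round(single, 3))  # 0.373

# Bases empty, no outs, batter homers: same state afterwards, one run in.
homer = re_change("Empty", 0, "Empty", 0, runs_scored=1)
print(round(homer, 3))  # 1.0
```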

Now, the linear weights value of a given event is the average change in RE of that event. And that information is right… here!

Name                      Abbr.    LWTS      LWTS_RC
Generic Out               O        -0.234    -0.072
Strikeout                 K        -0.277    -0.116
Stolen Base               SB        0.195     0.195
Defensive Indifference    DI        0.129     0.129
Caught Stealing           CS       -0.525    -0.365
Pickoff                   PK       -0.217    -0.109
Wild Pitch                WP        0.276     0.276
Passed Ball               PB        0.270     0.270
Balk                      BK        0.265     0.265
Other Advance             OA       -0.471    -0.334
Nonintentional Walk       NIBB      0.304     0.304
Intentional Walk          IBB       0.173     0.173
Hit By Pitch              HBP       0.329     0.329
Interference              XI        0.354     0.354
Error                     ROE       0.495     0.497
Fielder Choice            FC       -0.164    -0.056
Single                    1B        0.462     0.465
Double                    2B        0.762     0.765
Triple                    3B        1.035     1.036
Homerun                   HR        1.404     1.404
Double Play               DP       -0.611    -0.449

A few notes. All values except for the last row are figured only for plays where a double play did not occur. The double play value listed is the average RE shift of a double play minus the weighted average of the RE value of the underlying event.

To explain: a double play is not a distinct event. Most double plays are classified as a Generic Out, but not all. Some are strikeouts. Some are even singles and doubles. So the value of the DP is the run value above and beyond that of the event recorded.
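To make the subtraction concrete, here is a rough sketch. The three linear weights come from the LWTS column above, but the average RE shift and the mix of underlying events are made-up numbers for illustration only, not values from my database. I am also assuming that "weighted average" means weighting each event by how often it is the recorded event on a double play.

```python
# Non-DP linear weights for a few underlying events (LWTS column above).
LWTS = {"O": -0.234, "K": -0.277, "1B": 0.462}


def double_play_value(avg_dp_re_shift, dp_event_counts):
    """Run value of a double play above and beyond the recorded event.

    avg_dp_re_shift: average RE change over all double plays.
    dp_event_counts: how many double plays were recorded as each event.
    """
    total = sum(dp_event_counts.values())
    # Frequency-weighted average value of the events the DPs were scored as.
    underlying = sum(LWTS[ev] * n for ev, n in dp_event_counts.items()) / total
    return avg_dp_re_shift - underlying


# HYPOTHETICAL inputs: 1,000 double plays, nearly all scored as generic outs.
hypothetical_counts = {"O": 960, "K": 35, "1B": 5}
hypothetical_shift = -0.850

dp_value = double_play_value(hypothetical_shift, hypothetical_counts)
print(round(dp_value, 3))
```

The point is only the shape of the calculation: the double play row gets stacked on top of whatever event was recorded, so it has to exclude the value that event already carries.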

The first column of values (LWTS) is in runs above average. We can debate later whether or not this is appropriate for players. For the purposes of estimating team runs scored, we want to know absolute runs. How do we do this?

The reason we have that issue is that we start off with the assumption that .486 runs will score in an inning, and then work from there. To figure absolute runs, we need to start the inning off from 0. So we take that .486 and divide by 3 to get .162. Then we add that value to our events where an out is made.

Only here's the trick: sometimes an out is not an out, and a safe play is. For example, on a strikeout, 99.7% of the time the batter is out. But 0.3% of the time, he's safe. (This normally happens on a dropped third strike where the runner beats the throw.) Conversely, when a batter is credited with a single, 2% of the time an out is recorded (either a runner on base ahead of him or the batter himself is thrown out trying to take an extra base on the play). So we weight the .162 by the likelihood that a play results in an out before adding it. This is why we see odd things like the single, double and triple changing values between the two columns.
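As a quick check on that adjustment, here is a small sketch. The function and variable names are mine; the out rates for the strikeout (99.7%) and the single (2%) are the ones quoted above, while the 100% rate for a generic out is my assumption, though it is consistent with the table.

```python
RE_EMPTY_NO_OUTS = 0.486             # runs expected at the start of an inning
RUNS_PER_OUT = RE_EMPTY_NO_OUTS / 3  # .162, spread evenly over three outs


def absolute_weight(lwts, out_rate):
    """Convert a runs-above-average weight into an absolute-runs weight.

    out_rate is the share of the time the event results in an out.
    """
    return lwts + RUNS_PER_OUT * out_rate


generic_out = absolute_weight(-0.234, 1.0)   # assumed: always an out
strikeout = absolute_weight(-0.277, 0.997)   # batter is safe 0.3% of the time
single = absolute_weight(0.462, 0.02)        # an out is recorded 2% of the time

print(round(generic_out, 3))  # -0.072, matching LWTS_RC
print(round(single, 3))       # 0.465, matching LWTS_RC
print(round(strikeout, 3))    # -0.115; the table shows -0.116, since it was
                              # presumably built from unrounded inputs
```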

[When using these weights on individual batters or teams, there is the unaddressed issue of missing data: most of the time, terms like Reach on Error, Interference or pickoffs are not available. For my purposes here that's not an issue, but it is something I will look at addressing in a future post.]

And now I'm afraid I'm going to have to leave you with more of a cliffhanger than I would have liked. The question arose last week of how to handle negative B coefficients in BaseRuns. There's nothing intrinsic to the BaseRuns construction that requires negative B coefficients; it's simply that the methods used for constructing B coefficients (either regression or modeling empirical linear weights) tend to create negative B values. This is unnecessary if you include known outs on base in the A value and all known outs in the C value, but it does leave you with the problem of then figuring out the proper weight for each event. Well, here is my idea, half-formed as it is. Ladies and gentlemen, I present an odd-looking chart:

Event    LWTS
Out       0.012
K         0.000
SB        0.195
DI        0.129
CS       -0.253
PK       -0.097
WP        0.276
PB        0.271
BK        0.265
OA       -0.241
NIBB      0.304
IBB       0.173
HBP       0.329
XI        0.354
ROE       0.497
FC        0.097
1B        0.465
2B        0.764
3B        1.034
HR        1.404

What are you looking at here? Well, they're linear weights values, of a sort. Specifically, they're values based upon the assumption that the events do not change the number of outs in the inning. (Basically, I just changed my query to use the same value for outs before and after the event.)
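For a single play, the difference between the two approaches looks something like this. It is a sketch against a few rows of my RE table, not the actual query; the function names are invented for illustration, and adding runs scored on the play is the usual convention rather than something stated above.

```python
# A few rows of the RE table from earlier in the post, indexed by outs.
RE = {
    "Empty":   (0.486, 0.259, 0.098),
    "1B only": (0.859, 0.512, 0.220),
    "2B only": (1.106, 0.674, 0.327),
}


def re_change_actual(start_bases, start_outs, end_bases, end_outs, runs=0):
    """Ordinary change in RE: the out count moves with the play."""
    return RE[end_bases][end_outs] - RE[start_bases][start_outs] + runs


def re_change_outs_fixed(start_bases, end_bases, outs, runs=0):
    """Change in RE with the same out count used before and after the play."""
    return RE[end_bases][outs] - RE[start_bases][outs] + runs


# Runner on first, nobody out; the batter grounds out and the runner
# moves up to second.
actual = re_change_actual("1B only", 0, "2B only", 1)
fixed = re_change_outs_fixed("1B only", "2B only", 0)

print(round(actual, 3))  # -0.185: the out costs more than the advance gains
print(round(fixed, 3))   # 0.247: only the advance is counted
```

With the outs held fixed, what is left is the value of moving (or losing) baserunners, which is why plays that remove a runner still come out negative in the chart.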

The Caught Stealing is still a problem, one which I'm not entirely sure how to deal with just yet. And now I have to convert those weights into B coefficients.

And so… next week! Same Bat time, same Bat channel!


4 Responses to What run estimator would Batman use? (Part III)

  1. terpsfan101 says:

    Colin,
    You might want to double check the LW and RC values for the CS. -.525 for the LW and -.365 for the RC value of the CS seem too high. For the Retrosheet years, the LW value of the CS should be in the -.43 to -.45 range, while the RC value of the CS should be in the -.27 to -.29 range.
    Thanks

  2. Colin Wyers says:

    I think that has to do with the fact that I’m counting pickoffs separately. I’ll look at it later.

  3. Colin Wyers says:

    The LWTS value of CS+P as a combined category is -.421; the LWTS_RC is -.279. So everybody’s cool, then.

  4. dan says:

    Just to make sure on the last chart…. those numbers are representing the run value of an event if, say, the scoreboard keeper forgets to do the “outs” part of the scoreboard? For example… man on first, no one out. Batter hits to the second baseman and he throws to first, getting the hitter out. Your values are pretending the out was never recorded, and it’s just man on second, nobody out?
    So the negative values for “out advancing” aren’t for increasing the number of outs, it’s for the lost baserunner only.
    Do I have this right?
