What run estimator would Batman use? (Part III)
September 12, 2008 4 Comments
And we continue. Part I and Part II are recommended reading if you’re just now joining us. The past two weeks have totaled over 5,000 words; this week will be a much lighter read. I’ve been sick most of the week, and the database queries required to produce the values you’re about to see have taken up most of the time and energy I’ve had to devote to this.
I presented a run expectancy (RE) table last week, courtesy of Tom Tango. I’m going to be talking more about RE this week, so I should present my own table, since that’s what I’m using to derive the rest of the data in this post:
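| Runners | 0 outs | 1 out | 2 outs |
| --- | --- | --- | --- |
| Empty | 0.486 | 0.259 | 0.098 |
| 1B only | 0.859 | 0.512 | 0.220 |
| 2B only | 1.106 | 0.674 | 0.327 |
| 3B only | 1.334 | 0.936 | 0.374 |
| 1B & 2B | 1.480 | 0.905 | 0.435 |
| 1B & 3B | 1.743 | 1.156 | 0.494 |
| 2B & 3B | 1.972 | 1.371 | 0.591 |
| Loaded | 2.324 | 1.542 | 0.752 |
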
It’s different, but not radically so.
Now let’s review how this works, at least as far as deriving linear weights. A single at-bat’s value is determined by the change in RE from start to end; if I come up to bat with the bases empty and no outs and I hit a single, RE goes from .486 to .859, so I’m responsible for a change in RE of .373.
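As a rough illustration of that bookkeeping, here is a minimal Python sketch using the RE table above. The function name `re_change` and the `runs_scored` argument are my own labels, not anything from the original queries; adding runs that score on the play is the usual convention for RE-based values and is an assumption here, since the text only mentions the change in RE.

```python
# Run expectancy (RE) by base state and number of outs, copied from the table above.
RE = {
    "Empty":   (0.486, 0.259, 0.098),
    "1B only": (0.859, 0.512, 0.220),
    "2B only": (1.106, 0.674, 0.327),
    "3B only": (1.334, 0.936, 0.374),
    "1B & 2B": (1.480, 0.905, 0.435),
    "1B & 3B": (1.743, 1.156, 0.494),
    "2B & 3B": (1.972, 1.371, 0.591),
    "Loaded":  (2.324, 1.542, 0.752),
}


def re_change(start_bases, start_outs, end_bases, end_outs, runs_scored=0):
    """Value of one play: RE after the play, minus RE before, plus runs scored.

    Assumption: a play that records the third out leaves an RE of zero.
    """
    end_re = 0.0 if end_outs >= 3 else RE[end_bases][end_outs]
    return end_re - RE[start_bases][start_outs] + runs_scored


# Bases empty, no outs, batter singles: .859 - .486
single_value = re_change("Empty", 0, "1B only", 0)
print(round(single_value, 3))  # 0.373

# Bases empty, no outs, batter strikes out: .259 - .486
strikeout_value = re_change("Empty", 0, "Empty", 1)
print(round(strikeout_value, 3))  # -0.227
```

The same play has a different value in each of the 24 base/out states, which is why the linear weight of an event has to be an average over all the situations in which it occurred.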
Now, the linear weights value of a given event is the average change in RE of that event. And that information is right… here!
| Name | Abbr. | LWTS | LWTS_RC |
| --- | --- | --- | --- |
| Generic Out | O | -0.234 | -0.072 |
| Strikeout | K | -0.277 | -0.116 |
| Stolen Base | SB | 0.195 | 0.195 |
| Defensive Indifference | DI | 0.129 | 0.129 |
| Caught Stealing | CS | -0.525 | -0.365 |
| Pickoff | PK | -0.217 | -0.109 |
| Wild Pitch | WP | 0.276 | 0.276 |
| Passed Ball | PB | 0.270 | 0.270 |
| Balk | BK | 0.265 | 0.265 |
| Other Advance | OA | -0.471 | -0.334 |
| Non-intentional Walk | NIBB | 0.304 | 0.304 |
| Intentional Walk | IBB | 0.173 | 0.173 |
| Hit By Pitch | HBP | 0.329 | 0.329 |
| Interference | XI | 0.354 | 0.354 |
| Error | ROE | 0.495 | 0.497 |
| Fielder’s Choice | FC | -0.164 | -0.056 |
| Single | 1B | 0.462 | 0.465 |
| Double | 2B | 0.762 | 0.765 |
| Triple | 3B | 1.035 | 1.036 |
| Home Run | HR | 1.404 | 1.404 |
| Double Play | DP | -0.611 | -0.449 |

A few notes. All values except for the last row are figured only for plays where a double play did not occur. The double play value listed is the average RE shift of a double play minus the weighted average of the RE value of the underlying event.
To explain: a double play is not a distinct event. Most double plays are classified as a Generic Out, but not all. Some are strikeouts. Some are even singles and doubles. So the value of the DP is the run value above and beyond that of the event recorded.
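To make the double play arithmetic concrete, here is a small sketch. The list of double plays and their RE shifts is entirely hypothetical, made up for illustration; only the Generic Out and Strikeout weights come from the LWTS column, taken as negative runs above average. Averaging the underlying weight over the plays is what weights each event type by how often it sits underneath a double play.

```python
from statistics import mean

# Runs-above-average weights of the underlying events (LWTS column above).
LWTS = {"O": -0.234, "K": -0.277}

# HYPOTHETICAL sample: (underlying event, RE shift of the whole double play).
double_plays = [
    ("O", -0.90),
    ("O", -0.75),
    ("O", -0.95),
    ("K", -0.80),  # e.g. a strike-'em-out, throw-'em-out double play
]


def dp_premium(plays, lwts):
    """Average RE shift of the plays, minus the average weight of the
    events they were recorded as. Averaging per play weights each event
    type by its share of the sample."""
    avg_shift = mean(shift for _, shift in plays)
    avg_underlying = mean(lwts[event] for event, _ in plays)
    return avg_shift - avg_underlying


premium = dp_premium(double_plays, LWTS)
```

With this split, a double play recorded as a Generic Out gets the Generic Out value plus the DP value, so nothing is counted twice.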
The first column of values (LWTS) is in runs above average. We can debate later whether or not this is appropriate for players. For the purposes of estimating team runs scored, we want to know absolute runs. How do we do this?
The reason we have that issue is that we start off with the assumption that .486 runs will score in an inning, and then work from there. To figure absolute runs, we need to start the inning off from 0. So we take that .486 and divide by 3 to get .162. Then we add that value to our events where an out is made.
Only here’s the trick: sometimes an out is not an out, and sometimes a safe play is. For example, on a strikeout the batter is out 99.7% of the time, but 0.3% of the time he’s safe. (This normally happens on a dropped third strike where the batter beats the throw to first.) Conversely, when a batter is credited with a single, an out is recorded 2% of the time (either a runner on base ahead of him or the batter himself is thrown out trying to take an extra base on the play). So we weight the .162 by the likelihood that a play results in an out. This is why we see odd things like the single, double and triple changing values.
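Here is that adjustment as a short sketch. The function name `absolute_weight` is mine, and the out rate of 1.0 for a Generic Out is an assumption (it is what the gap between the two columns implies); the 99.7% and 2% figures are the ones quoted above, and out values are taken as negative runs above average.

```python
# Runs expected in an inning that starts with bases empty and no outs,
# spread evenly over the inning's three outs.
RUNS_PER_OUT = 0.486 / 3  # about 0.162


def absolute_weight(lwts, out_rate):
    """Convert a runs-above-average weight to an absolute-runs weight by
    adding back the per-out value, scaled by how often the event
    actually produces an out."""
    return lwts + RUNS_PER_OUT * out_rate


generic_out = absolute_weight(-0.234, 1.0)   # assumed: always an out
strikeout = absolute_weight(-0.277, 0.997)   # batter safe 0.3% of the time
single = absolute_weight(0.462, 0.02)        # an out is recorded 2% of the time

for name, value in [("O", generic_out), ("K", strikeout), ("1B", single)]:
    print(name, round(value, 3))
```

The results land within about a thousandth of a run of the LWTS_RC column; the small differences come from working with rounded inputs.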
[When using these weights on individual batters or teams, there is the unaddressed issue of missing data: most of the time, terms like Reach on Error, Interference or pickoffs are not available. For my purposes here that’s not an issue, but it is something I will look at addressing in a future post.]
And now I’m afraid I’m going to have to leave you with more of a cliffhanger than I would have liked. The question arose last week of how to handle negative B coefficients in BaseRuns. There’s nothing intrinsic to the BaseRuns construction that requires negative B coefficients; it’s simply that the methods used for constructing B coefficients (either regression or modeling empirical linear weights) tend to create negative B values. This is unnecessary if you include known outs on base in the A value and all known outs in the C value, but it does leave you with the problem of then figuring out the proper weight for each event. Well, here is my idea, half-formed as it is. Ladies and gentlemen, I present an odd-looking chart:
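| Event | LWTS |
| --- | --- |
| Out | 0.012 |
| K | 0.000 |
| SB | 0.195 |
| DI | 0.129 |
| CS | -0.253 |
| PK | -0.097 |
| WP | 0.276 |
| PB | 0.271 |
| BK | 0.265 |
| OA | -0.241 |
| NIBB | 0.304 |
| IBB | 0.173 |
| HBP | 0.329 |
| XI | 0.354 |
| ROE | 0.497 |
| FC | 0.097 |
| 1B | 0.465 |
| 2B | 0.764 |
| 3B | 1.034 |
| HR | 1.404 |
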
What are you looking at here? Well, they’re linear weights values, of a sort. Specifically, they’re values based upon the assumption that the events do not change the number of outs in the inning. (Basically, I just changed my query to use the same value for outs before and after the event.)
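A minimal sketch of that "same outs before and after" calculation, using my RE table from the top of the post. The function name and the `runs_scored` argument are illustrative assumptions, not the actual query.

```python
# Run expectancy (RE) by base state and number of outs, from the first table.
RE = {
    "Empty":   (0.486, 0.259, 0.098),
    "1B only": (0.859, 0.512, 0.220),
    "2B only": (1.106, 0.674, 0.327),
    "3B only": (1.334, 0.936, 0.374),
    "1B & 2B": (1.480, 0.905, 0.435),
    "1B & 3B": (1.743, 1.156, 0.494),
    "2B & 3B": (1.972, 1.371, 0.591),
    "Loaded":  (2.324, 1.542, 0.752),
}


def re_change_same_outs(start_bases, end_bases, outs, runs_scored=0):
    """RE change of a play when the out count is held at its starting
    value, so only the movement of runners (and any runs) is credited."""
    return RE[end_bases][outs] - RE[start_bases][outs] + runs_scored


# Runner on first, no outs, groundout that moves him to second.
advance = re_change_same_outs("1B only", "2B only", 0)
print(round(advance, 3))  # 0.247

# Runner on first, no outs, caught stealing: only the lost runner is charged.
caught = re_change_same_outs("1B only", "Empty", 0)
print(round(caught, 3))  # -0.373

# Strikeout with the bases empty: nothing changes, so the value is zero.
whiff = re_change_same_outs("Empty", "Empty", 0)
print(round(whiff, 3))  # 0.0
```

Averaging these changes over every occurrence of an event gives the chart’s values, which is why a strikeout comes out at zero while a caught stealing stays negative.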
The Caught Stealing is still a problem, one which I’m not entirely sure how to deal with just yet. And now I have to convert those weights into B coefficients.
And so… next week! Same Bat time, same Bat channel!
Colin,
You might want to double check the LW and RC values for the CS. .525 for the LW and .365 for the RC value of the CS seem too high. For the Retrosheet years, the LW value of the CS should be in the .43 to .45 range, while the RC value of the CS should be in the .27 to .29 range.
Thanks
I think that has to do with the fact that I’m counting pickoffs separately. I’ll look at it later.
The LWTS value of CS+P as a combined category is .421; the LWTS_RC is .279. So everybody’s cool, then.
Just to make sure on the last chart…. those numbers are representing the run value of an event if, say, the scoreboard keeper forgets to do the “outs” part of the scoreboard? For example… man on first, no one out. Batter hits to the second baseman and he throws to first, getting the hitter out. Your values are pretending the out was never recorded, and it’s just man on second, nobody out?
So the negative values for “out advancing” aren’t for increasing the number of outs, it’s for the lost baserunner only.
Do I have this right?