A quick look at baserunning
November 19, 2008
You’re probably looking around going, “Where’s my roundtable?” And you will have a roundtable. Probably Friday. In the meantime, I’m laying out some finger sandwiches and lemonade – a light afternoon snack, if you’d like. Partake if you wish.
So I have a baserunning evaluation metric, measured in runs above/below average. Nothing fancy or special, really. Dan Fox has covered this ground a lot better than I have. (And that’s just the tip of the iceberg.) So here’s how I do it:
 Start with Retrosheet play-by-play data.
 Calculate run expectancy separately for each base, like this, for each season.
 Looking only at the lead baserunner, calculate the average destination run expectancy for each event. Everything was broken down by the following categories:
 Number of outs remaining,
 Event code (single, double, out, wild pitch, etc.),
 Batted ball type,
 Whether the batter was bunting,
 Whether the ball was hit to the battery (pitcher/catcher), an infielder or an outfielder,
 Whether the ball was hit to the left or right side of the field.
 Compare what a player did to the average.
Let’s say you have a runner on first, no outs. Most of the time a runner ends up on second, some of the time on third, when a groundball single is hit into left field. If a runner ends up on second, he gets a (very slight) debit. If he ends up on third, he gets a credit. All of these changes are tracked and totaled up.
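The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actual code behind the numbers in this post: the run-expectancy values in `DEST_RE`, the runner IDs, and the layout of the situation tuple are all made up for the example. It groups plays by situation, averages the destination run expectancy within each group, and then credits or debits each runner against that average.

```python
from collections import defaultdict

# Hypothetical run expectancy by destination (illustrative values only,
# not derived from Retrosheet data).
DEST_RE = {"2B": 0.60, "3B": 0.85, "HOME": 1.00, "OUT": 0.00}

# Hypothetical situation key: (start base, outs, event, batted ball type,
# bunt?, fielder group, side of field).
GB_SINGLE_TO_LEFT = ("1B", 0, "single", "groundball", False, "outfield", "left")

# Each play: (runner id, situation key, lead runner's destination).
PLAYS = [
    ("runnerA", GB_SINGLE_TO_LEFT, "3B"),
    ("runnerB", GB_SINGLE_TO_LEFT, "2B"),
    ("runnerC", GB_SINGLE_TO_LEFT, "2B"),
    ("runnerD", GB_SINGLE_TO_LEFT, "2B"),
]


def average_destination_re(plays, dest_re):
    """Average destination run expectancy for each situation."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _runner, situation, dest in plays:
        totals[situation] += dest_re[dest]
        counts[situation] += 1
    return {s: totals[s] / counts[s] for s in totals}


def plus_minus(plays, dest_re):
    """Sum each runner's (actual - average) destination run expectancy."""
    avg = average_destination_re(plays, dest_re)
    result = defaultdict(float)
    for runner, situation, dest in plays:
        result[runner] += dest_re[dest] - avg[situation]
    return dict(result)


if __name__ == "__main__":
    for runner, runs in sorted(plus_minus(PLAYS, DEST_RE).items()):
        print(f"{runner}: {runs:+.4f}")
```

With these toy numbers, the one runner who reaches third gets a credit and the three who stop at second each get a small debit. The credits and debits within a situation sum to zero, which is what "above/below average" implies.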
Simple and easy, right? Here’s the top ten baserunning +/- seasons, 1953-2007:
| YEAR_ID | PLAYER_ID | Name | TEAM_ID | PLUS_MINUS |
|---|---|---|---|---|
| 1965 | flooc101 | Curt Flood | SLN | 12 |
| 1976 | patef101 | Freddie Patek | KCA | 12 |
| 2004 | erstd001 | Darin Erstad | ANA | 11 |
| 1991 | molip001 | Paul Molitor | MIL | 10 |
| 1978 | puhlt001 | Terry Puhl | HOU | 10 |
| 2000 | goodt001 | Tom Goodwin | COL | 10 |
| 1987 | browj001 | Jerry Browne | TEX | 10 |
| 1974 | bochb001 | Bruce Bochte | CAL | 10 |
| 1957 | blasd101 | Don Blasingame | SLN | 10 |
| 1976 | leflr101 | Ron LeFlore | DET | 10 |

You’ll note that the best baserunning season of the Retro-era was only worth 12 runs above average. Obviously you’d prefer a good baserunner to a bad baserunner, all else being equal, but baserunning definitely takes a back seat to hitting and defense.
Ten worst seasons?
| YEAR_ID | PLAYER_ID | Name | TEAM_ID | PLUS_MINUS |
|---|---|---|---|---|
| 2007 | lodup001 | Paul Lo Duca | NYN | -9 |
| 1959 | thomf103 | Frank Thomas | CIN | -9 |
| 1980 | cruzj001 | Jose Cruz | HOU | -9 |
| 1965 | johnd103 | Deron Johnson | CIN | -9 |
| 1962 | brutb101 | Bill Bruton | DET | -9 |
| 1976 | sizet101 | Ted Sizemore | LAN | -10 |
| 1974 | darwb101 | Bobby Darwin | MIN | -10 |
| 1999 | stanm002 | Mike Stanley | BOS | -10 |
| 1965 | fairr101 | Ron Fairly | LAN | -10 |
| 1964 | bertd101 | Dick Bertell | CHN | -13 |

UPDATE: This is too large for an EditGrid, so here’s a full spreadsheet, including career totals. Requires something that can read Excel files. Best I can do for y’all right now.
Try grouping by run differential as well
But he must be a good and valuable player… he’s fast!
I’m reluctant to add any more adjustments to the way I currently do it, Brian, because then you start to really shave down the sample sizes on the state-to-state transitions. Especially since I’m doing it season-by-season, it starts to get dangerous if you drill down any more. I’m sure there’s a way to handle it better, but a lot more work would have to go into it.
And sportswriters say stuff like that, PC, but when it comes down to brass tacks, like the MVP award, they vote a Big Damn Slugger with no other positives second. It’s all lip service.
“they vote for a Big Damn Slugger”
Like Dustin Pedroia?
I agree with you for the most part btw, I’m just giving you a hard time.
This is an idea that I’ve actually had for close to 25 years, since I kept statistics and had all the play-by-play for a college summer league, but it’s still on my to-do list.
The way I have it conceptualized, if there’s a rare grouping of events, the expected value will not be as accurate because it’s based on a much smaller sample size – but, if the player’s samples are weighted by how often that player is in each situation, then the effect in the final rating of a larger variance in the expected value of any subgroup will be minimized by the weighting.
So, of any groupings that you have, calculate their expected rate (the league mean over x number of seasons). Find the number of times that a player was in each situation, and calculate the player’s weighted overall expected rate, then compare to the player’s observed rate, and convert to runs.
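The weighting described in this comment can be sketched roughly as follows. This is only one reading of the idea, not anyone's actual implementation: the grouping names, the league rates, and the `runs_per_success` conversion factor are all hypothetical. The "rate" here is taken to mean how often a runner takes the extra base in a given grouping.

```python
def weighted_expected_rate(league_rate, player_counts):
    """League rate per grouping, weighted by the player's opportunities."""
    total = sum(player_counts.values())
    if total == 0:
        raise ValueError("player has no opportunities")
    weighted = sum(league_rate[g] * n for g, n in player_counts.items())
    return weighted / total


def runs_above_average(league_rate, player_counts, player_successes,
                       runs_per_success):
    """Compare the observed rate to the weighted expected rate, in runs.

    runs_per_success is an assumed conversion factor (runs gained per
    extra base taken); it is not specified in the comment.
    """
    total = sum(player_counts.values())
    expected = weighted_expected_rate(league_rate, player_counts)
    observed = sum(player_successes.values()) / total
    return (observed - expected) * total * runs_per_success


if __name__ == "__main__":
    # Hypothetical groupings and numbers, for illustration only.
    league_rate = {"common": 0.30, "rare": 0.50}
    counts = {"common": 90, "rare": 10}
    successes = {"common": 36, "rare": 4}
    print(weighted_expected_rate(league_rate, counts))
    print(runs_above_average(league_rate, counts, successes, 0.25))
```

Because the rare grouping accounts for only 10 of this player's 100 opportunities, any noise in its league rate moves the weighted expectation by a tenth as much as it would if taken alone, which is the point the comment is making.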