Pitch Counts and Normalized Innings

More on pitching. A couple years ago, Joe Sheehan wrote an article about decreasing workloads for pitchers. He concluded that
“Whereas the task of pitching the entire game may have been a reasonable expectation for the first 30, 40, maybe 80 years of organized baseball, now it requires too many pitches thrown with too much effort.”
Decreasing workloads make it hard for analysts to compare pitchers of yesteryear with today’s pitchers. Cy Young pitched well over 7,000 innings; even today’s greatest workhorse, Roger Clemens, has thrown nearly 3,000 fewer. But if we could look at their pitch counts, we would probably see that the gap is much smaller than their innings pitched suggest. The trouble is that pitch count data is not available for the period where we need it most: 100 years ago, when it took far fewer pitches to finish a game than it does now.
Thankfully, Tangotiger, one of the leaders in sabermetric analysis, has proposed a simple formula for estimating pitch counts:
Pitches = 3.3*PA + 1.5*SO + 2.2*BB
(where PA is batters faced, SO is strikeouts, and BB is walks)
I’ve found that it correlates almost perfectly with actual pitch counts (r = .97), which makes it very useful for restating innings pitched in terms of pitch counts. Using the Lahman Database, I estimated the number of pitches thrown by every pitcher in baseball history. Then, to translate that into modern innings, I divided by 170 and multiplied by 9, as I’ve found that the average game takes about 170 pitches nowadays.
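The two steps above can be sketched in a few lines of Python. The function names and the sample stat line are hypothetical, chosen only for illustration; the coefficients and the 170-pitch, 9-inning conversion come from the text.

```python
# Sketch of the pitch-count estimate and the innings translation described above.
# Function names are illustrative; they are not from Tangotiger or the Lahman Database.

PITCHES_PER_GAME = 170   # author's estimate for a modern game
INNINGS_PER_GAME = 9


def estimate_pitches(pa, so, bb):
    """Estimated pitches from batters faced (pa), strikeouts (so), and walks (bb)."""
    return 3.3 * pa + 1.5 * so + 2.2 * bb


def translated_innings(pitches):
    """Convert an estimated pitch total into modern-equivalent innings."""
    return pitches / PITCHES_PER_GAME * INNINGS_PER_GAME


# Hypothetical season line (made up for illustration): 900 PA, 200 SO, 60 BB.
season_pitches = estimate_pitches(900, 200, 60)
print(round(season_pitches), round(translated_innings(season_pitches), 1))

# Cy Young's estimated career total of 106,073 pitches translates to 5,616 innings.
print(round(translated_innings(106073)))
```

Because the translation is a straight multiplication by 9/170, it rescales every pitcher by the same factor; all of the reshuffling in the rankings comes from the pitch estimate itself.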
Here is the modified career leader board:

| Last name | First name | IP | Est. pitches | Tr. Inn |
|-----------|------------|--------|--------------|---------|
| Young | Cy | 7354.7 | 106073 | 5616 |
| Ryan | Nolan | 5386 | 89217.5 | 4723 |
| Galvin | Pud | 6003.3 | 87620.2 | 4639 |
| Johnson | Walter | 5914.7 | 86633.8 | 4586 |
| Niekro | Phil | 5404.3 | 83826.9 | 4438 |
| Carlton | Steve | 5217.3 | 81790.5 | 4330 |
| Perry | Gaylord | 5350.3 | 80779.7 | 4277 |
| Sutton | Don | 5282.3 | 79697.9 | 4219 |
| Spahn | Warren | 5243.7 | 78134.4 | 4137 |
| Blyleven | Bert | 4970 | 76080.2 | 4028 |
| Keefe | Tim | 5047.7 | 75744.5 | 4010 |
| Nichols | Kid | 5056.3 | 75693.5 | 4007 |
| Alexander | Pete | 5190 | 74451.6 | 3942 |
| Seaver | Tom | 4782.7 | 72435.7 | 3835 |
| Wynn | Early | 4564 | 71452.4 | 3783 |
| John | Tommy | 4710.3 | 71120.9 | 3765 |
| Clemens | Roger | 4493 | 70835.4 | 3750 |
| Kaat | Jim | 4530.3 | 68843.4 | 3645 |
| Roberts | Robin | 4688.7 | 68794.1 | 3642 |
| Mathewson | Christy | 4780.7 | 68758.6 | 3640 |
| Jenkins | Fergie | 4500.7 | 67701.4 | 3584 |
| Ruffing | Red | 4344 | 67681.4 | 3583 |
| Radbourn | Charley | 4535.3 | 67073 | 3551 |
| Rixey | Eppa | 4494.7 | 66293.6 | 3510 |
| Plank | Eddie | 4495.7 | 65869.9 | 3487 |
| Welch | Mickey | 4802 | 65642.2 | 3475 |
| Tanana | Frank | 4188.3 | 65135.8 | 3448 |
| Grimes | Burleigh | 4179.7 | 64381.7 | 3408 |
| Lyons | Ted | 4161 | 62967.5 | 3334 |
| Maddux | Greg | 4181.3 | 62353.9 | 3301 |
| Faber | Red | 4086.7 | 62163.1 | 3291 |
| Newsom | Bobo | 3759.3 | 61274.5 | 3244 |
| Feller | Bob | 3827 | 61146.3 | 3237 |
| Martinez | Dennis | 3999.7 | 61074.7 | 3233 |
| Grove | Lefty | 3940.7 | 60899.3 | 3224 |
| Gibson | Bob | 3884.3 | 60639.1 | 3210 |
| Hough | Charlie | 3801.3 | 60567 | 3206 |
| Jones | Sam | 3883 | 60012.4 | 3177 |
| Morris | Jack | 3824 | 59971 | 3175 |
| McCormick | Jim | 4275.7 | 59921 | 3172 |
| Palmer | Jim | 3948 | 59371.8 | 3143 |
| Koosman | Jerry | 3839.3 | 59256.4 | 3137 |
| Quinn | Jack | 3920.3 | 58312.4 | 3087 |
| Glavine | Tom | 3740.3 | 58067.2 | 3074 |
| Bunning | Jim | 3760.3 | 58021.9 | 3072 |
| Whitehill | Earl | 3564.7 | 57296.7 | 3033 |
| Hoyt | Waite | 3762.3 | 57221.5 | 3029 |
| Reuss | Jerry | 3669.7 | 56760.5 | 3005 |
| Lolich | Mickey | 3638.3 | 56627.8 | 2998 |
| Niekro | Joe | 3584 | 55444.7 | 2935 |

While the overall order remains fairly similar, the innings become much more compressed (the standard deviation among the top 50 drops from 754 to 559). The gap between Cy Young and Roger Clemens shrinks by about 1,000 innings, from roughly 2,860 to roughly 1,870. It’s still huge, however, and that’s because while I have adjusted for pitch counts, I have not adjusted for the second part of the equation: Young was able to throw his pitches with less effort because of the substandard batters he generally faced. I have an idea of how to adjust for this, and when I look into it, I’ll present my findings. For now, I just want to show how much pitch counts can change the picture that innings pitched paint.

Clarifying some things

I got a comment yesterday that I felt needed some extra clarification:
“Great idea. Let me just get one thing straight – the number next to Santana’s means what, exactly? How many runs below average he has saved?”
First of all, those are dashes, not negative signs. Sorry if there was any confusion over that. Secondly, what I literally did was convert Santana’s (and every other pitcher’s) numbers into runs created, as you would for hitters. So if Santana has 54 runs created, his performance has been equivalent to that of Luis Gonzalez. There is one more step, converting the numbers into Win Shares, that will balance out some things between hitters and pitchers, but when you’re comparing pitchers to pitchers, all you really need are their runs created.
Just to clarify a little more, what I’m doing here is putting pitching numbers on an absolute scale. This is not performance above or below average, replacement level, or any other baseline; it is performance above zero. That is why RC should always be positive, though in some cases it might come out slightly negative because of a small sample size, which I will discuss later. There is no baseline, and that’s the beauty of it.

Pitching Runs Created – Corrected

There were some mistakes in my previous post, so let me re-post the leaders:
AL
1. Johan Santana – 53.91177409
2. Roy Halladay – 51.99620392
3. Mark Buehrle – 49.72016507
4. Matt Clement – 43.00086884
5. Randy Johnson – 41.47244024
6. Dan Haren – 40.97976546
7. Bartolo Colon – 40.15336766
8. Jeremy Bonderman – 38.85439381
9. Jake Westbrook – 38.79135507
10. Paul Byrd – 38.41606312
11. Chris Young – 38.30119765
12. Sidney Ponson – 38.03283086
13. Kevin Brown – 37.41166294
14. John Lackey – 36.73535892
15. Freddy Garcia – 36.43384283
16. Jon Garland – 35.17740346
17. Barry Zito – 35.02640459
18. Zack Greinke – 34.59156142
19. Bronson Arroyo – 34.02380078
20. Chan Ho Park – 33.96994604
21. Mike Mussina – 33.85067345
22. C.C. Sabathia – 33.50750678
23. Carl Pavano – 33.47051084
24. Daniel Cabrera – 33.24156436
25. Mike Maroth – 33.12293081
NL
1. Pedro Martinez – 49.82314631
2. Chris Carpenter – 48.33463065
3. Dontrelle Willis – 47.97496051
4. Javier Vazquez – 45.3098585
5. John Smoltz – 45.09792362
6. Derek Lowe – 44.04807772
7. Jake Peavy – 43.13596892
8. A.J. Burnett – 42.90334427
9. Roger Clemens – 42.77053382
10. Livan Hernandez – 41.43306967
11. Andy Pettitte – 40.08012874
12. Roy Oswalt – 39.76317389
13. Aaron Harang – 39.59858114
14. Cory Lidle – 38.57858255
15. Brandon Webb – 37.54680616
16. Josh Beckett – 37.34887474
17. Brett Myers – 36.15946462
18. Matt Morris – 35.46958166
19. Esteban Loaiza – 34.92092456
20. Brett Tomko – 34.54112118
21. Mark Redman – 33.61971324
22. Joe Kennedy – 33.33657994
23. Brad Halsey – 32.39268393
24. Carlos Zambrano – 31.15155188
25. Brad Penny – 30.71854767
I’ll be updating these weekly throughout the season.

Pitchers’ Runs Created

Later, I will introduce a new Win Shares-type system that solves many of the problems found in Bill James’ first attempt at producing one number, measured in wins, to characterize player value. Right now, however, let me introduce you to an important part of the system I will unveil: pitchers’ runs created.
My biggest problem with WS is that they do not actually measure absolute value, which is what James claims they do. Rather, they measure a player’s value over roughly a .170 W% level. The problem that James ran into is that while measuring absolute value for hitters is easy–just use runs created–there is no such number for pitchers. Thus, to get around this problem, James devised marginal runs, which look something like this:
RS – Lg/2 = marginal offensive runs (where Lg is the league average)
1.5*Lg – RA = marginal defensive runs
But what he’s doing here (and this, in my mind, is his first and most pivotal mistake) is comparing players not to a zero baseline, as he claims to do, but rather to a baseline of Lg/2 on offense or 1.5*Lg on defense, which is bad, but not zero.
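A quick sketch makes the baseline problem concrete. It uses the simplified marginal-runs formulas exactly as written above, not James’ full Win Shares method, and the function names and the 750-run league average are hypothetical.

```python
# Simplified marginal runs, as written in the text (not James' complete method).
# Function names and the sample league average are illustrative assumptions.

def marginal_offensive_runs(runs_scored, league_avg):
    """Runs scored beyond half the league average."""
    return runs_scored - league_avg / 2


def marginal_defensive_runs(runs_allowed, league_avg):
    """Runs prevented relative to one and a half times the league average."""
    return 1.5 * league_avg - runs_allowed


LEAGUE_AVG = 750  # hypothetical league-average runs per team

# A team that scores 375 real runs is credited with zero marginal offense,
# and one that allows 1,125 runs is credited with zero marginal defense.
print(marginal_offensive_runs(375, LEAGUE_AVG))
print(marginal_defensive_runs(1125, LEAGUE_AVG))
```

In both cases the "zero" point sits at a bad but nonzero level of performance, which is the objection raised above.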
Again, the simple problem is that there is no number like RC for pitchers, a number where everything is equal to zero or more and where a higher number is better. But how do we come up with such a number? After some brainstorming, it hit me, like a pile of bricks or a once-in-a-lifetime idea: Why not convert runs allowed into runs scored?
Think about it: we know how runs scored and runs allowed interact with W%, so why not convert RA into RS by using W%? Using an average baseline for runs scored, we can predict a W% for any team or player based on its/his runs allowed. Moreover, we can convert that W% back into runs scored by using an average baseline for runs allowed. So a player who allows 3 R/G when the average team scores 4.5 R/G has a .692 W%, and he can be said to be scoring 6.75 R/G, as a team that scores that much and allows 4.5 R/G will also have a .692 W%.
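Here is a minimal sketch of that round trip. The post does not say which W% estimator it uses, so this assumes the Pythagorean formula with an exponent of 2, which reproduces the .692 and 6.75 figures in the example; the function names are hypothetical.

```python
# Sketch of converting runs allowed into equivalent runs scored via W%.
# Assumption: W% is estimated with the Pythagorean formula (exponent 2 by default).

def pythagorean_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed per game."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def equivalent_runs_scored(runs_allowed, league_avg, exponent=2.0):
    """Runs scored that yield the same W% against a league-average defense.

    Requires runs_allowed > 0; a pitcher with zero runs allowed has W% = 1,
    which has no finite runs-scored equivalent.
    """
    # Step 1: W% for a league-average offense paired with this run prevention.
    wpct = pythagorean_wpct(league_avg, runs_allowed, exponent)
    # Step 2: invert the formula, holding runs allowed at the league average.
    ratio = wpct / (1.0 - wpct)
    return league_avg * ratio ** (1.0 / exponent)


print(round(pythagorean_wpct(4.5, 3.0), 3))        # 0.692
print(round(equivalent_runs_scored(3.0, 4.5), 2))  # 6.75
```

Under this assumption the two steps collapse algebraically to Lg²/RA (4.5² / 3 = 6.75), so the result does not depend on the exponent chosen.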
Using this, we can come up with defensive runs created (DRC), which will be on the same baseline as runs created for hitters. Of course, they must still be split between pitchers and fielders after. I will provide the formulas that I use for this adjustment a little later, but I would like to keep this article as devoid of technicalities as possible.
Pitching Runs Created can be used to judge Cy Young races, Greatest of All-Time debates, and trades. They’re easy to use and understand, and the great thing is, since they’re put on a zero baseline, they will measure absolute value.
Now, without further delay, let me present the top-10 for each league this year:
AL
1. Johan Santana – 58.88
2. Roy Halladay – 56.44
3. Mark Buehrle – 52.09
4. Bartolo Colon – 48.61
5. Matt Clement – 47.70
6. Dan Haren – 46.13
7. John Lackey – 44.60
8. Chris Young – 42.66
9. Jeremy Bonderman – 42.18
10. Randy Johnson – 41.25
NL
1. Pedro Martinez – 58.54
2. Chris Carpenter – 58.48
3. Dontrelle Willis – 53.95
4. Jake Peavy – 51.10
5. Roger Clemens – 50.95
6. John Smoltz – 50.25
7. Roy Oswalt – 50.12
8. Livan Hernandez – 49.69
9. A.J. Burnett – 46.81
10. Andy Pettitte – 43.22
The top two in each league are very close, so it should be a good race throughout. Santana’s great season has been marred by a strangely horrible defense; while his BABIP is low (.276), he’s still allowing about .6 more runs than his peripherals (HR, BB, K) would indicate. Also, because of all the batters he strikes out, Santana gets more credit for his stellar pitching. In the NL, Pedro Martinez is similarly higher up than he is on other lists because of his high strikeout rate. His fielders have actually been above average. NL wins leader Dontrelle Willis is close behind in third, and could still finish first before all is said and done. On the other hand, AL wins leader Jon Garland is nowhere to be found on this list, with 33.46 PRC. The White Sox’s incredible defense has contributed much to his record. Roger Clemens, whom most statheads would likely have picked as the best pitcher in the NL, is in fifth place, as his fielders make him look almost a full run better than he actually is.
