## Pitch Counts and Normalized Innings

More on pitching. A couple years ago, Joe Sheehan wrote an article about decreasing workloads for pitchers. He concluded that
“Whereas the task of pitching the entire game may have been a reasonable expectation for the first 30, 40, maybe 80 years of organized baseball, now it requires too many pitches thrown with too much effort.”
A problem analysts face in comparing pitchers of yesteryear to today’s pitchers is decreasing workloads. Cy Young pitched well over 7,000 innings; even today’s greatest workhorse, Roger Clemens, has thrown nearly 3,000 fewer. But if we were to look at their pitch counts, we would probably see that the gap is much smaller than their innings pitched would suggest. The trouble is that pitch count data is not available for the period where we need it most: 100 years ago, when it took many fewer pitches to finish a game than it does now.
Thankfully, Tangotiger, one of the leaders in sabermetric analysis, has proposed a simple formula for estimating pitch counts:
3.3*PA + 1.5*SO + 2.2*BB
I’ve found that it correlates almost perfectly with actual pitch counts (r = .97), so it is very useful for translating innings pitched onto a pitch-count basis. Using the Lahman Database, I estimated the number of pitches thrown by every pitcher in baseball history. Then, to translate that into modern innings, I divided by 170 and multiplied by 9, as I’ve found that the average game takes about 170 pitches nowadays.
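The two steps above can be sketched in a few lines of Python. This is only an illustration: the function names are mine, PA is taken to mean batters faced, and the sample season line is made up rather than pulled from the Lahman Database.

```python
PITCHES_PER_GAME = 170  # the post's figure for a modern nine-inning game
INNINGS_PER_GAME = 9


def estimate_pitches(pa, so, bb):
    """Estimate pitches thrown from batters faced (PA), strikeouts (SO) and walks (BB)."""
    return 3.3 * pa + 1.5 * so + 2.2 * bb


def translated_innings(pitches):
    """Convert a pitch total into innings at the modern pitches-per-game rate."""
    return pitches / PITCHES_PER_GAME * INNINGS_PER_GAME


# Hypothetical season line, for illustration only.
season_pitches = estimate_pitches(pa=900, so=200, bb=60)
print(round(season_pitches), round(translated_innings(season_pitches), 1))

# Cy Young's estimated career pitch total, taken from the leader board.
print(round(translated_innings(106073)))  # 5616
```

Feeding in a pitch total from the leader board reproduces its translated-innings column, which is a quick way to check the arithmetic.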
Here is the modified career leader board:

| Last name | First name | IP | Est. pitches | Translated innings |
|---|---|---|---|---|
| Young | Cy | 7354.7 | 106073 | 5616 |
| Ryan | Nolan | 5386 | 89217.5 | 4723 |
| Galvin | Pud | 6003.3 | 87620.2 | 4639 |
| Johnson | Walter | 5914.7 | 86633.8 | 4586 |
| Niekro | Phil | 5404.3 | 83826.9 | 4438 |
| Carlton | Steve | 5217.3 | 81790.5 | 4330 |
| Perry | Gaylord | 5350.3 | 80779.7 | 4277 |
| Sutton | Don | 5282.3 | 79697.9 | 4219 |
| Spahn | Warren | 5243.7 | 78134.4 | 4137 |
| Blyleven | Bert | 4970 | 76080.2 | 4028 |
| Keefe | Tim | 5047.7 | 75744.5 | 4010 |
| Nichols | Kid | 5056.3 | 75693.5 | 4007 |
| Alexander | Pete | 5190 | 74451.6 | 3942 |
| Seaver | Tom | 4782.7 | 72435.7 | 3835 |
| Wynn | Early | 4564 | 71452.4 | 3783 |
| John | Tommy | 4710.3 | 71120.9 | 3765 |
| Clemens | Roger | 4493 | 70835.4 | 3750 |
| Kaat | Jim | 4530.3 | 68843.4 | 3645 |
| Roberts | Robin | 4688.7 | 68794.1 | 3642 |
| Mathewson | Christy | 4780.7 | 68758.6 | 3640 |
| Jenkins | Fergie | 4500.7 | 67701.4 | 3584 |
| Ruffing | Red | 4344 | 67681.4 | 3583 |
| Radbourn | Charley | 4535.3 | 67073 | 3551 |
| Rixey | Eppa | 4494.7 | 66293.6 | 3510 |
| Plank | Eddie | 4495.7 | 65869.9 | 3487 |
| Welch | Mickey | 4802 | 65642.2 | 3475 |
| Tanana | Frank | 4188.3 | 65135.8 | 3448 |
| Grimes | Burleigh | 4179.7 | 64381.7 | 3408 |
| Lyons | Ted | 4161 | 62967.5 | 3334 |
| Maddux | Greg | 4181.3 | 62353.9 | 3301 |
| Faber | Red | 4086.7 | 62163.1 | 3291 |
| Newsom | Bobo | 3759.3 | 61274.5 | 3244 |
| Feller | Bob | 3827 | 61146.3 | 3237 |
| Martinez | Dennis | 3999.7 | 61074.7 | 3233 |
| Grove | Lefty | 3940.7 | 60899.3 | 3224 |
| Gibson | Bob | 3884.3 | 60639.1 | 3210 |
| Hough | Charlie | 3801.3 | 60567 | 3206 |
| Jones | Sam | 3883 | 60012.4 | 3177 |
| Morris | Jack | 3824 | 59971 | 3175 |
| McCormick | Jim | 4275.7 | 59921 | 3172 |
| Palmer | Jim | 3948 | 59371.8 | 3143 |
| Koosman | Jerry | 3839.3 | 59256.4 | 3137 |
| Quinn | Jack | 3920.3 | 58312.4 | 3087 |
| Glavine | Tom | 3740.3 | 58067.2 | 3074 |
| Bunning | Jim | 3760.3 | 58021.9 | 3072 |
| Whitehill | Earl | 3564.7 | 57296.7 | 3033 |
| Hoyt | Waite | 3762.3 | 57221.5 | 3029 |
| Reuss | Jerry | 3669.7 | 56760.5 | 3005 |
| Lolich | Mickey | 3638.3 | 56627.8 | 2998 |
| Niekro | Joe | 3584 | 55444.7 | 2935 |

While the overall order remains fairly similar, the innings become much more compressed (the standard deviation among the top 50 drops from 754 to 559). The difference between Cy Young and Roger Clemens becomes 1,000 innings smaller. It’s still huge, however, and that’s because while I have adjusted for pitch counts, I have not adjusted for the second part of the equation: Young was able to throw his pitches with less effort because of the substandard batters he generally faced. I have an idea of how to adjust for this, and when I look into it, I’ll present my findings. For now, I just want to show how much pitch counts can affect innings pitched.

## Clarifying some things

I got a comment yesterday that I felt needed some extra clarification:
“Great idea. Let me just get one thing straight – the number next to Santana’s means what, exactly? How many runs below average he has saved?”
First of all, those are dashes, not negative signs. Sorry if there was any confusion over that. Secondly, what I literally did was convert Santana’s (and every other pitcher’s) numbers into runs created, like you would have for hitters. So if Santana has 54 runs created, his performance has been equivalent to that of Luis Gonzalez. There is one more step, converting the numbers into Win Shares, that will balance out some things between hitters and pitchers, but when you’re comparing pitchers to pitchers, all you really need are their runs created.
Just to clarify a little more, what I’m doing here is putting pitching numbers on an absolute scale. This is not performance above or below average, replacement, or any other baseline; it is performance above zero. That is why RC should always be positive, though in some cases it might come out slightly negative because of a small sample size, which I will discuss later. There is no baseline, and that’s the beauty of it.

## Pitching Runs Created – Corrected

There were some mistakes in my previous post, so let me re-post the leaders:
AL
1. Johan Santana – 53.91177409
3. Mark Buehrle – 49.72016507
4. Matt Clement – 43.00086884
5. Randy Johnson – 41.47244024
6. Dan Haren – 40.97976546
7. Bartolo Colon – 40.15336766
8. Jeremy Bonderman – 38.85439381
9. Jake Westbrook – 38.79135507
10. Paul Byrd – 38.41606312
11. Chris Young – 38.30119765
12. Sidney Ponson – 38.03283086
13. Kevin Brown – 37.41166294
14. John Lackey – 36.73535892
15. Freddy Garcia – 36.43384283
16. Jon Garland – 35.17740346
17. Barry Zito – 35.02640459
18. Zack Greinke – 34.59156142
19. Bronson Arroyo – 34.02380078
20. Chan Ho Park – 33.96994604
21. Mike Mussina – 33.85067345
22. C.C. Sabathia – 33.50750678
23. Carl Pavano – 33.47051084
24. Daniel Cabrera – 33.24156436
25. Mike Maroth – 33.12293081
NL
1. Pedro Martinez – 49.82314631
2. Chris Carpenter – 48.33463065
3. Dontrelle Willis – 47.97496051
4. Javier Vazquez – 45.3098585
5. John Smoltz – 45.09792362
6. Derek Lowe – 44.04807772
7. Jake Peavy – 43.13596892
8. A.J. Burnett – 42.90334427
9. Roger Clemens – 42.77053382
10. Livan Hernandez – 41.43306967
11. Andy Pettitte – 40.08012874
12. Roy Oswalt – 39.76317389
13. Aaron Harang – 39.59858114
14. Cory Lidle – 38.57858255
15. Brandon Webb – 37.54680616
16. Josh Beckett – 37.34887474
17. Brett Myers – 36.15946462
18. Matt Morris – 35.46958166
19. Esteban Loaiza – 34.92092456
20. Brett Tomko – 34.54112118
21. Mark Redman – 33.61971324
22. Joe Kennedy – 33.33657994
24. Carlos Zambrano – 31.15155188
I’ll be updating these weekly throughout the season.

## Pitchers’ Runs Created

Later, I will introduce a new Win Shares type system that solves many of the problems found in Bill James’ first attempt at producing one number, measured in wins, to characterize player value. Right now, however, let me introduce you to an important part of the system I will unveil, pitchers’ runs created.
My biggest problem with WS is that they do not actually measure absolute value, which is what James claims they do. Rather, they measure a player’s value over roughly a .170 W% level. The problem that James ran into is that while measuring absolute value for hitters is easy–just use runs created–there is no such number for pitchers. Thus, to get around this problem, James devised marginal runs, which look something like this:
RS - Lg/2 = marginal offensive runs (where Lg is the league average)
1.5*Lg - RA = marginal defensive runs
But what he’s doing here (and this, in my mind, is his first and most pivotal mistake) is comparing players not to a zero baseline, as he claims to do, but rather to a baseline of Lg/2 on offense or 1.5*Lg on defense, which is a bad level of performance, but not zero.
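To make the baseline point concrete, here is a small Python sketch of the two marginal-runs formulas as written above. The function names and the 4.5 R/G league average are my own choices for illustration, not anything taken from James.

```python
def marginal_offensive_runs(rs, lg):
    """Runs scored above half the league average."""
    return rs - lg / 2


def marginal_defensive_runs(ra, lg):
    """Runs allowed below one and a half times the league average."""
    return 1.5 * lg - ra


LG = 4.5  # assumed league-average runs per game

# An offense scoring 2.25 R/G is credited with nothing,
# even though it is still scoring runs.
print(marginal_offensive_runs(2.25, LG))  # 0.0

# Likewise, a defense allowing 6.75 R/G sits exactly on the baseline.
print(marginal_defensive_runs(6.75, LG))  # 0.0
```

In both cases the credit hits zero well before the actual run totals do, which is the sense in which the baseline is bad but not zero.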
Again, the simple problem is that there is no number like RC for pitchers, a number where everything is equal to zero or more and where a higher number is better. But how do we come up with such a number? After some brainstorming, it hit me, like a pile of bricks or a once-in-a-lifetime idea: Why not convert runs allowed into runs scored?
Think about it: we know how runs scored and runs allowed interact with W%, so why not convert RA into RS by using W%? Using an average baseline for runs scored, we can predict a W% for any team or player based on its or his runs allowed. We can then convert that W% back into runs scored, this time using an average baseline for runs allowed. For example, a pitcher who allows 3 R/G when the average team scores 4.5 R/G has a .692 W%. He can be said to be scoring 6.75 R/G, as a team that scores that much and allows 4.5 R/G will also have a .692 W%.
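Here is a rough Python sketch of that conversion. It assumes the Pythagorean formula with an exponent of 2 as the W% estimator; the post does not name the estimator, but that choice reproduces the .692 and 6.75 figures above. The function names are mine.

```python
def pythag_wpct(rs, ra, exponent=2.0):
    """Pythagorean winning percentage from runs scored and runs allowed per game."""
    return rs**exponent / (rs**exponent + ra**exponent)


def equivalent_runs_scored(ra, lg, exponent=2.0):
    """Runs scored per game that, paired with league-average runs allowed,
    yields the same W% as allowing `ra` with league-average scoring."""
    if ra <= 0:
        raise ValueError("runs allowed must be positive")
    # Step 1: W% of an average offense paired with this pitcher.
    wpct = pythag_wpct(lg, ra, exponent)
    # Step 2: invert the formula for RS, holding RA at the league average.
    return lg * (wpct / (1 - wpct)) ** (1 / exponent)


print(round(pythag_wpct(4.5, 3.0), 3))             # 0.692
print(round(equivalent_runs_scored(3.0, 4.5), 2))  # 6.75
```

Under this assumption the two steps collapse to Lg squared divided by RA, whatever the exponent, so the choice of exponent affects the intermediate W% but not the converted runs figure.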
Using this, we can come up with defensive runs created (DRC), which will be on the same baseline as runs created for hitters. Of course, they must still be split between pitchers and fielders after. I will provide the formulas that I use for this adjustment a little later, but I would like to keep this article as devoid of technicalities as possible.
Pitching Runs Created can be used to judge Cy Young races, Greatest of All-Time debates, and trades. They’re easy to use and understand, and the great thing is, since they’re put on a zero baseline, they will measure absolute value.
Now, without further delay, let me present the top-10 for each league this year:
AL
1. Johan Santana – 58.88
3. Mark Buehrle – 52.09
4. Bartolo Colon – 48.61
5. Matt Clement – 47.70
6. Dan Haren – 46.13
7. John Lackey – 44.60
8. Chris Young – 42.66
9. Jeremy Bonderman – 42.18
10. Randy Johnson – 41.25
NL
1. Pedro Martinez – 58.54
2. Chris Carpenter – 58.48
3. Dontrelle Willis – 53.95
4. Jake Peavy – 51.10
5. Roger Clemens – 50.95
6. John Smoltz – 50.25
7. Roy Oswalt – 50.12
8. Livan Hernandez – 49.69
9. A.J. Burnett – 46.81
10. Andy Pettitte – 43.22
The top two in each league are very close, so it should be a good race the rest of the way. Santana’s great season has been marred by a strangely horrible defense; while his BABIP is low (.276), he’s still allowing about 0.6 more runs than his peripherals (HR, BB, K) would indicate. Also, because of all the batters he strikes out, Santana gets more of the credit for his stellar pitching. In the NL, Pedro Martinez is similarly higher up than he is on other lists because of his high strikeout rate. His fielders have actually been above average. NL wins leader Dontrelle Willis is close behind in third, and could still finish first before all is said and done. On the other hand, AL wins leader Jon Garland is nowhere to be found on this list, with 33.46 PRC. The White Sox’s incredible defense has contributed much to his record. Roger Clemens, whom most statheads would likely have picked as the best pitcher in the NL, is in fifth place, as his fielders make him look almost a full run better than he actually is.