# 2007 NL Starting Pitching Analysis

When it comes to analyzing and comparing pitchers, those conducting the comparisons will often find themselves in a tricky situation.  Sure, certain pitchers are better than others, but what are they specifically better at?

How can we conduct an honest analysis when there are so many variables to consider?  And how can we truly determine which pitchers were better than others when some are on terrible teams with no run support and others are on tremendous teams with tons of run support?
The first step is to determine what we are measuring.  If we want to know who the best strikeout pitcher is, we should look at the raw strikeout total and also the K/IP average, since some guys will make fewer starts than others.  To figure out who walks the fewest, we measure the number of walks each pitcher gives up and his BB/IP ratio.
These measurements are contingent on one category, though, and cannot tell us who is better or more effective than the rest.  All of the research and ideas presented in this article are designed to measure the “effectiveness” of a pitcher.
In order to determine this effectiveness, a whole heck of a lot of numbers need to be measured and properly weighted/scaled so that everybody has a fair shot – whether or not they are on a great team.
I took the 1-3 best pitchers from each National League team and entered their statistics into a database, measuring everything from their raw Innings Pitched totals to their Adjusted Quality Start % (you’ll read more on that below).  After entering all of the statistics, and crunching numbers until my brain turned to mush, I came up with my weighted points system.  I assigned the corresponding point totals and added everything up to determine what I feel is a very accurate measurement of pitching effectiveness amongst the NL’s best.
This was not applied to every single NL Pitcher in 2007 (I will do that another time) but rather amongst these 30 selected #1, #2, or #3 starters.  For instance, a guy like Jeff Suppan may have been more effective than Jason Bergmann but I wanted to have at least one person from each team.
The system is not 100% perfect and does not take into account every single statistic (do you know how many statistics there are??), but it definitely levels the playing field between those on good or bad teams, those injured/called up or just plain bad, and those who got lucky or unlucky with run support.  The points are assigned based on the areas I, as an intense student of the game, feel are most important to determine true effectiveness.
The basic idea of this system is to measure the true quality of a pitcher over his season – i.e., what would happen if a pitcher were rewarded every time he pitched well and discredited every time he pitched poorly – something that happens perfectly just about 0% of the time.
We will begin by going over the statistics involved, what their points scales are, and why they are used.  The idea behind these point totals is to properly weight the areas that most people intuitively attribute to success and quality.
The points given to each statistical subset are designed to separate the aces from the workhorses and the workhorses from the seemingly replacement-level pitchers.  They may seem arbitrary and could be replaced with different numbers, or fractions/decimals; however, the difference between the points in each subset was based on the number of pitchers who fall into certain categories.
GAMES STARTED
In order to be as effective as possible, a pitcher needs to make as many starts as he can.  How can we say that a pitcher with 14 starts is more effective than one with 34-35, even if his numbers in those 14 starts are tremendous and the numbers of the one with 34-35 are a bit worse?  His numbers may be better than the pitcher with 35 starts, however the latter pitcher was involved in 21 more games and proved to be durable enough to pitch an entire season, and solid enough to maintain his SP status for 162 games.
This does not mean that a pitcher with 35 starts is necessarily “better” than one with 14-16, but rather he is more effective because he is involved in more of his team’s season.
If the pitcher with 14-16 starts posted the same numbers in 32 starts, it would not be a contest.  But, he didn’t – it was only 14-16.  You cannot have as much of an effect on your team (actual play, not motivational or anything) unless you are out there as often as possible.
***What the end result of this effectiveness points system showed is that those with average numbers over 30+ starts were about as effective as (or slightly better/worse than) those with good numbers over 16-20 starts.***
If somebody makes only 14 starts in a season, it could be because he was injured for half of the season or was called up from the minors during the season, so he should not be penalized with negative points for that – he just should not be rewarded as highly as someone with 30+ starts.

• if 30 or more starts, +5
• if 25-29 starts, +3
• if 20-24 starts, +2
• if under 20 starts, 0
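
For anyone who wants to script the scale above, here is a minimal Python sketch. The function name `games_started_points` is made up for illustration, and it assumes that exactly 30 starts lands in the top tier.

```python
def games_started_points(starts: int) -> int:
    """Points for Games Started, following the bulleted scale."""
    if starts >= 30:  # assumption: exactly 30 starts counts as the top tier
        return 5
    if starts >= 25:
        return 3
    if starts >= 20:
        return 2
    return 0  # under 20 starts: no reward, but no penalty either


print(games_started_points(32))  # a full-season starter
print(games_started_points(14))  # an injured or midseason call-up starter
```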

INNINGS PITCHED
Just like Games Started, IP can only get you positive numbers, because a low raw number of IP can be attributed to injury or a midseason call-up.  Those with more IP get higher point totals, though.  The reason for only the minimum +1 for under 100 innings is that you were not necessarily a bad pitcher, but the lack of innings (whether due to injury or a call-up) limits your effectiveness.

• if 230+, +8
• if 220-229, +7
• if 200-219, +5
• if 150-199, +3
• if 100-149, +2
• if under 100, +1
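
The innings scale can be sketched as a small lookup table. This is only an illustration: `IP_TIERS` and `innings_points` are hypothetical names, and the sketch follows the bulleted list in giving +1 for under 100 innings. Because every threshold is a whole number, a total written in baseball notation (216.1 meaning 216 innings plus one out) can be compared directly.

```python
# (minimum innings, points), checked from the highest tier down
IP_TIERS = [(230, 8), (220, 7), (200, 5), (150, 3), (100, 2)]


def innings_points(innings: float) -> int:
    """Points for raw Innings Pitched, following the bulleted scale."""
    for minimum, points in IP_TIERS:
        if innings >= minimum:
            return points
    return 1  # under 100 innings, per the bulleted list


print(innings_points(208.0))
print(innings_points(216.1))
```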

IP/GAME
This is where negative numbers can begin.  If you were hurt, or called up from the minors, you are not penalized with negatives for the raw number of innings pitched or games started, but if you posted a high number of starts and a low number of innings, this statistic will bite you in the rear.  IP/Game separates the hurt or called up from the downright below average or bad.  It also helps reward those with a couple fewer starts than others but with more raw innings pitched.  These types of pitchers were in the same GS range but some went deeper into games than others.  Nobody averaged over 7 IP/gm, so we start lower.

• if 6.5-7 IP/gm, +7
• if 6.0-6.49 IP/gm, +5
• if 5.5-5.99 IP/gm, +3
• if 5.0-5.49 IP/gm, 0
• if below 5.0 IP/gm, -5

If you cannot average at least 5 innings per game, you should not be a starting pitcher.  Even Adam Eaton averaged over 5 IP/gm in 2007.
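
Here is a rough sketch of the IP/Game scale in Python. Innings totals are usually written in baseball notation, where the digit after the decimal point counts outs (216.1 is 216 innings plus one out), so the sketch converts to true innings first. The helper names `to_innings` and `ip_per_game_points` are hypothetical, and sending a boundary value such as exactly 6.0 to the higher tier is an assumption.

```python
def to_innings(notation: str) -> float:
    """Convert baseball notation ('216.1' = 216 innings + 1 out) to true innings."""
    whole, _, outs = notation.partition(".")
    return int(whole) + int(outs or 0) / 3


def ip_per_game_points(ip_notation: str, starts: int) -> int:
    """Points for average innings per start, following the bulleted scale."""
    average = to_innings(ip_notation) / starts
    if average >= 6.5:
        return 7
    if average >= 6.0:  # assumption: boundary values go to the higher tier
        return 5
    if average >= 5.5:
        return 3
    if average >= 5.0:
        return 0
    return -5


print(ip_per_game_points("208.0", 32))  # 6.5 innings per start
print(ip_per_game_points("216.1", 34))  # about 6.36 innings per start
```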
ADJUSTED QUALITY STARTS
Quality Starts can be an inaccurate statistic because it takes into account games in which a pitcher goes 6+ innings and gives up no more than 3 earned runs… and nothing else.
If a pitcher goes 8.1 innings and gives up 4 runs, it is arguably the same ratio and an equal game in terms of quality, but does not get counted as a quality start.
With that in mind, I came up with the stat of Adjusted Quality Starts, which takes into account all regular quality starts as well as games in which someone goes 7.2-9 innings and gives up no more than 4 runs.  This measures the true number of games in which a pitcher had a good-great performance.
***If you wonder why it is 7.2 IP, instead of 8, the number was derived from the number of times a pitcher was lifted after 7.2 IP for a specialist, or other sort of reliever, and from the sheer low average of innings pitched/game by a starter this year.  Reaching the 7th inning is now a great feat, let alone coming within one out of finishing the 8th.  Though the previous ratio for a QS was 2:1, due to the data mentioned above, going an extra 1.2 IP to get to 7.2 IP merits being able to give up one more run.***
I used the percentage of AQS to the total number of Games Started to measure effectiveness in this area.  Someone over 75% almost always pitches a good-great game, whereas someone under 50% only pitches a good game less than half of the time – not very effective.

• if AQS % is 75% or above, +5
• if AQS % is 67-74%, +3
• if AQS % is 50-66%, 0
• if AQS % is below 50%, -3
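
The AQS % scale can be sketched the same way. The name `aqs_pct_points` is hypothetical. Treating exactly 75% as the top tier is an assumption (it matches how Ian Snell's 75% is scored later in the article), as is sending percentages that fall between the listed ranges, such as 66.5%, to the lower tier.

```python
def aqs_pct_points(aqs: int, starts: int) -> int:
    """Points for the share of starts that were Adjusted Quality Starts."""
    pct = 100 * aqs / starts
    if pct >= 75:  # assumption: exactly 75% earns the top tier
        return 5
    if pct >= 67:
        return 3
    if pct >= 50:
        return 0
    return -3


print(aqs_pct_points(24, 32))  # 24 AQS in 32 starts is exactly 75%
```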

If you’re keeping score at home, AQS = 6+ IP with ER <= 3, OR 7.2+ IP with ER <= 4, where <= means less than or equal to (and 7.2 IP is baseball notation for 7 and 2/3 innings).
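
As a sketch of the definition, the check below counts outs rather than innings so that 7.2 IP (23 outs) compares cleanly. The function names `outs_recorded` and `is_aqs` are made up for illustration, and the innings argument is assumed to be a string in baseball notation.

```python
def outs_recorded(ip_notation: str) -> int:
    """Convert baseball notation ('7.2' = 7 innings + 2 outs) to total outs."""
    whole, _, extra = ip_notation.partition(".")
    return int(whole) * 3 + int(extra or 0)


def is_aqs(ip_notation: str, earned_runs: int) -> bool:
    """True for a regular quality start or a 7.2+ IP start with at most 4 ER."""
    outs = outs_recorded(ip_notation)
    regular_qs = outs >= 18 and earned_runs <= 3   # 6+ IP, 3 ER or fewer
    long_outing = outs >= 23 and earned_runs <= 4  # 7.2+ IP, 4 ER or fewer
    return regular_qs or long_outing


print(is_aqs("8.1", 4))  # the 8.1 IP, 4-run game described above
print(is_aqs("7.1", 4))  # one out short of the 7.2 IP cutoff
```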
COMPLETE GAMES & SHUTOUTS
In addition to AQS, something that needs to be taken into account is how often a pitcher went for a complete game, since they are so rare.  We also need to take shutouts into account, since they occur even less often.

• For every CG, +2
• For every SHO, additional +1

***NOTE: Aaron Harang had two games in 2007, one where he went 9 IP, and one where he went 10 IP, when he did not get a decision.  Even so, I am counting these 2 as a combined 1 CG, since he went 9+ innings.***
ADJUSTED W-L RECORD
W-L records are the most deceiving statistics because they do not take into account the true quality of the games pitched.  Just because a pitcher goes 14-7 does not mean he was necessarily a great pitcher.  He could have pitched terribly and had great run support in 10 of 14 wins, but brilliantly with terrible run support in the 7 losses.
The whole point of the adjusted W-L record is to reward the AQS, since that means you pitched well and should be rewarded, even if your team (offense or bullpen) does not help you.
After all, Ian Snell cannot control the Pirates’ offense.  It is not his fault that 4 of his 12 losses were “Tough Losses” and all 11 of his No-Decisions were games in which he pitched brilliantly and had an AQS, yet he received little to no offense to help garner him a ‘W’.
With that in mind, I changed W-L to the following 5 stats:

• Cheap Wins: wins in which one does not get an AQS (-1)
• Tough Losses: losses in which one does get an AQS (+2)
• Legit Wins: wins in which one does get an AQS (+2)
• Legit Losses: losses in which one does not get an AQS (-2)
• ND-AQS: no-decisions in which one gets an AQS (+1)

I received some questions about how these numbers came to be, and to keep it simple, the statistics that actually have an effect on the W-L record are valued higher (negatively and positively) than statistics like ND-AQS, which prevent a pitcher from winning but do not hurt him with a loss.
ND-non AQS is not used here for the same reason that a Cheap Win is only negative one, which is that not every Cheap Win or ND-non AQS was a terrible start.  A large bulk of them were games in which a pitcher had a good outing but only went 5 or 5.1 innings.  A Cheap Win loses you a point (not two, only one) because you do not get an AQS but it does affect your win-loss record.  An ND-non AQS means you do not get an AQS but it does not affect your win-loss record, which is why I decided to just leave it out.
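
The five adjusted W-L categories boil down to two questions per start: what was the decision, and was the start an AQS? A minimal sketch, with hypothetical names (`classify_start`, `decision_points`) and an assumed input format of `(decision, aqs)` pairs:

```python
DECISION_POINTS = {
    "cheap_win": -1,
    "tough_loss": 2,
    "legit_win": 2,
    "legit_loss": -2,
    "nd_aqs": 1,
    "nd_non_aqs": 0,  # left out of the system, so it scores nothing
}


def classify_start(decision: str, aqs: bool) -> str:
    """decision is 'W', 'L', or 'ND'; aqs says whether the start was an AQS."""
    if decision == "W":
        return "legit_win" if aqs else "cheap_win"
    if decision == "L":
        return "tough_loss" if aqs else "legit_loss"
    return "nd_aqs" if aqs else "nd_non_aqs"


def decision_points(starts) -> int:
    """Sum the adjusted W-L points over a list of (decision, aqs) pairs."""
    return sum(DECISION_POINTS[classify_start(d, a)] for d, a in starts)


# A made-up four-start stretch: legit win, tough loss, ND-AQS, cheap win
sample = [("W", True), ("L", True), ("ND", True), ("W", False)]
print(decision_points(sample))  # 2 + 2 + 1 - 1 = 4
```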
WHIP
Though I am not too fond of this statistic and originally tinkered around with separately evaluating H/IP and BB/IP, using WHIP just seemed to make things easier.  Though it does not tell us which pitchers walk less and give up more hits, or vice versa, or tell us how many “empty innings” a pitcher had (innings where no baserunners got on), it does provide a valid average of baserunners to expect in a given game since it does not equate to a per-9 inning scale.

• if WHIP 1.00-1.15, +3
• if WHIP 1.16-1.25, +2
• if WHIP 1.26-1.30, +1
• if WHIP 1.31-1.40, 0
• if WHIP above 1.40, -2
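
For reference, WHIP is walks plus hits divided by innings pitched, and the scale above can be sketched as follows. The function names are hypothetical. Two assumptions are baked in: a WHIP below 1.00 also earns the top tier (the list does not say), and the value is rounded to two decimals before being compared.

```python
def whip(walks: int, hits: int, innings: float) -> float:
    """Walks plus hits per inning pitched (innings as a true decimal)."""
    return (walks + hits) / innings


def whip_points(value: float) -> int:
    """Points for WHIP, following the bulleted scale."""
    value = round(value, 2)
    if value <= 1.15:  # assumption: anything under 1.00 also gets +3
        return 3
    if value <= 1.25:
        return 2
    if value <= 1.30:
        return 1
    if value <= 1.40:
        return 0
    return -2


print(whip_points(1.33))
print(whip_points(whip(walks=50, hits=150, innings=200.0)))  # made-up totals
```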

K:BB RATIO
Instead of using K’s, I wanted to use the ratio of strikeouts to walks, since not every pitcher is a strikeout pitcher.  Even so, you do not have to be a strikeout pitcher to be an accurate one, and because of this I rewarded those with high K:BB ratios.  Greg Maddux only struck out 104 in 34 starts, but only walked 25 – a K:BB of 4.16.  This meant that Maddux kept more runners off-base by striking them out and not walking them.

• if K:BB above 4, +7
• if K:BB above 3, +5
• if K:BB above 2, +3
• if K:BB above 1, 0
• if K:BB 1 or below, -3
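
A quick sketch of the K:BB scale, using Maddux's 104 strikeouts and 25 walks as the example. The name `kbb_points` is hypothetical, and the sketch assumes the pitcher issued at least one walk.

```python
def kbb_points(strikeouts: int, walks: int) -> int:
    """Points for the strikeout-to-walk ratio, following the bulleted scale."""
    ratio = strikeouts / walks  # assumes walks > 0
    if ratio > 4:
        return 7
    if ratio > 3:
        return 5
    if ratio > 2:
        return 3
    if ratio > 1:
        return 0
    return -3


print(round(104 / 25, 2))   # Maddux's ratio
print(kbb_points(104, 25))  # lands in the top tier
```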

EXAMPLE OF USAGE
Now that we have the points, let’s test it out and put it to use.  We will use Ian Snell and Carlos Zambrano.
The table below shows Ian Snell’s 2007 numbers and points he receives for each in my points system.

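| Stat | 2007 | Points |
| --- | --- | --- |
| Starts | 32 | +5 |
| Innings | 208.0 | +5 |
| Cheap W | 0 | 0 |
| Tough L | 4 | +8 |
| Legit W | 9 | +18 |
| Legit L | 8 | -16 |
| ND-AQS | 11 | +11 |
| AQS % | 75% | +5 |
| IP/Game | 6.52 | +7 |
| WHIP | 1.33 | 0 |
| K:BB | 2.60 | +3 |
| CG | 1 | +2 |
| SHO | 0 | 0 |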

When we add up all of these numbers, we get Snell’s Effectiveness #, which comes to: +48.
Now, let’s look at Carlos Zambrano’s season numbers in the table below and add his point totals up.

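| Stat | 2007 | Points |
| --- | --- | --- |
| Starts | 34 | +5 |
| Innings | 216.1 | +5 |
| Cheap W | 0 | 0 |
| Tough L | 2 | +4 |
| Legit W | 18 | +36 |
| Legit L | 11 | -22 |
| ND-AQS | 0 | 0 |
| AQS % | 53% | 0 |
| IP/Game | 6.36 | +5 |
| WHIP | 1.34 | 0 |
| K:BB | 1.75 | 0 |
| CG | 1 | +2 |
| SHO | 0 | 0 |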

We look at his numbers and add up the totals to get his Effectiveness #: +35.
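
To double-check the arithmetic, the per-category points for both pitchers can simply be summed. The dictionaries below copy the point values listed for Snell and Zambrano; the variable and function names are only for illustration.

```python
snell = {
    "Starts": 5, "Innings": 5, "Cheap W": 0, "Tough L": 8, "Legit W": 18,
    "Legit L": -16, "ND-AQS": 11, "AQS %": 5, "IP/Game": 7, "WHIP": 0,
    "K:BB": 3, "CG": 2, "SHO": 0,
}
zambrano = {
    "Starts": 5, "Innings": 5, "Cheap W": 0, "Tough L": 4, "Legit W": 36,
    "Legit L": -22, "ND-AQS": 0, "AQS %": 0, "IP/Game": 5, "WHIP": 0,
    "K:BB": 0, "CG": 2, "SHO": 0,
}


def effectiveness(points: dict) -> int:
    """The Effectiveness # is the sum of the points from every category."""
    return sum(points.values())


print(effectiveness(snell))     # 48
print(effectiveness(zambrano))  # 35
```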
Zambrano had more legit wins but also more legit losses, and of Zambrano’s 3 no-decisions, none were ND-AQS, whereas of Snell’s 11 no-decisions, all were ND-AQS.
That tells us that if each player got a win for every game he pitched well, and a loss for every game he did not pitch well (did not get an AQS), and the only no-decisions they received came from no-decisions that they pitched poorly in or did not go a full 6 IP, their records would look like this –

• Carlos Zambrano (18-13) would actually be 20-11
• Ian Snell (9-12) would actually be 24-8
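
These adjusted records come straight from the category counts: every AQS (legit win, tough loss, or ND-AQS) becomes a win, and every legit loss stays a loss. A sketch under one extra assumption: cheap wins are counted as losses, following the "loss for every game he did not pitch well" rule, while no-decisions without an AQS stay no-decisions. Neither pitcher had a cheap win, so that assumption does not change these two results. The function name is hypothetical.

```python
def adjusted_record(legit_wins, tough_losses, nd_aqs, legit_losses, cheap_wins=0):
    """Return (wins, losses) if every AQS were a win and every non-AQS decision a loss."""
    wins = legit_wins + tough_losses + nd_aqs
    losses = legit_losses + cheap_wins  # assumption: a cheap win flips to a loss
    return wins, losses


print(adjusted_record(18, 2, 0, 11))  # Zambrano: (20, 11)
print(adjusted_record(9, 4, 11, 8))   # Snell: (24, 8)
```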

Snell went further into his games, had a better K:BB ratio, and had that higher AQS %.  It also tells us that of Snell’s 32 starts, 24 of them were of great quality, whereas Zambrano had 18 good-great starts and 16 average-bad starts.
This essentially tells us that while Zambrano’s good-great starts may have been better than Snell’s good-great starts, when Zambrano had his bad starts, Snell was still having good-great ones.
RESULTS
As mentioned before, I used this points system to evaluate 30 National League pitchers.  I compiled a group of spreadsheets, ranking the pitchers in order in different categories to show that certain stats we rely on do a bad job of proving effectiveness.
To view all of my results, click on the links below.  You can use this data in other areas, but please credit my work.

• To see the list of pitchers and their statistics used to assign points, click here.
• To see the list of pitchers in order of effectiveness points, click here.

I do not want to post a ridiculously long table on this article, so you will need to look at the linked files to see the results, but I will list the top 15 pitchers and their effectiveness points.

1. Jake Peavy, +74
2. Aaron Harang, +69
3. John Smoltz, +69
4. Brandon Webb, +67
5. Cole Hamels, +65
7. Tim Hudson, +63
8. Ted Lilly, +60
9. Matt Cain, +52
10. Roy Oswalt, +50
11. Ian Snell, +48
12. Bronson Arroyo, +47
13. Derek Lowe, +47
16. Jeff Francis, +45

And, again, these points were assigned to statistics based on how strongly they correlate with effectiveness.  The points system essentially covers the statistics and averages from all angles.
CHRIS YOUNG
The most shocking part of this was how low Chris Young of the Padres came out.  Young went 9-8, with a 3.12 ERA, in 30 starts.  He should have been more effective, I thought, based on those numbers.  After looking at his game logs, though, I changed my mind and realized it made sense.
Of his 30 starts, he was essentially two different people.  In the 19 starts in which he went for 6+ innings, he was 9-1 with a 1.64 ERA, averaging 6.6 IP/gm, with a 0.85 WHIP and 129 K’s in 126.1 innings.
In the other 11 starts, he was 0-7, with a 7.14 ERA, only going 4.2 IP/gm, with a 1.76 WHIP, and 38 K to his 36 BB, in 46.2 innings.
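
As a sanity check, the two splits should combine back into his season line. The sketch below backs out earned runs from each split's ERA and innings (ER = ERA × IP / 9, rounded to a whole number, which is a derived figure rather than one taken from the game logs) and then recomputes the full-season ERA. The helper names are hypothetical.

```python
def to_innings(notation: str) -> float:
    """Convert baseball notation ('126.1' = 126 innings + 1 out) to true innings."""
    whole, _, outs = notation.partition(".")
    return int(whole) + int(outs or 0) / 3


def earned_runs(era: float, innings: float) -> int:
    """Back out earned runs from ERA and innings, rounded to a whole run."""
    return round(era * innings / 9)


good_ip, bad_ip = to_innings("126.1"), to_innings("46.2")
good_er = earned_runs(1.64, good_ip)  # the 19 starts of 6+ innings
bad_er = earned_runs(7.14, bad_ip)    # the other 11 starts

season_era = 9 * (good_er + bad_er) / (good_ip + bad_ip)
print(good_er, bad_er, round(season_era, 2))  # 23 37 3.12
```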
After analyzing his situation and the points system I realized that my effectiveness model favors consistency and lower standard deviations (the average of how far someone strays from his average).  To me, that truly defines effectiveness.
I would much rather have a guy who I knew would amass an AQS 67% or more of the time than a guy who might strike out 20 batters and pitch a two-hitter in one game, but give up 5 runs in 6 innings for the next three, before again pitching a brilliant game.
As long as the consistency is of a good nature, consistency in this model proves effectiveness.
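
To make the consistency point concrete, here is a small illustration using Python's `statistics` module. The two game logs are invented for the example (they are not real 2007 data): both pitchers average 6 innings per start, but only one of them tells you what to expect on a given night.

```python
from statistics import mean, pstdev

# Hypothetical innings pitched per start, for illustration only
steady = [6.0, 6.0, 6.0, 6.0]
streaky = [9.0, 3.0, 9.0, 3.0]

for name, log in (("steady", steady), ("streaky", streaky)):
    # Same average, very different spread around that average
    print(name, mean(log), pstdev(log))
```

Both logs average 6.0 innings, but the steady pitcher's standard deviation is 0 while the streaky pitcher's is 3.0, which is the kind of gap the points system ends up rewarding.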
CONCLUSION
I know, we’re finally at the end of the article, right?  I apologize for the length but it took this long to get everything across.
Looking at Jake Peavy, the most effective NL pitcher at +74, we see that the only counted statistic in which he led was AQS.  Peavy had the most good-great starts of any NL pitcher.  While he may not have led in IP, IP/gm, K:BB ratio, or fewest losses (Brad Penny only had 1 legit loss), he led in consistency and being consistently good-great.
These results also show that Cole Hamels, with 6 more starts that he missed due to injury, would likely challenge Peavy for #1 in effectiveness – however, as my model dictates, the fact that he missed those 6 starts and Peavy did not shows that Peavy was more effective.
Yes, there were more stats we could add to this, and more variables to account for, but I feel this accurately levels the field of play between pitchers in distinctly different playing situations, and levels the difference between 2007 reputation and 2007 actual performance.
I must remind you before I come to a close, though, that this is only a measure of effectiveness, not the end-all solution to determining who the “best” pitchers are.
However, for this Sabermetrician, effectiveness directly correlates with quality and value.

### 8 Responses to 2007 NL Starting Pitching Analysis

1. Corey Seidman says:

Very good stuff. Took me a while to read it all but worth it! I have always wondered about guys like Zambrano who seem to be so up and down. And always good to see my homeboy Adam Wainwright on the list! Surprised to see Arroyo so high on the list though ahead of guys like Maddux, Francis, and Lowe.

2. Dan Foster says:

Nice work. I mean, the scale is “arbitrary”, strictly speaking, but for the most part I think it reflects our ordinary intuitions about what makes a quality pitcher.
Incidentally, I did a lot of work on the starters in 07 myself, and Chris Young fared very poorly. Instead of using a +/- system, I used a couple of weighted Run Average derivatives and AVGIP/GS, set a couple of different thresholds (“Ace”, “Workhorse”, “Serviceable” etc), and ran all the pitchers through logical tests to see if they satisfied them. I think the absolute key to valuating starting pitching (if you’re a GM or a dork like us) is to correctly weight length versus Run Average.
I love Adjusted Quality Starts by the way. I’ll be stealing that. Though, I think it’s more effective as a one-off judgment. That is, take his average outing in terms of IP and RA, and if IP>6 AND IP/RA>2 (as in six innings, three runs or its equivalent), you’ve got a “quality” starter. Otherwise, not.

3. BJ says:

nice article. AQS is cool, but i don’t get why it is 7.2 innings and 4 runs for quality. Is getting a little over an inning further really worth those extra runs? dan’s method makes more sense to me, and if you wanted to apply it to yours it would be the same as saying 8+ innings of 4 runs. The thing is if you apply the same standard (.5 run per inning) the AQS probably looks a lot like normal QS…
Also, what was the rationale for making a tough loss +2 and a ND-AQS +1? The only difference I see is that in one you don’t get run support and in the other you don’t get bullpen or run support. The only way I could see this rationalized is if the ND is b/c the pitcher left in a tie, but not all ND are these, and still, if there were run support earlier or bullpen support after these should be the same. why not just leave it at 2+ and give a negative for ND that were not AQS. Of course, this might give a greater weight to AQS which might have been what you were trying to avoid, but this could be fixed by inflating other categories.

4. Dan,
Glad you enjoyed it. The reason I did the 7.2 is because of how rarely pitchers went into the 8th inning – and yes that should mean that those who did should get rewarded and that is exactly how this works since the wide majority of QS (before the AQS) involved pitchers going 6.0-7.0 innings.
I came up with the number after looking at everyone’s game logs (yes I looked at EVERYONE’S GAME LOGS) and determining how many of these AQS’s a pitcher would compile. I wrote an article last week about the decline in starting pitchers and their innings pitched and so while I originally started at 8 innings and over, and 4 runs or less, I dropped it off by 0.1 to level the playing field since managers are so reliever-happy these days and love taking a guy out after 7.2 IP for a lefty specialist or something along those lines.
And the only reason I would not be in favor of using the formula you wrote is that I do not feel giving up over 4 runs is a quality outing no matter if you go 10-12 innings while doing it. I know that is rare, but you never know.
So, what I’m looking for in an AQS is anything credited as a regular QS, and then any games in which a pitcher goes 7.2+ and gives up no more than 4.

5. BJ,
The only reason I would not use the system Dan suggested is because, like you said, it turns everything to Quality Starts, just adjusts on the same ratio – as I mentioned in the last comment, going 10 and giving up 5 is not a quality outing in my eyes, so I cannot just leave it as a 2:1 ratio in terms of IP:RA. It would have to be 6+ and RA equal or less than 3, and then 7.2+ and RA equal or less than 4.
In terms of why the W-L stats I kept were given certain point values, I went back and added it into the article. The gist is that the statistics that can affect your actual W-L record (a tough loss counts as a loss in your W-L record) get higher values, positive or negative, because of this effect on the actual W-L. Something like a no-decision prevents you from winning but does not give you a loss.
Cheap Wins and ND-non AQS, as I just added to the article are tricky because it does not mean that a start was bad, but just does not qualify… IE – 5 IP, 3 H, 2 ER and a WIN, is a cheap win in my eyes because you do not go a high enough amount of innings to really be effective for your team, but you’re still pitching well – so I did not want to penalize people too much for that.
When it came to separating the Cheap Wins like that and the Cheap Wins that were more like 7 IP, 9 H, 6 ER and a Win, it was easier and made more sense to just give a -1. ND-non AQS is not counted at all because it does not affect W-L record, whereas Cheap Wins does, so it should lose you a bit of quality on your effectiveness number.

6. Dan Foster says:

Good discussion here. Though I definitely see where you’re coming from with your reluctance to call 10IP/5RA type of performances “Quality Starts” we have to think a little about the rationale behind the concept. I always figured (and I have no proof of this) that Quality Starts were something like starts which give (even) an average team a better than average chance of winning.
So, in my mind, as long as a performance results in fewer RA than a team’s AVG-RS, I would call it “quality”–in a very broad sense. In that you’ve given your team better than even odds at winning the game.
Lastly, just to clarify. The reason I like the 2/1 ratio is more about the fact that I am interested in judging whole season averages, not counting individual game logs. So, you want Quality Starts to be projectible to guys who average 6.4 innings a start and 3.1RA.

7. Dan, you just made my point and perfectly explained the differential in our systems, much better than I was trying to do, haha – in that yours is looking at the season totals whereas mine is more of a game-game basis.
Most of my research deals with trying to figure out what to expect in a given game – it’s why I wrote the E.R.A Reevaluation article and why I like to look at game logs as opposed to seasonal projections or averages.
I like to know what I am going to get in Jamie Moyer’s 9th start based on his IP/gm and Standard Deviation for that point… if we just see that he went for 5.2 IP/gm in the previous 8 starts, it tells us one thing… if we see that he went for an average of 5.2 IP/gm in those 8 starts, it could be less and could be more EVERY time out, but with a St. Dev. of 0 to accompany the 5.2 IP/gm, it means that every time he pitched he went for 5.2 IP so we know exactly what to expect.
I like your system, and feel that if you add Standard Deviation to it, it will be very effective.
Neither yours nor mine is wrong, and like BA, OBP, and SLG, each tells us something else we should know.
For a season, a player who averaged 6.4 IP and 3.1 RA is definitely a quality starter. And, it perfectly works in your system, because of the thresholds you described – Ace, Workhorse, Serviceable. In your system, you could say that anyone with the Quality Start or AQS Ratio over a season is a “Quality Pitcher” and that only “Quality Pitchers” can be considered for Ace or Workhorse or something along those lines.
What I am trying to determine in my research in this area is how to properly differentiate between all of the pitchers that find themselves in your thresholds.
I feel that with my effectiveness model we can properly differentiate between guys like Peavy, Webb, Harang, Smoltz, Hamels – guys who would all likely be aces in your system – but it is tricky to rank them on their seasons since they were all in different situations and had different pitching circumstances, because of how many different variables it takes into account.
I honestly feel a combination of our efforts would be extremely effective. For instance, your system tells us which pitchers are in what thresholds and categories, which helps monetarily-wise, and then we plug them into my effectiveness model to rank them in order of “Ace-ness” or “Workhorse-manship” or whatever the suffixes are.
Drop me a line. Seidburns850@aol.com. I’d love to test my theory out and combine our work to see if we can come up with a more in-depth analysis.

8. […] only glanced at and thought, gee, I really should read this more carefully when I have time: 2007 NL Starting Pitching Analysis (Statistically Speaking) | tRA and ROA, a new pitching metric (Lookout […]