December 30, 2007
When it comes to analyzing and comparing pitchers, those conducting the comparisons will often find themselves in a tricky situation. Sure, certain pitchers are better than others, but what are they specifically better at?
How can we conduct an honest analysis when there are so many variables to consider? And how can we truly determine which pitchers were better than others when some are on terrible teams with no run support and others are on tremendous teams with tons of run support?
The first step is to determine what we are measuring. If we want to know who the best strikeout pitcher is, we should look at the raw strikeout total and also at the K/IP average, since some guys will make fewer starts than others. To figure out who walks the fewest, we measure the number of walks each pitcher gives up and his walk-to-IP ratio.
These measurements each cover only one category, though, and cannot tell us who is better or more effective than the rest. All of the research and ideas presented in this article are designed to measure the “effectiveness” of a pitcher.
In order to determine this effectiveness, a whole heck of a lot of numbers need to be measured and properly weighted/scaled so that everybody has a fair shot – whether or not they are on a great team.
I took the 1-3 best pitchers from each National League team and entered their statistics into a database, measuring everything from their raw Innings Pitched totals to their Adjusted Quality Start % (you’ll read more on that below). After entering all of the statistics, and crunching numbers until my brain turned to mush, I came up with my weighted points system. I assigned the corresponding point totals and added everything up to determine what I feel is a very accurate measurement of pitching effectiveness amongst the NL’s best.
This was not applied to every single NL Pitcher in 2007 (I will do that another time) but rather amongst these 30 selected #1, #2, or #3 starters. For instance, a guy like Jeff Suppan may have been more effective than Jason Bergmann but I wanted to have at least one person from each team.
The system is not 100% perfect and does not take into account every single statistic (do you know how many statistics there are??), but it definitely levels the playing field between those on good or bad teams, those injured/called up or just plain bad, and those who got lucky or unlucky with run support. The points are assigned based on the areas I, as an intense student of the game, feel are most important to determine true effectiveness.
The basic idea of this system is to measure the true quality of a pitcher over his season – i.e., what would happen if a pitcher were rewarded every time he pitched well and discredited every time he pitched poorly – something that happens perfectly just about 0% of the time.
We will begin by going over the statistics involved, what their points scale was, and why they are used. The idea behind these corresponding point totals is to properly weight the areas that most people intuitively associate with success and quality.
The points given to each statistical subset are designed to separate the aces from the workhorses and the workhorses from the seemingly replacement level pitchers. They may seem arbitrary and could be replaced with different numbers, or fractions/decimals, but the difference between the points in subsets was based on the number of pitchers who fall into certain categories.
GAMES STARTED
In order to be as effective as possible, a pitcher needs to make as many starts as he can. How can we say that a pitcher with 14 starts is more effective than one with 34-35, even if his numbers in those 14 starts are tremendous and the numbers of the one with 34-35 are a bit worse? His numbers may be better than the pitcher with 35 starts, but the latter pitcher was involved in 21 more games and proved to be durable enough to pitch an entire season, and solid enough to maintain his SP status for 162 games.
This does not mean that a pitcher with 35 starts is necessarily “better” than one with 14-16, but rather he is more effective because he is involved in more of his team’s season.
If the pitcher with 14-16 starts posted the same numbers in 32 starts, it would not be a contest. But, he didn’t – it was only 14-16. You cannot have as much of an effect on your team (actual play, not motivational or anything) unless you are out there as often as possible.
***What the end result of this effectiveness points system showed is that those with average numbers over 30+ starts were about as effective as, or only slightly better or worse than, those with good numbers over 16-20 starts.***
If somebody makes only 14 starts in a season, it could be because he was injured for half of the season or was called up from the minors during the season, so he should not be penalized with negative points for that – he just should not be rewarded as highly as someone with 30+ starts.
- if 30 or more starts, +5
- if 25-29 starts, +3
- if 20-24 starts, +2
- if under 20 starts, 0
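For anyone who wants to follow along in code, the Games Started scale can be sketched as a small Python function. The function name is my own, and treating exactly 30 starts as part of the top tier is an assumption based on the "30+ starts" wording used in the surrounding text.

```python
def games_started_points(starts: int) -> int:
    """Points for raw games started, following the scale above."""
    # Assumption: exactly 30 starts counts as "30+" and earns the top tier.
    if starts >= 30:
        return 5
    if starts >= 25:
        return 3
    if starts >= 20:
        return 2
    # Fewer than 20 starts: no reward, but no penalty either.
    return 0


print(games_started_points(32))  # a 32-start season -> 5
print(games_started_points(14))  # a 14-start season -> 0
```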
INNINGS PITCHED
Just like Games Started, IP can only get you positive numbers, because a low raw number of IP can be attributed to injury or a midseason call-up. Those with more IP get higher point totals, though. Pitchers under 100 innings sit in the lowest tier not because they were necessarily bad pitchers, but because the lack of innings (whether due to injury or a call-up) limits their effectiveness.
- if 230+, +8
- if 220-229, +7
- if 200-219, +5
- if 150-199, +3
- if 100-149, +2
- if under 100, +1
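The same idea works for the innings scale. This is only a sketch: the function name is hypothetical, and it assumes the innings total is passed as a plain number (the box-score shorthand such as 229.2 still lands in the right tier because every cutoff is a whole number).

```python
def innings_pitched_points(innings: float) -> int:
    """Points for raw innings pitched, following the scale above."""
    tiers = [
        (230, 8),
        (220, 7),
        (200, 5),
        (150, 3),
        (100, 2),
    ]
    for minimum, points in tiers:
        if innings >= minimum:
            return points
    # Under 100 innings still earns the minimum listed value.
    return 1


print(innings_pitched_points(173))  # a 173-inning season -> 3
```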
INNINGS PITCHED PER GAME
This is where negative numbers can begin. If you were hurt, or called up from the minors, you are not penalized with negatives for the raw number of innings pitched or games started, but if you posted a high number of starts and a low number of innings, this statistic will bite you in the rear. IP/Game separates the hurt or called up from the downright below average or bad. It also helps reward those with a couple fewer starts than others but with more raw innings pitched. These types of pitchers were in the same GS range but some went deeper into games than others. Nobody averaged over 7 IP/gm, so we start lower.
- if 6.5-7 IP/gm, +7
- if 6.0-6.49 IP/gm, +5
- if 5.5-5.99 IP/gm, +3
- if 5.0-5.49 IP/gm, 0
- if below 5.0 IP/gm, -5
If you cannot average at least 5 innings per game, you should not be a starting pitcher. Even Adam Eaton averaged over 5 IP/gm in 2007.
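Here is a rough sketch of the IP/Game scale. Two things are my assumptions rather than part of the system as written: a value that sits on a shared tier edge (6.0 or 5.5, for instance) goes to the higher tier, and an average above 7.0, which nobody reached, would still earn +7. The sample pitcher is hypothetical.

```python
def ip_per_game_points(innings: float, starts: int) -> int:
    """Points for average innings per start.

    `innings` must be a true decimal (6 1/3 innings = 6.333...), not the
    box-score shorthand in which ".1" means one out.
    """
    avg = innings / starts
    if avg >= 6.5:  # assumption: anything above 7.0 would also earn +7
        return 7
    if avg >= 6.0:
        return 5
    if avg >= 5.5:  # assumption: shared tier edges resolve upward
        return 3
    if avg >= 5.0:  # exactly 5.0 is not penalized
        return 0
    return -5


# Hypothetical pitcher: 200 innings over 32 starts = 6.25 IP/gm
print(ip_per_game_points(200, 32))  # -> 5
```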
ADJUSTED QUALITY STARTS
The Quality Start can be a misleading statistic because it counts only games in which a pitcher goes 6+ innings and gives up no more than 3 earned runs… and nothing else.
If a pitcher goes 8.1 innings and gives up 4 runs, it is arguably the same ratio and an equal game in terms of quality, but does not get counted as a quality start.
With that in mind, I came up with the stat of Adjusted Quality Starts, which takes into account all regular quality starts as well as games in which someone goes 7.2-9 innings and gives up no more than 4 runs. This measures the true number of games in which a pitcher had a good-great performance.
***If you wonder why it is 7.2 IP, instead of 8, the number was derived from the number of times a pitcher was lifted after 7.2 IP for a specialist, or other sort of reliever, and from the sheer low average of innings pitched/game by a starter this year. Reaching the 7th inning is now a great feat, let alone coming within one out of finishing the 8th. Though the previous ratio for a QS was 2:1, due to the data mentioned above, going an extra 1.2 IP to get to 7.2 IP merits being able to give up one more run.***
I used the percentage of AQS to the total number of Games Started to measure effectiveness in this area. Someone at 75% or higher almost always pitches a good-great game, whereas someone under 50% pitches a good game less than half of the time – not very effective.
- if AQS % is 75% or above, +5
- if AQS % is 67% to under 75%, +3
- if AQS % is 50% to under 67%, 0
- if AQS % is below 50%, -3
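A minimal sketch of the AQS % scale follows. The function name is mine, and the cutoffs at exactly 75, 67, and 50 percent are treated as the bottom of their tiers, which is an assumption about how the edges should fall. The sample call uses the 24 AQS in 32 starts quoted for Ian Snell later in the article.

```python
def aqs_pct_points(aqs: int, starts: int) -> int:
    """Points for the share of starts that were Adjusted Quality Starts."""
    pct = 100 * aqs / starts
    if pct >= 75:  # assumption: exactly 75% earns the top tier
        return 5
    if pct >= 67:
        return 3
    if pct >= 50:
        return 0
    return -3


print(aqs_pct_points(24, 32))  # 24 of 32 = 75% -> 5
```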
If you’re keeping score at home, AQS = 6+ IP with ER <= 3, OR 7.2+ IP with ER <= 4, where <= is the blog version of less than or equal to.
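The definition translates into code fairly directly once innings are converted to outs, since "7.2" in a box score means 7 innings plus 2 outs (23 outs), not 7.2 decimal innings. This is a sketch with function names of my own choosing, and it assumes there is no upper limit on innings, so a 10-inning outing still qualifies.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings ("7.2" = 7 innings + 2 outs) to outs."""
    whole, _, partial = ip.partition(".")
    return int(whole) * 3 + int(partial or 0)


def is_quality_start(ip: str, earned_runs: int) -> bool:
    """Traditional quality start: 6+ innings, 3 or fewer earned runs."""
    return innings_to_outs(ip) >= 18 and earned_runs <= 3


def is_adjusted_quality_start(ip: str, earned_runs: int) -> bool:
    """AQS: any regular quality start, or 7.2+ innings with 4 or fewer ER."""
    long_outing = innings_to_outs(ip) >= 23 and earned_runs <= 4
    return is_quality_start(ip, earned_runs) or long_outing


print(is_quality_start("8.1", 4))           # False
print(is_adjusted_quality_start("8.1", 4))  # True
print(is_adjusted_quality_start("7.0", 4))  # False
```

The 8.1 innings, 4 runs line is the example from above: it misses the traditional quality start but counts as an AQS.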
COMPLETE GAMES & SHUTOUTS
In addition to AQS, something that needs to be taken into account is how often a pitcher went for a complete game, since they are so rare. We also need to take shutouts into account, since they occur even less often.
- For every CG, +2
- For every SHO, additional +1
***NOTE: Aaron Harang had two games in 2007, one where he went 9 IP, and one where he went 10 IP, when he did not get a decision. Even so, I am counting these 2 as a combined 1 CG, since he went 9+ innings.***
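In code, the complete game and shutout bonus is a one-liner. The sketch below assumes every shutout is also counted among the complete games, so a shutout is worth 3 points in total; the sample numbers are hypothetical.

```python
def cg_sho_points(complete_games: int, shutouts: int) -> int:
    """+2 per complete game, plus an additional +1 per shutout."""
    # Assumption: shutouts are already included in the complete game count.
    return 2 * complete_games + shutouts


# Hypothetical pitcher: 3 complete games, 1 of them a shutout
print(cg_sho_points(3, 1))  # -> 7
```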
WINS AND LOSSES (ADJUSTED)
W-L Records are the most deceiving statistics because they do not take into account the true quality of the games pitched. Just because a pitcher goes 14-7 does not mean he was necessarily a great pitcher. He could have pitched terribly and had great run support in 10 of 14 wins, but brilliantly with terrible run support in the 7 losses.
The whole point of the adjusted W-L record is to reward the AQS: if you get one, you pitched well and should be rewarded, even if your team (offense or bullpen) does not help you.
After all, Ian Snell cannot control the Pirates’ offense. It is not his fault that 4 of his 12 losses were “Tough Losses” and all 11 of his No-Decisions were games in which he pitched brilliantly and had an AQS, yet he received little to no offense to help garner him a ‘W’.
With that in mind, I changed W-L to the following 5 stats:
- Cheap Wins: wins in which one does not get an AQS (-1)
- Tough Losses: losses in which one does get an AQS (+2)
- Legit Wins: wins in which one does get an AQS (+2)
- Legit Losses: losses in which one does not get an AQS (-2)
- ND-AQS: no-decisions in which one gets an AQS (+1)
I received some questions about how these numbers came to be. To keep it simple, the statistics that actually have an effect on the W-L record are valued higher (negatively and positively) than statistics like ND-AQS, which prevent a pitcher from winning but do not hurt him with a loss.
ND-non AQS is not used here for the same reason that Cheap Wins is only negative one, which is that not every Cheap Win or ND-non AQS was a terrible start. A large bulk of them were games in which a pitcher had a good outing but only went 5 or 5.1 innings. A Cheap Win loses you a point (not two, only one) because you do not get an AQS but it does affect your win-loss record. ND-non AQS means you do not get an AQS but it does not affect your win-loss record, which is why I decided to just leave it out.
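The five adjusted W-L stats can be added up with a small helper. The class and field names are mine. The sample uses Ian Snell's figures from this article (9-12, 4 Tough Losses, 11 ND-AQS, 24 AQS overall); his 9 Legit Wins, 0 Cheap Wins, and 8 Legit Losses are inferred from those totals rather than quoted directly.

```python
from dataclasses import dataclass


@dataclass
class AdjustedRecord:
    cheap_wins: int    # wins without an AQS
    tough_losses: int  # losses with an AQS
    legit_wins: int    # wins with an AQS
    legit_losses: int  # losses without an AQS
    nd_aqs: int        # no-decisions with an AQS

    def points(self) -> int:
        return (
            -1 * self.cheap_wins
            + 2 * self.tough_losses
            + 2 * self.legit_wins
            - 2 * self.legit_losses
            + 1 * self.nd_aqs
        )


# Snell: 24 AQS - 4 Tough Losses - 11 ND-AQS = 9 Legit Wins (inferred)
snell = AdjustedRecord(cheap_wins=0, tough_losses=4, legit_wins=9,
                       legit_losses=8, nd_aqs=11)
print(snell.points())  # -> 21
```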
WHIP
Though I am not too fond of WHIP and originally tinkered around with separately evaluating H/IP and BB/IP, using WHIP just seemed to make things easier. Though it does not tell us which pitchers walk fewer and give up more hits, or vice versa, or tell us how many “empty innings” a pitcher had (innings where no baserunners got on), it does provide a valid average of baserunners to expect in any given inning, since it is a per-inning rate rather than a per-9-inning one.
- if WHIP 1.00-1.15, +3
- if WHIP 1.16-1.25, +2
- if WHIP 1.26-1.30, +1
- if WHIP 1.31-1.40, 0
- if WHIP above 1.40, -2
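A sketch of the WHIP scale is below. It assumes WHIP has already been rounded to two decimals, the way it is usually published, and that a WHIP under 1.00 (which the scale does not list) would also earn the top value. The function name is hypothetical.

```python
def whip_points(whip: float) -> int:
    """Points for WHIP, following the scale above."""
    whip = round(whip, 2)
    if whip <= 1.15:  # assumption: a WHIP below 1.00 also earns +3
        return 3
    if whip <= 1.25:
        return 2
    if whip <= 1.30:
        return 1
    if whip <= 1.40:
        return 0
    return -2


print(whip_points(1.20))  # -> 2
print(whip_points(1.76))  # -> -2
```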
STRIKEOUT-TO-WALK RATIO (K:BB)
Instead of using raw K’s, I wanted to use the ratio of strikeouts to walks, since not every pitcher is a strikeout pitcher. Even so, you do not have to be a strikeout pitcher to be an accurate one, and because of this I rewarded those with high K:BB ratios. Greg Maddux only struck out 104 in 34 starts, but only walked 25 – a K:BB of 4.16. This meant that Maddux kept more runners off-base by striking them out and not walking them.
- if K:BB above 4, +7
- if K:BB above 3, +5
- if K:BB above 2, +3
- if K:BB above 1, 0
- if K:BB 1 or below, -3
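Here is the K:BB scale as a quick sketch, checked against the Maddux numbers above. The function name is mine, and giving a pitcher with zero walks the top tier is an assumption, since the scale does not cover that case.

```python
def k_bb_points(strikeouts: int, walks: int) -> int:
    """Points for strikeout-to-walk ratio, following the scale above."""
    if walks == 0:
        return 7  # assumption: no walks at all counts as the top tier
    ratio = strikeouts / walks
    if ratio > 4:
        return 7
    if ratio > 3:
        return 5
    if ratio > 2:
        return 3
    if ratio > 1:
        return 0
    return -3


print(104 / 25)              # 4.16
print(k_bb_points(104, 25))  # Maddux: 104 K, 25 BB -> 7
```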
EXAMPLE OF USAGE
Now that we have the points, let’s test it out and put it to use. We will use Ian Snell and Carlos Zambrano.
The table below shows Ian Snell’s 2007 numbers and points he receives for each in my points system.
When we add up all eleven of these numbers, we get Snell’s Effectiveness #, which comes to: +48.
Now, let’s look at Carlos Zambrano’s season numbers in the table below and add his point totals up.
We look at his numbers and add up the totals to get his Effectiveness #: +35.
Zambrano had more legit wins but also more legit losses, and of Zambrano’s 3 no-decisions, none were ND-AQS, whereas of Snell’s 11 no-decisions, all were ND-AQS.
That tells us that if each pitcher got a win for every game he pitched well (an AQS) and a loss for every decision in which he did not, leaving as no-decisions only the ones in which he pitched poorly or did not go a full 6 IP, their records would look like this:
- Carlos Zambrano (18-13) would actually be 20-11
- Ian Snell (9-12) would actually be 24-8
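The conversion can be sketched like this: every AQS becomes a win regardless of the actual decision, and every non-AQS decision becomes a loss. The function name is hypothetical, and Snell's Legit Win, Cheap Win, and Legit Loss counts are inferred from the totals given in this article (9-12 record, 4 Tough Losses, 11 ND-AQS, 24 AQS).

```python
def adjusted_record(legit_wins: int, tough_losses: int, nd_aqs: int,
                    cheap_wins: int, legit_losses: int) -> tuple[int, int]:
    """Return (wins, losses) if every AQS were a win and every
    non-AQS decision were a loss. Non-AQS no-decisions stay out."""
    wins = legit_wins + tough_losses + nd_aqs
    losses = cheap_wins + legit_losses
    return wins, losses


# Ian Snell, actual record 9-12
print(adjusted_record(legit_wins=9, tough_losses=4, nd_aqs=11,
                      cheap_wins=0, legit_losses=8))  # -> (24, 8)
```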
Snell went further into his games, had a better K:BB ratio, and had a higher AQS %. It also tells us that of Snell’s 32 starts, 24 of them were of great quality, whereas Zambrano had 18 good-great starts and 16 average-bad starts.
This essentially tells us that while Zambrano’s good-great starts may have been better than Snell’s good-great starts, when Zambrano had his bad starts, Snell was still having good-great ones.
As mentioned before, I used this points system to evaluate 30 National League pitchers. I compiled a group of spreadsheets, ranking the pitchers in order in different categories to show that certain stats we rely on do a bad job of proving effectiveness.
To view all of my results, click on the links below. You can use this data in other areas, but please credit my work.
- To see the list of pitchers and their statistics used to assign points, click here.
- To see the list of pitchers in order of effectiveness points, click here.
I do not want to post a ridiculously long table on this article, so you will need to look at the linked files to see the results, but I will list the top 15 pitchers, plus ties, and their effectiveness points.
- Jake Peavy, +74
- Aaron Harang, +69
- John Smoltz, +69
- Brandon Webb, +67
- Cole Hamels, +65
- Brad Penny, +64
- Tim Hudson, +63
- Ted Lilly, +60
- Matt Cain, +52
- Roy Oswalt, +50
- Ian Snell, +48
- Bronson Arroyo, +47
- Derek Lowe, +47
- Greg Maddux, +45
- Adam Wainwright, +45
- Jeff Francis, +45
And, again, these points were assigned to statistics based on how strongly they correlate with effectiveness. The points system essentially covers the statistics and averages from all angles.
The most shocking part of this was how low Chris Young of the Padres came out. Young went 9-8, with a 3.12 ERA, in 30 starts. He should have been more effective, I thought, based on those numbers. After looking at his game logs, though, I changed my mind and realized it made sense.
Of his 30 starts, he was essentially two different people. In the 19 starts in which he went for 6+ innings, he was 9-1 with a 1.64 ERA, averaging 6.6 IP/gm, with a 0.85 WHIP and 129 K’s in 126.1 innings.
In the other 11 starts, he was 0-7, with a 7.14 ERA, only going 4.2 IP/gm, with a 1.76 WHIP, and 38 K to his 36 BB, in 46.2 innings.
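As a sanity check, the two halves can be stitched back together to see whether they reproduce the season line. This is a sketch that uses only the split figures quoted above; the earned run totals are implied from each split's ERA (ER = ERA × IP / 9) rather than taken from a box score.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings ("46.2" = 46 innings + 2 outs) to outs."""
    whole, _, partial = ip.partition(".")
    return int(whole) * 3 + int(partial or 0)


# (starts, wins, losses, ERA, innings) for each split
good = (19, 9, 1, 1.64, "126.1")
bad = (11, 0, 7, 7.14, "46.2")

starts = good[0] + bad[0]
wins, losses = good[1] + bad[1], good[2] + bad[2]
innings = (innings_to_outs(good[4]) + innings_to_outs(bad[4])) / 3

# Earned runs implied by each split's ERA
earned_runs = sum(era * innings_to_outs(ip) / 3 / 9
                  for _, _, _, era, ip in (good, bad))
season_era = earned_runs * 9 / innings

print(starts, f"{wins}-{losses}", innings)  # 30 9-8 173.0
print(round(season_era, 2))                 # 3.12
print(round(innings / starts, 2))           # 5.77
```

The pieces add back up to the 9-8, 3.12 ERA, 30-start season, while the blended 5.77 IP/gm hides how far apart the two versions of Young really were.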
After analyzing his situation and the points system I realized that my effectiveness model favors consistency and lower standard deviations (roughly, how far someone typically strays from his average). To me, that truly defines effectiveness.
I would much rather have a guy I knew would amass an AQS 67% or more of the time than a guy who might strike out 20 batters and pitch a two-hitter in one game, but give up 5 runs in 6 innings for the next three, before again pitching a brilliant game.
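To illustrate what a lower standard deviation looks like, here is a small sketch using Python's standard `statistics` module. The two innings-per-start lists are made-up numbers for two hypothetical pitchers, not real game logs; they are chosen so that both have the same average.

```python
from statistics import mean, pstdev

# Hypothetical innings per start (decimal innings), invented for illustration
steady = [7.0, 6.0, 7.0, 6.0, 7.0]
streaky = [9.0, 4.0, 9.0, 3.0, 8.0]

for name, starts in (("steady", steady), ("streaky", streaky)):
    # pstdev is the population standard deviation of the list
    print(name, round(mean(starts), 2), round(pstdev(starts), 2))
```

Both pitchers average the same innings per start, but the streaky one has a much larger spread, and that spread is what the points system ends up punishing.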
As long as the consistency is of a good nature, consistency in this model proves effectiveness.
I know, we’re finally at the end of the article, right? I apologize for the length but it took this long to get everything across.
Looking at Jake Peavy, the most effective NL pitcher at +74, we see that the only counted statistic in which he led was AQS. Peavy had the most good-great starts of any NL pitcher. While he may not have led in IP, IP/gm, K:BB ratio, or fewest losses (Brad Penny only had 1 legit loss), he led in consistency and being consistently good-great.
These results also show that Cole Hamels, with 6 more starts that he missed due to injury, would likely challenge Peavy for #1 in effectiveness – however, as my model dictates, the fact that he missed those 6 starts and Peavy did not shows that Peavy was more effective.
Yes, there were more stats we could add to this, and more variables to account for, but I feel this accurately levels the field of play between pitchers in distinctly different playing situations, and levels the difference between 2007 reputation and 2007 actual performance.
I must remind you before I come to a close, though, that this is only a measure of effectiveness, not the end-all solution to determining who the “best” pitchers are.
However, for this Sabermetrician, effectiveness directly correlates with quality and value.