A Closer Look at Closers – Part One

Over the course of the next few weeks I will be primarily working with Closers – trying to determine the most effective ways to evaluate talent and quality at an inconsistent position that sure receives some hefty and consistent dollars.
This first part will introduce the opening step of my weighted formula for determining the value of a Closer, and will discuss what a Closer is and how we currently evaluate them.
Though this first part will focus solely on 2007, my study also consists of data from 2005 and 2006.
When compiling my data and examining game log after game log, I decided that my study and research should focus on some consistency, which can be hard to find for a Closer. 
I looked at the National League in 2005, 2006, and 2007, and wanted to limit my group to only those who met certain criteria.  Initially I thought that anyone with 25+ saves in all three seasons should qualify.
Then, I actually saw the numbers and realized that would limit my study to only Jason Isringhausen, Trevor Hoffman, Billy Wagner, and Chad Cordero.
Suffice it to say, I wanted to have some more people in there.  With that in mind, I altered my criteria to simply those who actually were closers during those three seasons.  I also took into account the fact that some were demoted, promoted, or injured, and so my criteria called for 15+ saves in 2005, 2006, and 2007.
With those numbers, the nine Closers who find themselves under my statistical microscope are – Isringhausen, Hoffman, Wagner, Chad Cordero, Francisco Cordero, Brad Lidge, Jose Valverde, Brian Fuentes, and Ryan Dempster.
Yes, Francisco Cordero was in the AL for 2005 and some of 2006, however he has recorded 103 saves in the last three seasons and 60 of them were in the NL.  Plus, the whole idea of working with Closers stemmed from the idea that an inconsistent one-inning pitcher could receive a 4 yr/40 mil deal.
Simply stated, a Closer is a pitcher called on in the 8th or 9th innings, whose job is to seal the win for his team.  If he does his job he records a “Save.”  If the other team comes back to tie the game, he records a “Blown Save.” 
If you asked anyone about those stats before 1969, though, they would assume you were discussing hockey or soccer since saves are a relatively new statistic.
There are three ways a pitcher can record a save. I know this is a recap for many readers but it is important in the grand scheme of my study. The first way, which is how most people generally describe saves, involves the pitcher entering in either the 8th or 9th inning, with a lead of three runs or fewer, and preventing the other team from coming back to tie.
The second way is contingent upon when you enter the game and in what situation.  If you enter the game with the tying run on base, no matter the lead (usually extends it to a 4-run lead), and prevent the team from tying, you get a save.
The third way, which is how most middle relievers will rack up their 1-3 random saves per season, involves a pitcher going for the final three innings of the game – regardless of the score.  If the Phillies lead the Braves 9-1 and Ryan Madson pitches the 7th, 8th, and 9th, he gets a save.
If there are different types of save categories, doesn’t that mean there are different save types for each category?
Yes.  Plenty.  Think of it this way.  If you enter the 9th inning with only one out to go, and a 5-3 lead and bases empty, and you end the game, you get a save.  If you enter the 9th inning with only one out to go and the bases are loaded, and you end the game, you get a save.  One is clearly harder to do than the other and has a higher risk of resulting in a blown save, yet each ultimately results in the same statistic – a save.
With that in mind, I looked at the 9th inning and thought of all the possible situations that someone could receive a save.  In the 9th inning, there are 72 different ways to record a save, excluding what the pitcher does in the inning. 
If we count what the pitcher does, either giving up a run with a two-run lead or two runs with a three-run lead, and so forth, in the 9th inning there are 144 total ways to record a save.  I will get more into these different ways in Part Two, however the basic idea is that there are eight situations of baserunners (empty, 1st, 2nd, 3rd, 1st and 2nd, 1st and 3rd, 2nd and 3rd, bases full) and 18 different variations of these eight situations.  These variations include entering with 1 out, with 2 outs, with no outs, entering with 1-run, 2-run, or 3-run leads, and more of the same.
A pitcher can record a save in the 9th inning in 144 different ways, depending on how many outs he records, what the baserunning situation is, and how many runs he gives up.  Yes, this can be said for many other statistics, but saves generally span two innings at most, so the huge number of different types means a bit more here.
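The counting above can be sketched in a few lines of Python. This is only an illustration: the three out counts, the three lead sizes, and the two outcome classes are an assumed reading of the "18 variations" (3 x 3 x 2), and all the names are made up for the example.

```python
from itertools import product

# The eight baserunner situations listed in the article.
BASE_STATES = [
    "empty", "1st", "2nd", "3rd",
    "1st and 2nd", "1st and 3rd", "2nd and 3rd", "bases full",
]
OUTS_ON_ENTRY = [0, 1, 2]   # assumed: outs already recorded when the closer enters
LEAD_ON_ENTRY = [1, 2, 3]   # assumed: size of the lead, in runs

# Every way to enter the 9th inning in a save situation.
entry_situations = list(product(BASE_STATES, OUTS_ON_ENTRY, LEAD_ON_ENTRY))
print(len(entry_situations))   # 8 * 3 * 3 = 72

# Assumed: each entry situation splits into two outcome classes
# (no runs allowed vs. runs allowed but the lead held).
OUTCOMES = ["clean", "runs allowed, lead kept"]
save_types = list(product(entry_situations, OUTCOMES))
print(len(save_types))         # 72 * 2 = 144
```

Each element of `save_types` is one distinct "kind" of 9th inning save, which is the set a weighted system would eventually have to attach values to.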
I am not dealing with “clutch” in my study.  To read some fascinating insights into relief pitching and relief clutch, read Pizza Cutter’s articles on the subject.
Instead, I am looking at what actually happens and how it happens, not the potential of why it happens.
Many people will look at two, and only two, stats when determining the quality of a closer – total saves, and percentage of successful saves (saves/save opportunities).  It has been pounded into our heads as a barometer and these statistics are supposed to inform us that the “best” closers are the ones with either the most saves or the fewest blown saves.
What I am contending is that if there are 144 different types of 9th inning saves, and the barometer is the sum of all converted opportunities, regardless of the type of save, the needs of your team, and the situation at hand, it is impossible to equate total saves to quality.
Think of it this way – Closer A and Closer B both have 30 saves.  Closer A has 6 blown saves while B has 4 blown saves.  With those numbers, which are usually the only ones readily available, we assume that Closer B was better.  After all, he blew fewer saves.  But what if the 4 saves he blew all came with 3-run leads, the bases empty, and only 1 out to go in the 9th inning, which is the least dangerous save situation of the whole 144?  And what if the 6 blown saves of Closer A all came in games he entered with a runner on third base and no outs, or games where he entered in the 8th inning?
It becomes very difficult to gauge the “better” factor with just those numbers.
Regardless, even looking at hypotheticals like that, which take into account different types of saves, we cannot determine true quality because it does not take into account the needs of the teams these closers are on – which is ultimately the point of the closer.
When we discuss who the best closer is, what are we asking?  Are we wondering who was best with the most pressure?  Who posted the best numbers?  And if we are talking about numbers, what numbers are the best numbers?
These questions, and more, can cause a headache.  My point here is that we cannot compare closers to each other or determine true quality and effectiveness without analyzing what each closer did for his team.
In order to do this we need to find the number of games that each team won in a save situation (meaning no walk-off wins or 3-inning saves) and add it to the number of Blown Save-Losses because that tells us the true number of save opportunities each team had.  I call that a TSO – Team Save Opportunity.
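As a minimal sketch, the TSO definition is just a sum of two team-level counts. The function name and the input numbers below are made up for illustration and do not describe any real team.

```python
def team_save_opportunities(save_situation_wins: int, blown_save_losses: int) -> int:
    """TSO = wins in save situations (no walk-offs or 3-inning saves)
    plus losses that followed a blown save."""
    return save_situation_wins + blown_save_losses

# Hypothetical team: 41 wins in save situations, 9 blown-save losses.
print(team_save_opportunities(41, 9))  # 50
```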
Jose Valverde had 47 saves this year, leading the NL, however the DBacks had 64 Team Save Opportunities, whereas Ryan Dempster’s Cubs only had 48 of those games – sixteen fewer than the DBacks.
Dempster had 28 saves, far fewer than Valverde, but his conversion rate (28/31) was higher.  Valverde had more saves, but he also had more opportunities because his team played a different way and, as a team, played more close games that needed saving.  And even though Dempster’s percentage was higher, he also had fewer opportunities to blow saves.  If he had the 54 appearances of Valverde, he may have also blown more saves and had a worse conversion rate.
What we need to do here is level the field of play between those on teams with many save opportunities and teams with fewer.  After all, it is not Dempster’s fault that the Cubs had a better offense and blew teams out more than the DBacks.  He was not needed as often as Valverde and so his raw save and blown save totals do nothing but compare one number of Dempster’s to the overall need of Valverde and the needs of the Diamondbacks.
To really do this, the effectiveness of one pitcher to his team needs to be compared to the effectiveness of another pitcher to another team.
The DBacks had 64 TSO’s and Valverde had 54 opportunities.  This means that Valverde appeared in 54 of the 64 total save opportunities for his team, or 84.4 %.
That 84.4 % tells us he was durable since the team had so many potential save opportunities and his appearances were so high.
The Cubs only had 48 team save opportunities and Dempster only had 31 attempts.  His appearance rate would be 31 of 48, or 64.6 %.
Yes, Dempster was hurt, but this does make sense because you cannot be more effective (positive or negative) for your team if you are not involved as often as possible.  The fact that other pitchers were involved in over 1/3 of the Cubs save opportunities says that Dempster was not truly effective in making appearances.
To see the order of the nine closers in terms of Appearance Rate, look at the table below. The table shows the saves and save opportunities of the individual, as well as the total real save opportunities of the team, and then the Appearance Rate.

Closer          Saves   Save Opps   TSO   Appearance Rate (%)
F. Cordero      44      51          58    87.9
Valverde        47      54          64    84.4
Hoffman         42      49          60    81.7
Wagner          34      39          48    81.3
C. Cordero      37      46          59    78.0
Isringhausen    32      34          46    73.9
Dempster        28      31          48    64.6
Lidge           19      27          55    49.1
Fuentes         20      27          59    45.8
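For anyone who wants to reproduce the table, here is a short Python sketch. The data are the 2007 numbers from the table; the dictionary layout and function name are just choices made for this example.

```python
# (saves, save opportunities, team save opportunities), 2007
CLOSERS_2007 = {
    "F. Cordero": (44, 51, 58),
    "Valverde": (47, 54, 64),
    "Hoffman": (42, 49, 60),
    "Wagner": (34, 39, 48),
    "C. Cordero": (37, 46, 59),
    "Isringhausen": (32, 34, 46),
    "Dempster": (28, 31, 48),
    "Lidge": (19, 27, 55),
    "Fuentes": (20, 27, 59),
}

def appearance_rate(save_opps: int, team_save_opps: int) -> float:
    """Percent of the team's save opportunities the closer appeared in."""
    return 100.0 * save_opps / team_save_opps

# Rank the closers from highest to lowest Appearance Rate.
ranked = sorted(
    CLOSERS_2007.items(),
    key=lambda item: appearance_rate(item[1][1], item[1][2]),
    reverse=True,
)
for name, (saves, opps, tso) in ranked:
    print(f"{name:<13} {opps:>2}/{tso:<2}  {appearance_rate(opps, tso):6.2f}%")
```

The rates are printed to two decimals on purpose: Wagner's 39/48 is exactly 81.25, and Python's default rounding would show that as 81.2 rather than the 81.3 in the table.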

Despite this stat being useful to tell us how durable or useful a closer can be in making appearances based on team need, it does not tell us how successful they were in actually converting these saves. Just because Valverde appeared in 54 of 64 team save opportunities for the DBacks does not mean he converted 54 saves – just that he made 54 appearances.
After careful thought, I came up with “Save Rate”, which takes the total number of saves by a closer and divides it by the total number of team opportunities.  This statistic takes the Appearance Rate to the next level.  Since closers can have a high Appearance Rate but low number of saves or low save percentage, Save Rate balances that out.
Save Rate lets us know how successful a Closer was in recording saves relative to the percentage of his team’s save opportunities.  It tells us how successful one was based on how effective he was in fulfilling his team’s need.
Essentially, it rewards those with more saves in fewer team opportunities, and takes away from those with fewer saves in more opportunities.
Valverde had 47 saves out of 54 chances, and his team had 64 real save opportunities.  His Save% would be 47/54 and his Appearance Rate would be 54/64.
His Save Rate would be 47 (# of saves)/64 (# of total team chances for a save), which comes out to 73.4 %, meaning that Valverde successfully saved 73.4 % of the DBacks team save opportunities.
Francisco Cordero of the Brewers was 2nd in the NL with 44 total saves.  He also blew seven saves, giving him 51 opportunities.  His Save% was 44/51, very similar to Valverde’s, but his Appearance Rate was higher because the Brewers had six fewer team save opportunities and he had only three fewer appearances than Valverde.
His Appearance Rate was 51/58, or 87.9 %.  He appeared in more games proportionate to his team’s need.
His Save Rate would be 44/58, or 75.9 %, higher than Valverde’s.
To see the nine closers in order of Save Rate, look at the table below.  Again, it lists the total saves and opportunities of the individual, as well as the total team opportunities, and then the actual Save Rate.

Closer          Saves   Save Opps   TSO   Save Rate
F. Cordero      44      51          58    75.9 %
Valverde        47      54          64    73.4 %
Wagner          34      39          48    70.8 %
Hoffman         42      49          60    70.0 %
Isringhausen    32      34          46    69.6 %
C. Cordero      37      46          59    62.7 %
Dempster        28      31          48    58.3 %
Lidge           19      27          55    34.5 %
Fuentes         20      27          59    33.9 %
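One way to see how the three stats fit together: Save Rate is simply Save% multiplied by Appearance Rate, because the closer's own opportunities cancel out. The sketch below checks that for Valverde and F. Cordero; the function names are made up for the example.

```python
def save_pct(saves: int, save_opps: int) -> float:
    return saves / save_opps

def appearance_rate(save_opps: int, team_save_opps: int) -> float:
    return save_opps / team_save_opps

def save_rate(saves: int, team_save_opps: int) -> float:
    return saves / team_save_opps

for name, saves, opps, tso in [("Valverde", 47, 54, 64), ("F. Cordero", 44, 51, 58)]:
    direct = save_rate(saves, tso)
    # (saves / opps) * (opps / tso) collapses to saves / tso
    via_parts = save_pct(saves, opps) * appearance_rate(opps, tso)
    print(name, round(100 * direct, 1), round(100 * via_parts, 1))
# Valverde 73.4 73.4
# F. Cordero 75.9 75.9
```

So a closer can raise his Save Rate either by converting a higher share of his own chances or by taking a larger share of his team's chances.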

It makes sense that Cordero would be higher because even though his save totals and appearance totals were slightly lower, he was involved in a higher percentage of his team’s chances and he converted saves at an almost identical number and percentage.  Basically, he had fewer opportunities and still did the same exact thing – not the same ratio, but the same thing.
This does not necessarily mean Cordero had a better season.  This is merely one part of a two or three part article series and Save Rate is only the first part to a weighted system that should be able to determine who the best Closers are based on statistics that essentially define a good Closer.
Next week I will get into the different types of saves featured in the data sheets and discuss their importance in determining quality and effectiveness. WPA and Win Predictors will be discussed as well.
I will also look at raw numbers to help come up with the Seidman Closer Model to properly evaluate Closers.
In closing (pun very intended), I just want to add that the Closer position has become such a fickle one over the years that these evaluations need to be done on a year to year basis.  Jose Valverde was arguably one of the best NL Closers in 2007, and somewhat of a replacement, or makeshift closer, in 2005.  Brian Fuentes was dynamite in 2005 and still pretty good in 2006, yet so bad in 2007 that he lost his job.
It is remarkable how inconsistent Closers are, and that is one of the primary reasons (along with playoff success) why Mariano Rivera will go down as the greatest ever.
Lastly, of the nine closers used in this ongoing study:

  • Lidge and Fuentes were demoted in 2007
  • Lidge and Valverde were traded to new teams
  • Dempster is likely going back to the starting rotation
  • Francisco Cordero signed a huge four-year deal with a new team
  • Billy Wagner changed teams from 2005 to 2006

The only NL Closers that have actually kept their job for the same team between 2005 and 2007 are – Trevor Hoffman, Jason Isringhausen, and Chad Cordero.


2007 NL Starting Pitching Analysis

When it comes to analyzing and comparing pitchers, those conducting the comparisons will often find themselves in a tricky situation.  Sure, certain pitchers are better than others, but what are they specifically better at? 

How can we conduct an honest analysis when there are so many variables to consider?  And how can we truly determine which pitchers were better than others when some are on terrible teams with no run support and others are on tremendous teams with tons of run support?
The first step is to determine what we are measuring.  If we want to know who the best strikeout pitcher is, we should look at the raw total for strikeouts and also an average of K/IP, since some guys will make fewer starts than others.  To figure out who walks the least, we measure the number of walks each pitcher gives up and a walk-IP ratio.
These measurements are contingent on one category, though, and cannot tell us who is better or more effective than the rest.  All of the research and ideas presented in this article are designed to measure the “effectiveness” of a pitcher. 
In order to determine this effectiveness, a whole heck of a lot of numbers need to be measured and properly weighted/scaled so that everybody has a fair shot – whether or not they are on a great team.
I took the 1-3 best pitchers from each National League team and entered their statistics into a database, measuring everything from their raw Innings Pitched totals to their Adjusted Quality Start % (you’ll read more on that below).  After entering all of the statistics, and crunching numbers until my brain turned to mush, I came up with my weighted points system.  I assigned the corresponding point totals and added everything up to determine what I feel is a very accurate measurement of pitching effectiveness amongst the NL’s best. 
This was not applied to every single NL Pitcher in 2007 (I will do that another time) but rather amongst these 30 selected #1, #2, or #3 starters.  For instance, a guy like Jeff Suppan may have been more effective than Jason Bergmann but I wanted to have at least one person from each team.
The system is not 100% perfect and does not take into account every single statistic (do you know how many statistics there are??), but it definitely levels the playing field between those on good or bad teams, those injured/called up or just plain bad, and those who got lucky or unlucky with run support.  The points are assigned based on the areas I, as an intense student of the game, feel are most important to determine true effectiveness. 
The basic idea of this system is to measure the true quality of a pitcher over his season – i.e., what would happen if a pitcher were rewarded every time he pitched well and discredited every time he pitched poorly – something that happens perfectly just about 0% of the time. 
We will begin by going over the statistics involved, what their points scale was, and why they are used.  The idea behind these corresponding point totals is to properly weight the areas that most people intuitively associate with success and quality.
The points given to each statistical subset are designed to separate the aces from the workhorses and the workhorses from the seemingly replacement level pitchers.  They may seem arbitrary and could be replaced with different numbers, or fractions/decimals, however the difference between the points in subsets was based on the number of pitchers who fall into certain categories.
In order to be as effective as possible, a pitcher needs to make as many starts as he can.  How can we say that a pitcher with 14 starts is more effective than one with 34-35, even if his numbers in those 14 starts are tremendous and the numbers of the one with 34-35 are a bit worse?  His numbers may be better than the pitcher with 35 starts, however the latter pitcher was involved in 21 more games and proved to be durable enough to pitch an entire season, and solid enough to maintain his SP status for 162 games. 
This does not mean that a pitcher with 35 starts is necessarily “better” than one with 14-16, but rather he is more effective because he is involved in more of his team’s season. 
If the pitcher with 14-16 starts posted the same numbers in 32 starts, it would not be a contest.  But, he didn’t – it was only 14-16.  You cannot have as much of an effect on your team (actual play, not motivational or anything) unless you are out there as often as possible.
***What the end result of this effectiveness points system showed is that those with average numbers over 30+ starts were as effective as, or only slightly better or worse than, those with good numbers over 16-20 starts.***
If somebody makes only 14 starts in a season, it could be because he was injured for half of the season or was called up from the minors during the season, so he should not be penalized with negative points for that – he just should not be rewarded as highly as someone with 30+ starts.

  • if over 30 starts, +5
  • if 25-29 starts, +3
  • if 20-24 starts, +2
  • if under 20 starts, 0

Just like Games Started, IP can only get you positive numbers, because a low raw number of IP can be attributed to injury or a midseason call-up.  Those with more IP get higher point totals, though.  The lowest tier (under 100 innings) earns the minimum rather than a negative number because you were not necessarily a bad pitcher, but the lack of innings (whether due to injury or a call-up) limits the effectiveness.

  • if 230+, +8
  • if 220-229, +7
  • if 200-219, +5
  • if 150-199, +3
  • if 100-149, +2
  • if under 100, +1
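The Games Started and Innings Pitched scales can be sketched as two small lookup functions. One assumption is flagged in the code: the starts list says "over 30" while the text talks about "30+ starts", so exactly 30 starts is treated here as the top tier.

```python
def starts_points(starts: int) -> int:
    """Points for Games Started; never negative."""
    if starts >= 30:   # assumption: exactly 30 counts as the top tier
        return 5
    if starts >= 25:
        return 3
    if starts >= 20:
        return 2
    return 0

def innings_points(innings: float) -> int:
    """Points for raw Innings Pitched; never negative."""
    if innings >= 230:
        return 8
    if innings >= 220:
        return 7
    if innings >= 200:
        return 5
    if innings >= 150:
        return 3
    if innings >= 100:
        return 2
    return 1

print(starts_points(32), innings_points(208.0))  # 5 5
print(starts_points(14), innings_points(95.0))   # 0 1
```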

This is where negative numbers can begin.  If you were hurt, or called up from the minors, you are not penalized with negatives for the raw number of innings pitched or games started, but if you posted a high number of starts and low number of innings, this statistic will bite you in the rear.  IP/Game separates the hurt or called up from the downright below average or bad.  It also helps reward those with a couple fewer starts than others but with more raw innings pitched.  These types of pitchers were in the same GS range but some went deeper into games than others.  Nobody averaged over 7 IP/gm, so we start lower.

  • if 6.5-7 IP/gm, +7
  • if 6.0-6.49 IP/gm, +5
  • if 5.5-5.99 IP/gm, +3
  • if 5.0-5.49 IP/gm, 0
  • if below 5.0 IP/gm, -5

If you cannot average over 5 innings per game, or exactly 5 innings per game, you should not be a starting pitcher.  Even Adam Eaton averaged over 5 IP/gm in 2007.
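A small sketch of the IP/Game step follows. The one wrinkle is that box-score innings such as 216.1 mean 216 and one third, not 216.1, so the innings are converted to outs first. The helper names are made up, and the tier edges are read as "at least 6.5", "at least 6.0", and so on, which is an assumption about how boundary values are meant to fall.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings ('216.1' = 216 1/3) to outs recorded."""
    whole, _, thirds = ip.partition(".")
    return int(whole) * 3 + int(thirds or 0)

def ip_per_start(ip: str, starts: int) -> float:
    return innings_to_outs(ip) / 3 / starts

def ip_per_start_points(avg: float) -> int:
    if avg >= 6.5:
        return 7
    if avg >= 6.0:
        return 5
    if avg >= 5.5:
        return 3
    if avg >= 5.0:
        return 0
    return -5

zambrano = ip_per_start("216.1", 34)
print(round(zambrano, 2), ip_per_start_points(zambrano))  # 6.36 5
```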
Quality Starts can be an inaccurate statistic because it takes into account games in which a pitcher goes 6+ innings and gives up no more than 3 earned runs… and nothing else.
If a pitcher goes 8.1 innings and gives up 4 runs, it is arguably the same ratio and an equal game in terms of quality, but does not get counted as a quality start.
With that in mind, I came up with the stat of Adjusted Quality Starts, which takes into account all regular quality starts as well as games in which someone goes 7.2-9 innings and gives up no more than 4 runs.  This measures the true number of games in which a pitcher had a good-great performance.
***If you wonder why it is 7.2 IP, instead of 8, the number was derived from the number of times a pitcher was lifted after 7.2 IP for a specialist, or other sort of reliever, and from the sheer low average of innings pitched/game by a starter this year.  Reaching the 7th inning is now a great feat, let alone coming within one out of finishing the 8th.  Though the previous ratio for a QS was 2:1, due to the data mentioned above, going an extra 1.2 IP to get to 7.2 IP merits being able to give up one more run.***
I used the percentage of AQS to the total number of Games Started to measure effectiveness in this area.  Someone over 75% almost always pitches a good-great game, whereas someone under 50% only pitches a good game less than half of the time – not very effective.

  • if AQS % is 75% or above, +5
  • if AQS % is 67-74%, +3
  • if AQS % is 50-66%, 0
  • if AQS % is below 50%, -3

If you’re keeping score at home, AQS = 6+ IP with ER <= 3, or 7.2+ IP with ER <= 4, where <= means less than or equal to.
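Here is a minimal sketch of the AQS test for a single start. Innings are passed as outs recorded (6 IP = 18 outs, 7.2 IP = 23 outs) to avoid any confusion over the ".1" and ".2" notation; the function names are made up for the example.

```python
def is_quality_start(outs: int, earned_runs: int) -> bool:
    """Regular quality start: 6+ innings, 3 or fewer earned runs."""
    return outs >= 18 and earned_runs <= 3

def is_adjusted_quality_start(outs: int, earned_runs: int) -> bool:
    """AQS: any regular quality start, or 7.2+ innings with 4 or fewer earned runs."""
    return is_quality_start(outs, earned_runs) or (outs >= 23 and earned_runs <= 4)

# 8.1 IP (25 outs) with 4 earned runs: not a QS, but it is an AQS.
print(is_quality_start(25, 4), is_adjusted_quality_start(25, 4))  # False True
# 7.1 IP (22 outs) with 4 earned runs falls one out short.
print(is_adjusted_quality_start(22, 4))                           # False
```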
In addition to AQS, something that needs to be taken into account is how often a pitcher went for a complete game, since they are so rare.  We also need to take into account a shutout, since they occur even less. 

  • For every CG, +2
  • For every SHO, additional +1

***NOTE: Aaron Harang had two games in 2007, one where he went 9 IP, and one where he went 10 IP, when he did not get a decision.  Even so, I am counting these 2 as a combined 1 CG, since he went 9+ innings.***
W-L Records are the most deceiving statistics because they do not take into account the true quality of the games pitched.  Just because a pitcher goes 14-7 does not mean he was necessarily a great pitcher.  He could have pitched terribly and had great run support in 10 of 14 wins, but brilliantly with terrible run support in the 7 losses.
The whole point of the adjusted W-L records is to get an AQS, since that means you pitched well and should be rewarded, even if your team (offense or bullpen) does not help you. 
After all, Ian Snell cannot control the Pirates’ offense.  It is not his fault that 4 of his 12 losses were “Tough Losses” and all 11 of his No-Decisions were games in which he pitched brilliantly and had an AQS, yet he received little to no offense to help garner him a ‘W’.
With that in mind, I changed W-L to the following 5 stats:

  • Cheap Wins: wins in which one does not get an AQS (-1)
  • Tough Losses: losses in which one does get an AQS (+2)
  • Legit Wins: wins in which one does get an AQS (+2)
  • Legit Losses: losses in which one does not get an AQS (-2)
  • ND-AQS: no-decisions in which one gets an AQS (+1)

I received some questions for how these numbers came to be, and to keep it simple, the statistics that actually have an effect on the W-L record are valued higher (negatively and positively) than the statistics like ND-AQS, which prevent a pitcher from winning but do not hurt him with a loss.
ND-non AQS is not used here for the same reason that Cheap Wins is only negative one, which is that not every Cheap Win or ND-non AQS was a terrible start.  A large bulk of them were games in which a pitcher had a good outing but only went 5 or 5.1 innings.   Cheap Wins loses you a point (not two, only one) because you do not get an AQS but it does affect your win-loss record.  ND-non AQS means you do not get an AQS but it does not affect your win-loss record, which is why I decided to just leave it out.
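The five adjusted W-L categories boil down to a lookup on two facts about each start: the decision, and whether it was an AQS. A rough sketch, with made-up names, using Ian Snell's 2007 breakdown (9 Legit Wins, 4 Tough Losses, 8 Legit Losses, 11 ND-AQS) as input:

```python
# (decision, had_aqs) -> points
DECISION_POINTS = {
    ("W", False): -1,   # Cheap Win
    ("L", True): 2,     # Tough Loss
    ("W", True): 2,     # Legit Win
    ("L", False): -2,   # Legit Loss
    ("ND", True): 1,    # ND-AQS
    ("ND", False): 0,   # ND-non AQS, left out of the system
}

def decision_points(starts) -> int:
    """starts: iterable of (decision, had_aqs) pairs, one per game started."""
    return sum(DECISION_POINTS[(decision, had_aqs)] for decision, had_aqs in starts)

snell_2007 = (
    [("W", True)] * 9 + [("L", True)] * 4 + [("L", False)] * 8 + [("ND", True)] * 11
)
print(decision_points(snell_2007))  # 18 + 8 - 16 + 11 = 21
```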
Though I am not too fond of this statistic and originally tinkered around with separately evaluating H/IP and BB/IP, using WHIP just seemed to make things easier.  Though it does not tell us which pitchers walk less and give up more hits, or vice versa, or tell us how many “empty innings” a pitcher had (innings where no baserunners got on), it does provide a valid average of baserunners to expect in a given game since it does not equate to a per-9 inning scale.

  • if WHIP 1.00-1.15, +3
  • if WHIP 1.16-1.25, +2
  • if WHIP 1.26-1.30, +1
  • if WHIP 1.31-1.40, 0
  • if WHIP above 1.40, -2

Instead of using K’s, I wanted to use the ratio of strikeouts to walks, since not every pitcher is a strikeout pitcher.  Even so, you do not have to be a strikeout pitcher to be an accurate one, and because of this I rewarded those with high K:BB ratios.  Greg Maddux only struck out 104 in 34 starts, but only walked 25 – a K:BB of 4.16.  This meant that Maddux kept more runners off-base by striking them out and not walking them.

  • if K:BB above 4, +7
  • if K:BB above 3, +5
  • if K:BB above 2, +3
  • if K:BB above 1, 0
  • if K:BB 1 or below, -3
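The WHIP and K:BB scales can be sketched the same way. Two things are assumptions here: a WHIP under 1.00 is not covered by the list, so it is lumped into the top tier, and the function names are invented for the example.

```python
def whip_points(whip: float) -> int:
    """Lower WHIP is better."""
    if whip <= 1.15:   # assumption: anything under 1.00 also gets the top tier
        return 3
    if whip <= 1.25:
        return 2
    if whip <= 1.30:
        return 1
    if whip <= 1.40:
        return 0
    return -2

def kbb_points(strikeouts: int, walks: int) -> int:
    """Higher strikeout-to-walk ratio is better."""
    ratio = strikeouts / walks
    if ratio > 4:
        return 7
    if ratio > 3:
        return 5
    if ratio > 2:
        return 3
    if ratio > 1:
        return 0
    return -3

# Greg Maddux, 2007: 104 K and 25 BB is a 4.16 ratio.
print(round(104 / 25, 2), kbb_points(104, 25))  # 4.16 7
```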

Now that we have the points, let’s test it out and put it to use.  We will use Ian Snell and Carlos Zambrano.
The table below shows Ian Snell’s 2007 numbers and points he receives for each in my points system.

Stat       2007    Points
Starts     32      +5
Innings    208.0   +5
Cheap W    0       0
Tough L    4       +8
Legit W    9       +18
Legit L    8       -16
ND-AQS     11      +11
AQS %      75%     +5
IP/Game    6.52    +7
WHIP       1.33    0
K:BB       2.60    +3
CG         1       +2
SHO        0       0

When we add up all thirteen of these point values, we get Snell’s Effectiveness #, which comes to: +48.
Now, let’s look at Carlos Zambrano’s season numbers in the table below and add his point totals up.

Stat       2007    Points
Starts     34      +5
Innings    216.1   +5
Cheap W    0       0
Tough L    2       +4
Legit W    18      +36
Legit L    11      -22
ND-AQS     0       0
AQS %      53%     0
IP/Game    6.36    +5
WHIP       1.34    0
K:BB       1.75    0
CG         1       +2
SHO        0       0

We look at his numbers and add up the totals to get his Effectiveness #: +35.
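To tie the scales together, here is a compact sketch that turns a full season line into an Effectiveness #. It is one possible reading of the system rather than the exact spreadsheet logic: tier edges are treated as "at least" cutoffs (so a 75% AQS rate and 30 starts both land in the top tier), and the class and function names are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class SeasonLine:
    starts: int
    innings: float
    cheap_w: int
    tough_l: int
    legit_w: int
    legit_l: int
    nd_aqs: int
    aqs_pct: float
    ip_per_game: float
    whip: float
    k_bb: float
    cg: int
    sho: int

def tier(value, cutoffs, floor):
    """Points for the first (threshold, points) pair with value >= threshold."""
    for threshold, points in cutoffs:
        if value >= threshold:
            return points
    return floor

def whip_points(whip):
    # Lower is better, so the comparisons run the other way.
    for ceiling, points in [(1.15, 3), (1.25, 2), (1.30, 1), (1.40, 0)]:
        if whip <= ceiling:
            return points
    return -2

def kbb_points(k_bb):
    # The K:BB tiers are strict ("above 4", "above 3", ...).
    for threshold, points in [(4, 7), (3, 5), (2, 3), (1, 0)]:
        if k_bb > threshold:
            return points
    return -3

def effectiveness(s: SeasonLine) -> int:
    return (
        tier(s.starts, [(30, 5), (25, 3), (20, 2)], 0)
        + tier(s.innings, [(230, 8), (220, 7), (200, 5), (150, 3), (100, 2)], 1)
        - s.cheap_w + 2 * s.tough_l + 2 * s.legit_w - 2 * s.legit_l + s.nd_aqs
        + tier(s.aqs_pct, [(75, 5), (67, 3), (50, 0)], -3)
        + tier(s.ip_per_game, [(6.5, 7), (6.0, 5), (5.5, 3), (5.0, 0)], -5)
        + whip_points(s.whip)
        + kbb_points(s.k_bb)
        + 2 * s.cg + s.sho
    )

snell = SeasonLine(32, 208.0, 0, 4, 9, 8, 11, 75, 6.52, 1.33, 2.60, 1, 0)
zambrano = SeasonLine(34, 216.1, 0, 2, 18, 11, 0, 53, 6.36, 1.34, 1.75, 1, 0)
print(effectiveness(snell), effectiveness(zambrano))  # 48 35
```

Feeding in the two season lines from the tables reproduces the +48 and +35 totals.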
Zambrano had more legit wins but also more legit losses, and of Zambrano’s 3 no-decisions, none were ND-AQS, whereas of Snell’s 11 no-decisions, all were ND-AQS. 
That tells us that if each player got a win for every game he pitched well, and a loss for every game he did not pitch well (did not get an AQS), and the only no-decisions they received came from no-decisions that they pitched poorly in or did not go a full 6 IP, their records would look like this –

  • Carlos Zambrano (18-13) would actually be 20-11
  • Ian Snell (9-12) would actually be 24-8
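The adjusted records above come from reassigning every start by quality instead of by decision: each AQS becomes a win, and each decision without an AQS becomes a loss. A minimal sketch, with a made-up function name:

```python
def adjusted_record(cheap_w: int, tough_l: int, legit_w: int, legit_l: int, nd_aqs: int):
    """Wins = every start with an AQS; losses = every decision without one."""
    wins = legit_w + tough_l + nd_aqs
    losses = legit_l + cheap_w
    return wins, losses

print(adjusted_record(cheap_w=0, tough_l=2, legit_w=18, legit_l=11, nd_aqs=0))  # (20, 11)
print(adjusted_record(cheap_w=0, tough_l=4, legit_w=9, legit_l=8, nd_aqs=11))   # (24, 8)
```

No-decisions without an AQS stay no-decisions, which is why Zambrano's adjusted record covers 31 of his 34 starts.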

Snell went further into his games, had a better K:BB ratio, and had that higher AQS %.  It also tells us that of Snell’s 32 starts, 24 of them were of great quality, whereas Zambrano had 18 good-great starts and 16 average-bad starts.
This essentially tells us that while Zambrano’s good-great starts may have been better than Snell’s good-great starts, when Zambrano had his bad starts, Snell was still having good-great ones.
As mentioned before, I used this points system to evaluate 30 National League pitchers.  I compiled a group of spreadsheets, ranking the pitchers in order in different categories to show that certain stats we rely on do a bad job of proving effectiveness.
To view all of my results, click on the links below.  You can use this data in other areas, but please credit my work.

  • To see the list of pitchers and their statistics used to assign points, click here.
  • To see the list of pitchers in order of effectiveness points, click here.

I do not want to post a ridiculously long table on this article, so you will need to look at the linked files to see the results, but I will list the top 16 pitchers (three are tied at +45) and their effectiveness points.

  1. Jake Peavy, +74
  2. Aaron Harang, +69
  3. John Smoltz, +69
  4. Brandon Webb, +67
  5. Cole Hamels, +65
  6. Brad Penny, +64
  7. Tim Hudson, +63
  8. Ted Lilly, +60
  9. Matt Cain, +52
  10. Roy Oswalt, +50
  11. Ian Snell, +48
  12. Bronson Arroyo, +47
  13. Derek Lowe, +47
  14. Greg Maddux, +45
  15. Adam Wainwright, +45
  16. Jeff Francis, +45

And, again, these points were assigned to statistics based on how strongly they correlate with effectiveness.  The points system essentially covers the statistics and averages from all angles.
The most shocking part of this was how low Chris Young of the Padres came out.  Young went 9-8, with a 3.12 ERA, in 30 starts.  He should have been more effective, I thought, based on those numbers.  After looking at his game logs, though, I changed my mind and realized it made sense.
Of his 30 starts, he was essentially two different people.  In the 19 starts in which he went for 6+ innings, he was 9-1 with a 1.64 ERA, averaging 6.6 IP/gm, with a 0.85 WHIP and 129 K’s in 126.1 innings.
In the other 11 starts, he was 0-7, with a 7.14 ERA, only going 4.2 IP/gm, with a 1.76 WHIP, and 38 K to his 36 BB, in 46.2 innings.
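As a sanity check, the two halves of Young's season do combine back into his overall line. The sketch below backs out earned runs from each split's ERA and innings (the rounded earned-run counts are inferred here, not taken from the game logs) and recombines them.

```python
def innings_to_float(ip: str) -> float:
    """Convert box-score innings ('126.1' = 126 1/3) to a decimal."""
    whole, _, thirds = ip.partition(".")
    return int(whole) + int(thirds or 0) / 3

def earned_runs_from_era(era: float, ip: str) -> int:
    """Back out earned runs from ERA = 9 * ER / IP (inferred, rounded to a whole run)."""
    return round(era * innings_to_float(ip) / 9)

def era(earned_runs: int, innings: float) -> float:
    return 9 * earned_runs / innings

good_er = earned_runs_from_era(1.64, "126.1")   # the 19 starts of 6+ innings
bad_er = earned_runs_from_era(7.14, "46.2")     # the other 11 starts
total_ip = innings_to_float("126.1") + innings_to_float("46.2")

print(good_er, bad_er)                              # 23 37
print(round(era(good_er + bad_er, total_ip), 2))    # 3.12
```

The 11 short starts account for well over half of the earned runs in barely a quarter of the innings, which is the split the season ERA hides.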
After analyzing his situation and the points system I realized that my effectiveness model favors consistency and lower standard deviations (the average of how far someone strays from his average).  To me, that truly defines effectiveness.
I would much rather have a guy who I knew would amass an AQS 67% or more of the time than a guy who might strike out 20 batters and pitch a two-hitter in one game, but give up 5 runs in 6 innings for the next three, before again pitching a brilliant game.
As long as the consistency is of a good nature, consistency in this model proves effectiveness.
I know, we’re finally at the end of the article, right?  I apologize for the length but it took this long to get everything across. 
Looking at Jake Peavy, the most effective NL pitcher at +74, we see that the only counted statistic in which he led was AQS.  Peavy had the most good-great starts of any NL pitcher.  While he may not have led in IP, IP/gm, K:BB ratio, or fewest losses (Brad Penny only had 1 legit loss), he led in consistency and being consistently good-great.
These results also show that Cole Hamels, with 6 more starts that he missed due to injury, would likely challenge Peavy for #1 in effectiveness – however, as my model dictates, the fact that he missed those 6 starts and Peavy did not shows that Peavy was more effective.
Yes, there were more stats we could add to this, and more variables to account for, but I feel this accurately levels the field of play between pitchers in distinctly different playing situations, and levels the difference between 2007 reputation and 2007 actual performance.
I must remind you before I come to a close, though, that this is only a measure of effectiveness, not the end-all solution to determining who the “best” pitchers are.
However, for this Sabermetrician, effectiveness directly correlates with quality and value.