This Week in News and Sabermetrics: 4/6-4/12

Welcome to the first edition of TWINS – This Week in News and Sabermetrics.  This will be a weekly article recapping the goings on in the baseball world, ranging anywhere from top games of the week or oddest stats to frontrunners for awards based on my formulas and links to great articles.  Expect one of these bad boys every Saturday.  If anybody has suggestions for additions they would like to see feel free to post them in the comments.  Without further delay:
Interesting Bits of Tid
Well, the Tigers finally won a game after starting the season 0-7 and worrying the moustache off of Jim Leyland (not in a literal sense).  Unfortunately, any hope of a winning streak was put to rest when Tim Wakefield took the mound the next night.  Two weeks into the season the team expected to score 1,000 runs in 162 games (6.17/gm for anyone wondering) has scored 28 runs in 9 games (not 6.17/gm for anyone wondering).  To show how bad things have been Placido Polanco even made errors in consecutive nights.
Staying in the AL, Travis Buck of the Athletics started the season by going 0-21, with 9 strikeouts, and a .043 OPS… out of the leadoff spot.  He was about as effective as Travis Buckley–the other guy that appears when you type “Travis Buck” into Baseball Reference–but then remembered how to hit.  In his next three games Buck went 7-16, with 6 doubles, 4 RBIs, and a 1.284 OPS.
MVP Predictor
I came up with a pretty simple formula to see who would win the MVP should the season end at any given point.  The formula is: (OPS+ / 2) + VORP + VB.
OPS+ compares a hitter’s production to the rest of the league; VORP measures how many more runs a player accounted for than a replacement-level player would have; VB is a Victory Bonus, just like in the James Cy Young Predictor, that awards points based on division standing.  In this case, +10 for first place and +6 for second place.  It’s simple but effective in determining how valuable a player’s statistical performance was.  It does not take into account the more human factors of the game, but the MVP is usually awarded to the best hitter on the best team; this formula measures that.
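For anyone who wants to play along at home, here is a minimal sketch of the formula in Python. The function name, the rank-to-bonus lookup, and the sample inputs are my own illustrative assumptions; only the formula and the +10/+6 bonuses come from the description above.

```python
# Victory Bonus as described above: +10 for a first-place team, +6 for second.
# Teams in third place or lower get no bonus.
VICTORY_BONUS = {1: 10, 2: 6}


def mvp_score(ops_plus: float, vorp: float, division_rank: int) -> float:
    """Return (OPS+ / 2) + VORP + VB for one player."""
    return ops_plus / 2 + vorp + VICTORY_BONUS.get(division_rank, 0)


# Hypothetical inputs, not taken from any real player.
print(mvp_score(ops_plus=180, vorp=12.5, division_rank=1))  # 112.5
```

Because VORP accumulates over the season while OPS+ does not, the early-season leaders will be driven mostly by OPS+ and the bonus.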
I will be revising this throughout the season I am sure but for now it will work fine.  Here are the top five in the NL:

  1. Kendall, MIL, 135.7
  2. Ramirez, FLA, 130.9
  3. Burrell, PHI, 123.5
  4. Pujols, StL, 120.9
  5. Upton, ARI, 107.2

And the AL:

  1. Pierzynski, CHW, 131.4
  2. Scott, BAL, 126.7
  3. Crede, CHW, 123.2
  4. Dye, CHW, 120.0
  5. Drew, BOS, 114.8

Cy Young Predictor
In The Neyer/James Guide to Pitchers, Bill James presented a formula that could, with pretty good accuracy, predict the eventual Cy Young Award winner.  For a description of the formula, click here.  Though I altered his formula in previous articles to account for old-time players, his works great here.  Here are the top five in the NL:

  1. Jake Peavy, SD, 23.3
  2. Brandon Webb, ARI, 19.6
  3. Micah Owings, ARI, 18.8
  4. Ben Sheets, MIL, 18.6
  5. Jason Isringhausen, StL, 18.4

And the AL:

  1. Daisuke Matsuzaka, BOS, 23.2
  2. Zack Greinke, KC, 22.2
  3. Edwin Jackson, TB, 22.0
  4. Chien-Ming Wang, NYY, 20.3
  5. Brian Bannister, KC, 19.9

Beane Count
Rob Neyer created a really cool stat I had never heard of until earlier this month, called Beane Count.  The stat measures all of the contributions Athletics GM Billy Beane looks for in players and evaluates the teams that best fit his desires.  The total is found by adding the team rank in home runs hit, walks drawn, home runs allowed, and walks allowed.  Interestingly enough, as of right now, both the Chicago White Sox and Chicago Cubs lead their respective leagues–and by significant margins.
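Here is a rough sketch of how the total could be computed. I am assuming that rank 1 goes to the team with the most home runs hit and walks drawn and to the team with the fewest home runs and walks allowed, so that the lowest total is best. The team names and numbers are made up, and ties are simply broken by input order.

```python
def beane_count(teams: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum each team's league rank in HR hit, BB drawn, HR allowed, BB allowed."""
    # (category, higher_is_better): offense ranks high totals first,
    # pitching ranks low totals first.
    categories = [
        ("hr", True),
        ("bb", True),
        ("hr_allowed", False),
        ("bb_allowed", False),
    ]
    totals = {name: 0 for name in teams}
    for cat, higher_is_better in categories:
        ordered = sorted(teams, key=lambda t: teams[t][cat], reverse=higher_is_better)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return totals


# Made-up totals for three hypothetical teams.
sample = {
    "A": {"hr": 15, "bb": 50, "hr_allowed": 6, "bb_allowed": 30},
    "B": {"hr": 10, "bb": 40, "hr_allowed": 9, "bb_allowed": 35},
    "C": {"hr": 8, "bb": 45, "hr_allowed": 12, "bb_allowed": 28},
}
print(beane_count(sample))  # {'A': 5, 'B': 10, 'C': 9}
```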
Cain Watch
Many readers here should know that I have some crazy manlove for Matt Cain, despite having no allegiances to the Giants, and really cannot stand how unlucky he gets on the mound.  In 2007 he went 7-16, though my Adjusted W-L system had him pegged at 16-7; my SP Effectiveness System even scored him a +50, just meeting the cutoff for a #1 pitcher.  Each week I will look at his starts and see if the unlucky trend continues.

  • #1, 4/1/08, 5.2 IP, 3 H, 0 R, 0 ER, 4 BB, 5 K, ND.  Records an AQND because it was an Adjusted Quality Start.  Game Score of 64.  From what I saw and heard he was squeezed and really should have only walked two batters.
  • #2, 4/7/08, 4.1 IP, 7 H, 5 R, 4 ER, 5 BB, 5 K. Loss.  Does not record an AQS and legitimately deserved to lose.  Unlike his first start he was not terribly squeezed and this was not a good start by any means.

Game Scores of the Week
Bill James created the Game Score statistic to measure the exact quality of a pitched game.  Info on the easy to calculate figure can be found here.  For the record, a GSC of 50 or higher is good.  Below are the top three game scores of the week of 4/6-4/12.

  • Ben Sheets, April 6th: 9 IP, 5 H, 0 R, 0 ER, 0 BB, 8 K – 85 GSC
  • Edwin Jackson, April 10th: 8 IP, 2 H, 0 R, 0 ER, 4 BB, 6 K – 80 GSC
  • Wandy Rodriguez, April 7th: 7.1 IP, 3 H, 0 R, 0 ER, 0 BB, 6 K – 78 GSC
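For anyone who wants to check the math, here is a short sketch of the Game Score calculation as Bill James published it: start at 50, add 1 per out, 2 per inning completed after the fourth, and 1 per strikeout, then subtract 2 per hit, 4 per earned run, 2 per unearned run, and 1 per walk. The function names and the string format for innings are my own choices.

```python
def ip_to_outs(ip: str) -> int:
    """Convert box-score innings ('7.1' = 7 innings + 1 out) to total outs."""
    whole, _, frac = ip.partition(".")
    return int(whole) * 3 + int(frac or 0)


def game_score(ip: str, h: int, r: int, er: int, bb: int, k: int) -> int:
    """Bill James Game Score for a single start."""
    outs = ip_to_outs(ip)
    innings_after_fourth = max(0, outs // 3 - 4)
    unearned = r - er
    return (50 + outs + 2 * innings_after_fourth + k
            - 2 * h - 4 * er - 2 * unearned - bb)


print(game_score("9", h=5, r=0, er=0, bb=0, k=8))    # 85 (Sheets)
print(game_score("8", h=2, r=0, er=0, bb=4, k=6))    # 80 (Jackson)
print(game_score("7.1", h=3, r=0, er=0, bb=0, k=6))  # 78 (Rodriguez)
```

The same function returns 64 for Cain's first start listed in the Cain Watch above.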

Weekly Oddibe Award
The Oddibe Awards are given to the hitter with the slash stats (BA/OBP/SLG) closest to the league average and are named after Oddibe McDowell, whom RJ Anderson of Beyond the Box Score determined to have the career slash line closest to the league average from 1960-2006.  As of this week the league average slash line is .257/.327/.403.  Should the season for some odd reason end today, the 2008 Oddibe Award recipient would be – Orlando Hudson, Ari: .270/.325/.405.
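One way to pick the winner, sketched below, is to add up the absolute gaps between a hitter's slash line and the league's. That distance measure is my assumption for illustration and may not be the one used for the original award; the second hitter is made up for comparison.

```python
LEAGUE_AVG = (0.257, 0.327, 0.403)  # BA / OBP / SLG, as quoted above


def slash_distance(slash, league=LEAGUE_AVG):
    """Sum of absolute differences across BA, OBP, and SLG (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(slash, league))


hitters = {
    "Orlando Hudson": (0.270, 0.325, 0.405),
    "Hypothetical Hitter": (0.300, 0.380, 0.500),  # invented for comparison
}

winner = min(hitters, key=lambda name: slash_distance(hitters[name]))
print(winner)  # Orlando Hudson
```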
If the Season Ended Today
Speaking of the season ending today, I think it will be interesting to look at the playoff matchups each week as if it did.  This way we can see which teams were in it all year as opposed to burning out or surging in. Note – this was done at 11:16 PM EST, so the A’s had played while the Angels were still playing.

  • Baltimore Orioles (AL East) vs. Chicago White Sox (Wild Card)
  • Kansas City Royals (AL Central) vs. Oakland Athletics (AL West)
  • Arizona DBacks (NL West) vs. Winner of Tiebreaking Game between CHC/MIL
  • Florida Marlins (NL East) vs. St. Louis Cardinals (NL Central)

In Case You Missed It
Here are some great sabermetrics articles from this past week:


The Santana Hypocrisy

Before getting into the article I wanted to mention that my personal website is now back up and running. The site holds information for all of my endeavors, including sabermetrics, magic, and my professional screenwriting.
DISCLAIMER: This will not truly be a statistical piece but rather more along the lines of psychology and opinion. And yes – the title sounds like a Matt Damon movie title.
I was watching Freaks and Geeks the other day and an incident in the episode sparked a metaphor in my mind. In the show, Sam really liked Cindy Sanders, a girl who was dating a jock and only wanted to be his friend. At dinner Sam told his mother about Cindy’s lack of interest. His mother, trying to keep her son optimistic, told him she was making a mistake/dumb decision and that it would be “her loss.”
I wondered, though, would Sam’s mother have been as “down” on Cindy if Sam came home with news that Cindy did like him?
As in, is it okay to “diss” or find flaws in something not yours if you would be ecstatic if said thing was yours?
Even though I would love to continue talking about one of my favorite television shows the purpose of this post is to direct the above question towards the recent trade of Johan Santana.
Unequivocally, I am a die-hard Phillies fan. Though I seem to be adopting the Rays as a second team, the Phillies are the sole owners of the baseball area in my heart. Even though they are my favorite team, and the Mets are in their division, I am really excited about the Johan trade.
Yes, a Phillies fan excited that the Mets improved their team.
Johan has been a favorite of mine since 2002 when, via the MLB digital cable package, I watched him routinely make relief appearances. I always noted how “cool” or “funky” his windup and delivery were and loved watching him on the mound. He has also been the only non-Greg Maddux player that I like to exclusively follow.
Now he is in the same division as the team I root for and I cannot wait to see these games. I cannot wait to see a Hamels/Santana battle of the changeups, or Santana facing off against Jimmy Rollins in the 8th inning of a (hopefully) meaningful September game. I am greatly anticipating a Santana/Peavy Sunday Night Baseball matchup or even just simply watching the guy bat!
Unfortunately, I am mostly alone in my thoughts when it comes to non-NYM NL East fans. You see, a stark contrast exists between two kinds of “die-hard fans,” and that is the main reason. There are fans whose personal lives are so affected by sports that it borders on sick obsession, and there are fans like me, fans who give so much of their heart and mind to the game but can continue their regular lives when the game ends.
I am a die-hard Phillies fan but, when the Mets landed Johan, I did not cry, pop pills, seek therapy, or curse on message boards. I grinned. I grinned as if to say – “Oh, you rascal Metropolitans!” I grinned because this is going to be a very exciting season.
In an initial reactive conversation with my brother Corey, though, he caught me doing the same thing I had been complaining about to him – falling into The Santana Hypocrisy.
I made a comment to him along the lines of – “I mean, honestly, how many games is he going to personally improve?”
Corey called me on it and I admitted fault. After all, this is such an easy hypocrisy to fall victim to but it becomes a problem when fans become so entrenched in it that they lose touch with reality.
DISCLAIMER 2: This is not meant to bash any fan of any team, so Mets, Phillies, Braves, and Twins fans, please do not scream down my throat. I am merely investigating the human nature and seemingly programmed response that falls into this hypocrisy.
I have read a plethora of reactions on this trade and, while most are valid or provide some semblance of a reasonable response, some are ridiculous in their hypocritical nature. The hypocrisy does not stem from the reactions themselves, but rather from the fact that these reactions would be completely reversed if the circumstances were different (i.e., if Santana were on their team).
The reactions to this trade seem to come in three forms – excited, disappointed, and angered. You’ll never guess which bunch are excited.
The disappointed department houses some Twins fans, Phillies fans, Braves fans, Manny Acta and Felipe Lopez, 12 of the 32 Marlins fans, some Red Sox/Yankees fans, and one Royals fan (Joe Posnanski). The angered department holds the rest of the Twins fans and some very opinionated Phillies and Braves fans.
Some of those in the angered department have lost some sense of reality. I have read so many posts that point out flaw after flaw after flaw about Johan, be it his home run total of last year, his decline in W-L record (useless stat), his high ERA (yeah, 3.33 in the AL is ridiculously high, right?), his potential arm troubles, how “overrated” he is, or anything else along those lines. These fans are finding everything they can to serve the dual roles of –

  • Raining on the parade of Mets fans
  • Making themselves feel better about not acquiring Johan

There is no way in hell these fans would search for these flaws if their teams landed Santana. If the Twins signed Johan to an extension he would have a great year and would be applauded for staying. If the Phillies got him then it would seem very likely that a team with the NL’s best offense, the MLB’s best pitcher, and arguably the best young pitcher would perform VERY well. If the Braves were able to line him up alongside Smoltz and Hudson, something tells me that his “flaws” would be forgotten more quickly than Mark Lemke’s pitching career.
Why do we all allow ourselves to criticize someone we would shower with love if in our presence? Is it jealousy? Fear? Ignorance? Probably all three.
Johan is the best pitcher in baseball and makes a significant difference on any team he plays for. He did not single-handedly will the Twins to the playoffs during his tenure there but I would love to see how many of those Twins teams would have made the playoffs without his services.
To not acknowledge the difference he makes is to be an ignorant baseball fan.
To go as far as to say he is not that great, has a ton of flaws, or is overrated is to be a fan completely detached from reality. I can guarantee that every other pitcher on the teams that these fans root for has many more flaws than Johan.
There are reasons this guy has finished either #1 or in the top five in Wins, ERA, ERA+, WHIP, K, K:BB, SHO, GS, and IP over the last four years. The primary of those reasons is that he is extremely dominant and talented. In my SP Effectiveness System, where you need a +50 or higher to be considered a #1 SP, Johan has averaged a +71.3 in the last four years. That is clearly the highest from 2004-2007, and the only four-year spans since 2000 that were higher were the 2000-2003 seasons of Curt Schilling and Randy Johnson, both of whom are at the end of their careers now.
He has made 134 starts since 2004, and 97 of them have been AQS, which is 72%, more than anyone else in that span.
Looking even further, if we want to use W-L records as a barometer, we are going to use my Adjusted W-L. Johan has gone a recorded 70-32 in the last four years (an average of 18-8 per season), but by my calculations, his Adjusted W-L would be 78-24 (an average of 20-6 per season).
I have no problem with people being upset that Johan now plays for the Mets. I have no problem with people not personally liking Johan Santana. I have no problem with people not personally liking the Mets (hey, I don’t like them!). I also have no problem with fans questioning the opinions of other fans.
I do, however, have a problem with no middle ground of opinion existing.
It seems that Mets fans believe they have already won the World Series and, based on numerous message boards I have read, Phillies and Braves fans think Johan stinks. The Mets fans exaggerate and the other fans have to do the polar opposite to compensate. There are very few people, relative to those who express opinions, who can be fans of other teams affected by the trade and still acknowledge that the Mets did something positive by gaining a great player. It’s either Johan is the messiah or Johan is overrated.
If a player would, were he on your team, produce a bulge in your pants worthy of Ron Burgundy’s thumbs-up, there is absolutely no justifiable reason to criticize said player and point out his flaws just because he is on another team. It is equivalent to really wanting a toy truck and, when you find out you can’t have it, calling that truck stupid or pretending like you don’t want it. In other words, it’s very childish.

Adjusted W-L: A Study of the Unlucky

If you have read any of my work on Starting Pitchers and SP Effectiveness it will come as no surprise that I strongly dislike Win-Loss records. 
In the 2005 season, Johan Santana posted the following numbers-

  • 16-7 actual W-L
  • 2.87 ERA
  • 7.02 IP/gm
  • 231.2 IP
  • 0.97 WHIP
  • 5.29 K:BB
  • 3 CG/2 SHO
  • 33 Games Started

In 2005, Bartolo Colon won the AL Cy Young Award.  Any idea of how many of the above categories, which we all intuitively equate to pitching effectiveness, Colon outranked Santana in? 
One.  One category.  Colon beat Santana in only one category in 2005.  Care to venture a guess to which it was?  Combine my sarcastic tone with the title/first line of this article if you need help.  That’s right.  The one category he outperformed Santana in was WINS, 21-16.  Santana outperformed Colon in every other statistical category in 2005 and somehow lost the Cy Young.  Not to take anything away from Colon’s season but he clearly did not perform better than Santana in any category other than wins and they had the same number of starts.  And to say that the Angels made the playoffs strictly because of Colon is just slightly over borderline ridiculous. 
For reasons unbeknownst to me, W-L has become an extremely significant barometer when measuring the quality of a season and of a career.  We invest a ton of stock into a statistic that paints us half of a whole portrait.  Ask yourself this – what does a W-L record tell us?
Does it provide a ratio of how often someone pitched well to how often he didn’t?  No, because a Win does not always equate to a well-pitched game and a loss does not always equate to a poorly-pitched game.
Does it take into account the fact that some teams score more than others?  No, because you get credited with a win if you last at least five innings and your team never relinquishes the lead once you leave.  It does not matter if you give up six runs in seven innings as long as you meet those criteria.
A few weeks back I introduced my statistic, AQS – Adjusted Quality Start, which refers to when a pitcher either goes 6+ IP while surrendering 3 or fewer earned runs or 7.2+ IP while surrendering no more than 4 earned runs.  Using the AQS allows us to find the ratio, mentioned in the question above, of how often a pitcher performed well in comparison to not performing well.  Regardless of whether or not you received the deserved decision, or whether or not you even received a decision, if you meet the criteria of an AQS it means you pitched well and, in theory, deserve to win.
Springboarding off of the AQS, I began to separate W-L records into what they really were – a combination of Cheap Wins, Tough Losses, Legitimate Wins, and Legitimate Losses.  The legitimate decisions refer to games in which a pitcher either recorded an AQS and won, or did not record an AQS and lost.  The reverse can be said for the Cheap Wins/Tough Losses.  Failing to record an AQS and getting a win really should not happen and the same can be said for garnering a loss while recording an AQS.
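The two AQS thresholds are easy to express as a quick check. In this sketch, innings are counted in outs (so 6 IP is 18 outs and 7.2 IP is 23 outs) to avoid the box-score decimal notation; the function name is just an illustrative choice.

```python
def is_aqs(outs: int, earned_runs: int) -> bool:
    """True if a start meets either Adjusted Quality Start threshold."""
    six_innings = 6 * 3            # 18 outs
    seven_two_thirds = 7 * 3 + 2   # 23 outs
    return ((outs >= six_innings and earned_runs <= 3)
            or (outs >= seven_two_thirds and earned_runs <= 4))


print(is_aqs(outs=18, earned_runs=3))  # True: 6 IP, 3 ER
print(is_aqs(outs=21, earned_runs=4))  # False: 7 IP is too short for a 4th run
print(is_aqs(outs=23, earned_runs=4))  # True: 7.2 IP, 4 ER
```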
I will use the 2007 season of John Smoltz to put this to use.  By all accounts he had a great year but he often gets lost in the Peavy/Webb shuffle when discussing the best in the NL this past season.  Peavy won 19 games, Webb won 18, and Smoltz only won 14.  Something deep down tells us that Smoltz had a better season than his 14-8 record would indicate, but how much better?
Looking more closely at his 14-8, we see that he had 0 Cheap Wins, 5 Tough Losses, 14 Legit Wins, and 3 Legit Losses.
If we take the Cheapies and Toughies out, Smoltz is left with a 14-3 record of legitimate decisions.  I want to go a bit further, though, because he recorded 22 decisions no matter how we look at it.  He legitimately deserved to go 14-3, but there were five games he lost that he pitched well enough to win.
With that in mind, I began to adjust the W-L records of pitchers and see what would happen if they were credited with a Win for every Tough Loss and a Loss for every Cheap Win, on top of the Legit Wins and Legit Losses.
When we apply that to Smoltz, his 2007 Adjusted W-L would be 19-3.  When we do the same to Peavy and Webb we get a 21-4 record for Peavy and a 20-8 record for Webb.
Essentially, Smoltz should have won 19 of his 22 decisions, Peavy should have won 21 of his 25 decisions, and Webb should have won 20 of his 28 decisions. 
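The bookkeeping behind those records is simple enough to sketch in a few lines. The class and function names here are my own; the Smoltz numbers are the ones given above.

```python
from typing import NamedTuple


class Decisions(NamedTuple):
    legit_wins: int
    cheap_wins: int
    legit_losses: int
    tough_losses: int


def actual_record(d: Decisions) -> tuple[int, int]:
    """The W-L record as it appears in the books."""
    return d.legit_wins + d.cheap_wins, d.legit_losses + d.tough_losses


def adjusted_record(d: Decisions) -> tuple[int, int]:
    """Tough Losses become wins and Cheap Wins become losses."""
    return d.legit_wins + d.tough_losses, d.legit_losses + d.cheap_wins


smoltz_2007 = Decisions(legit_wins=14, cheap_wins=0, legit_losses=3, tough_losses=5)
print(actual_record(smoltz_2007))    # (14, 8)
print(adjusted_record(smoltz_2007))  # (19, 3)
```

Note that the total number of decisions never changes; the adjustment only moves the unmerited ones to the other column.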
If we are going to use W-L record as a barometer of quality, then we should use this Adjusted W-L instead since it actually does give us the ratio of how many times a pitcher performed well relative to the decisions he received.
Below is a table featuring the Actual W-L records and the Adjusted W-L records of some NL pitchers from 2007.

  Pitcher            Actual W-L   Adjusted W-L
  Jake Peavy         19-6         21-4
  John Smoltz        14-8         19-3
  Cole Hamels
  Brad Penny                      19-1
  Tim Hudson
  Ted Lilly
  Matt Cain          7-16         16-7
  Ian Snell
  Dontrelle Willis   10-15        15-10
  Adam Eaton         10-10        6-14

As we can see, Brad Penny had the best Adjusted W-L of any NL pitcher as he truly deserved to lose only one of his decisions.  If he had received proper run support and been a bit luckier in the games in which he recorded decisions, he would have posted a 19-1 record.  I wonder if it would have been a different Cy Young picture if he had.
Look at the cases of Matt Cain, Dontrelle Willis, and Adam Eaton.  Cain finished the season with an actual W-L of 7-16, even though he deserved to go 16-7.  That means he was unlucky nine times.  Dontrelle Willis should have been 15-10 even though he ended up 10-15, meaning he was unlucky five times.  Yes, by all accounts Dontrelle had a down season, but he did really deserve to win 15 of his decisions. It was just how bad his 10 deserved losses and no-decisions were that turned his season upside down.
On the flip-side, Adam Eaton finished the season 10-10, even though he deserved to be 6-14.  While Cain and Willis were very unlucky, Eaton turned out to be lucky four times.
When we look at the number of Cheap Wins and Tough Losses, we can take the difference, express it as a + or – number, and detail which pitchers were the luckiest and unluckiest.  This is a bit different than the Pythagorean Formulas used to determine what a team’s record should be.  The team formulas look at the season, as a whole, and provide estimates as to what an overall record should be based on how many overall runs are scored and given up.
It does not make sense to use that here, because if a pitcher gives up 10 runs in Game 1 and 1 run in Game 2, the average of 5.5 runs would make it look like two bad starts, even though the starts are completely separate and the damage was done in one game.  The team formulas evaluate the entire forest without looking at each individual tree.
Looking at each individual tree needs to be done to really show which pitchers were luckiest and unluckiest.
In the case of Cain, he had 0 Cheap Wins and 9 Tough Losses.  Net Luck = 0 – 9, meaning that Cain had a Net Luck Rating of -9, or in other words was very unlucky.  There were no recorded Wins that he should have lost but there were nine recorded losses he should have won, or at least not recorded a loss.
Adam Eaton had 5 Cheap Wins and 1 Tough Loss.  5 – 1 = 4.  Eaton’s Net Luck was +4, meaning he was lucky four times.  Positive numbers correspond to being lucky, negative numbers correspond to being unlucky, and 0 corresponds to receiving exactly what you should have received.
Aaron Harang was 16-6 with 0 Cheap Wins and 0 Tough Losses.  He had a great season and deserved to go 16-6 in his decisions.  He would have a Net Luck Rating of 0, since he was not lucky or unlucky.
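The three cases above can be reproduced with a one-line function. This is only a sketch; the function name is mine, and the inputs are the Cheap Win and Tough Loss counts quoted above.

```python
def net_luck(cheap_wins: int, tough_losses: int) -> int:
    """Positive = lucky, negative = unlucky, zero = got what was deserved."""
    return cheap_wins - tough_losses


print(net_luck(cheap_wins=0, tough_losses=9))  # -9 (Cain)
print(net_luck(cheap_wins=5, tough_losses=1))  # 4 (Eaton)
print(net_luck(cheap_wins=0, tough_losses=0))  # 0 (Harang)
```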
When pitchers tie in either luck or lack of luck, the statistic we should look to is AQS %, which refers to the percentage of starts in which a pitcher recorded an AQS.  With lucky pitchers, a lower AQS % tells us they pitched well less often, and so they are luckiest because they recorded the most Net Luck while pitching well the least often.  For unlucky pitchers we look at the highest percentage because it tells us that the pitcher was not only unlucky enough to lose games he should have won but that he also pitched well a higher percentage of the time.
For instance, Scott Olsen, Adam Eaton, and Byung-Hyun Kim all tied with a +4 Net Luck Rating, meaning they were the luckiest NL pitchers.  Olsen had an AQS % of 33.3, Kim at 27.3, and Eaton at 26.7.  Therefore, Adam Eaton was the luckiest NL pitcher because he received four positive decisions that were unmerited and pitched well the least amount of time.
Though Cain, Bronson Arroyo, and Derek Lowe were all unluckier than Dontrelle and Smoltz, the latter two both finished at -5.  Dontrelle had an AQS % of 57.1 and Smoltz 84.4.  Therefore, Smoltz was unluckier than Willis because he received five negative decisions that were unmerited and pitched well far more often.
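The tiebreaker can be sketched as a two-part sort key: Net Luck first, AQS % second. The tuple layout and function names below are my own assumptions; the numbers are the ones quoted above.

```python
# Each entry: (name, net_luck, aqs_pct)
lucky_group = [
    ("Scott Olsen", 4, 33.3),
    ("Byung-Hyun Kim", 4, 27.3),
    ("Adam Eaton", 4, 26.7),
]
unlucky_group = [
    ("Dontrelle Willis", -5, 57.1),
    ("John Smoltz", -5, 84.4),
]


def luckiest(pitchers):
    """Highest Net Luck; ties go to the LOWEST AQS %."""
    return max(pitchers, key=lambda p: (p[1], -p[2]))[0]


def unluckiest(pitchers):
    """Lowest Net Luck; ties go to the HIGHEST AQS %."""
    return min(pitchers, key=lambda p: (p[1], -p[2]))[0]


print(luckiest(lucky_group))      # Adam Eaton
print(unluckiest(unlucky_group))  # John Smoltz
```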
When we apply Net Luck to every pitcher in 2007, in both the NL and AL, we get the following results –

  • Luckiest NL SP = Adam Eaton (PHI), +4
  • Luckiest AL SP = Odalis Perez (KC), +4
  • Unluckiest NL SP = Matt Cain (SF), -9
  • Unluckiest AL SP = Dan Haren (OAK), -6

Though Haren pitched well and still finished 15-9, he should have been 21-3.  Odalis Perez actually tied Felix Hernandez of the Mariners at +4, but Hernandez’ AQS % was 57.1 whereas Perez came in at 30.8.
Honorable Mentions for Luck in 2007 go to:

  • Scott Olsen, +4
  • Byung-Hyun Kim, +4
  • Paul Byrd, +3
  • Boof Bonser, +3
  • Jeremy Bonderman, +3

Honorable Unlucky Mentions in 2007 go to:

  • Bronson Arroyo, -7
  • Derek Lowe, -6
  • John Smoltz, -5
  • Mark Buehrle, -5
  • Gil Meche, -5
  • Dontrelle Willis, -5

Though I do not have all of the data compiled right now, something I am going to investigate over the next few weeks is which pitchers, from 2000-2007, have been the luckiest and unluckiest.
Another usage of Net Luck that fascinates me, and that I am currently researching for my book, involves an application to 300 game-winners, as well as those who are close.  Something tells me that I will find some guys with 300 wins who maybe should not have 300 wins, as well as some guys who are short of 300 that really should have it.  After all, if we are going to use 300 wins as a Hall of Fame barometer, we should at least make sure the wins are deserved.
I am currently involved in conducting this research and if anyone would like to help, please get in touch with me.

2007 American League SP Analysis

A couple of weeks ago, I presented the Seidman SP-Effectiveness Model, which took into account a large majority of statistics that deem a pitcher to be effective and weighted them with points based on how important/rare they were.  The system is designed to account for the various factors needed to level the field of play between those on good or bad teams, those with or without run support, and those either called up/injured or those just plain bad.
Not surprisingly at all, Jake Peavy ended up being first, five points ahead of his competition, but the order of those that followed him turned out to be a bit more surprising than I thought.  Everything made proper sense, though, because the pitcher cannot be blamed for his team not scoring for him or not getting decisions in brilliantly-pitched games.
Essentially, my SP-Effectiveness Model answers the question – What would happen if a pitcher was rewarded every time he pitched well and penalized every time he pitched poorly?
I also introduced my statistic, the AQS, or Adjusted Quality Start, which extends the general rule of 6+ IP and 3 or fewer ER to also include games of 7.2+ IP and 4 or fewer ER.  Based on my analysis of innings pitched by starters and the frequency with which they were lifted for relievers, coming one out short of completing the eighth inning truly merits being allowed to give up that fourth run.
If you have not yet read the NL Article on this same subject, I highly suggest you click the below link – that way you will understand the rubric and reasoning.
To read the NL 2007 SP-Effectiveness article, and see the results, click here
In this article, I am applying my model to 2007 American League pitchers.  Just like the NL, there were some expected results, as well as some initially peculiar results that make sense upon further thought.  Additionally, just like with my NL post, I did not apply this to every American League pitcher.  Instead, I selected 1-3 pitchers from each AL team.  Before the 2008 season begins I will plug every pitcher from both leagues into my system to see who was worst – which is always fun.
I will not explain all of the statistics or points values, since I did that in the previous post on the NL, but I will say that I did consider the fact that AL managers did not have to worry about pinch-hitters.  Due to this, I considered making the IP requirements more stringent with the AL, but the fact is that even though they do not need to be removed for pinch-hitters, they are facing an extra offensive player (not a pitcher in the 9th spot).  They should, in theory, give up more runs and have just as good of a reason to come out of a game.
Overall, though, only a few more AL pitchers had over 225 IP than NL pitchers and so it was not worth changing.  The biggest difference between the leagues was the average IP/game of the selected pitchers.  AL starting pitchers accounted for 66.2% of the total IP in 2007, whereas NL starting pitchers accounted for 63.5%.  Though the numbers are pretty close, when we are dealing with over 23,000 IP in a league that extra 2.7% equates to approximately 600 IP.

  • To view the raw statistics of all the pitchers used, click here.
  • To view the list of AL SP used in the order of effectiveness points, click here

Again, if you wonder why certain statistics are used and/or why they were assigned certain points, please read the previous NL article linked at the top of the page.
I do not want to post a table of 28-30 pitchers, so you will have to click the link to view the results spreadsheet, but I will list the top ones below.

  1. CC Sabathia, +84
  2. Dan Haren, +76
  3. Fausto Carmona, +74
  4. John Lackey, +72
  5. Roy Halladay, +68
  6. Johan Santana, +60
  7. Mark Buehrle, +59
  8. Josh Beckett, +58
  9. Justin Verlander, +58
  10. James Shields, +57
  11. Javier Vazquez, +57
  12. Kelvim Escobar, +57
  13. Joe Blanton, +57

In the National League, the odd ranking was Chris Young, whose barometrical statistics suggested he should have been ranked higher.  In the AL, Beckett falls into the same category. The issue here has nothing to do with Beckett’s numbers, but rather the fact that there were other pitchers who were not as lucky as he was in getting run support or solid bullpen help. 
Of the players listed above Beckett, both Santana and Haren had 7 tough losses, Buehrle and Lackey had 5 tough losses, Halladay led MLB in IP/gm and CG, and Carmona had more legit wins and fewer legit losses.
Essentially, there is nothing wrong with Beckett’s 2007 numbers; there were simply other pitchers who happened to perform better in certain areas than he did.
The Red Sox had a dynamite bullpen, so going to Okajima or Papelbon was something that just about any manager would feel comfortable and justified in doing, whereas some of these other teams needed their starters to last longer. 
No, this system does not take into account any sort of clutch factor, where I am sure Beckett would excel, but it does level the playing field to show which pitchers were the most effective, based on the numbers they individually put up. 
Just like the conclusion that was made in the Snell/Zambrano comparison, this is all about consistency.  The quality of Josh Beckett’s AQSs may have been far greater than those of the other pitchers, but they occurred less frequently.  Even though his good-great games may have been astounding, when he was having average or bad games, the other pitchers were still having good-great games.
Beckett had an AQS 67% of the time (20 of 30 starts) while those listed were 73% and higher. This is not necessarily a measure of how good a pitcher was in his good games, but rather how often he was good.
One of the major reasons we considered Beckett to have been so good this past season was his record.  If he had been only 15-9, like Dan Haren, there would not have been a Cy Young debate.
That tends to be a problem because, as I will get into in the next category, W-L records do not differentiate between these Cheap Wins and Tough Losses.  If we gave every pitcher a Win for each Tough Loss, and a Loss for each Cheap Win, Beckett’s record would not have been 20-7.  It would have been 19-8. 
There is not a huge difference between his 20-7 and 19-8, but when we do the same for the AL pitchers above him in points, we get the following records: Sabathia (21-5), Carmona (23-4), Santana (19-9), Haren (21-3), Buehrle (15-4), Lackey (22-6). 
If we are going to use W-L record as a barometer, and include these Tough Losses and Cheap Wins, all of those above records are either better than or equivalent to Beckett’s 19-8.
Looking only at the Adjusted W-L records, if we were to use them as the barometer for the Cy Young Award or the best pitcher, the debate would not be between Sabathia and Beckett; it would be between Haren and Carmona.  I am not saying it should have been between Haren and Carmona, but rather that if we are going to use W-L as an “end-all” statistical solution, we should at least use the Adjusted W-L, or the True W-L.
I described the different types of wins in the NL article but I did not mention the statistic “True W-L Record.”  In order to properly evaluate pitchers, W-L records have to be broken down and examined.  Some pitchers will get tremendous run support and win games even if they only last 5.1 innings and give up 4-5 runs. 
Then there are some who will go 6.2-7.1 innings, give up 2-3 runs, and lose.  After separating these Cheap Wins and Tough Losses from a W-L record, we are left with a record of legitimate wins and losses – games that a pitcher deserved to win or lose based on performance. 
A legit win occurs when you record an AQS and win, and a legit loss occurs when you do not record an AQS and lose.
The difference between True W-L and the Adjusted W-L I used in the Beckett comparison is that the True W-L does not include Cheap Wins or Tough Losses.  True W-L only includes games in which the pitcher recorded a win or loss when either decision was merited.
You can see these True W-L Records in the raw statistics spreadsheet, but I have listed the best ones below.  In parentheses next to the True W-L Records are the Actual W-L Records.

  • Dan Haren, 14-2, (15-9)
  • Kelvim Escobar, 14-3, (18-7)
  • Fausto Carmona, 18-3, (19-8)
  • Josh Beckett, 17-5, (20-7)
  • Chien-Ming Wang, 17-5, (19-7)
  • CC Sabathia, 17-5, (19-7)

Again, we see that if win-loss were to be the “end-all” tool to evaluate a Cy Young Award or the best pitchers, Haren and Carmona would be atop the list.
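For anyone who wants to check the arithmetic, here is a minimal Python sketch of how the three kinds of record relate.  The function names are made up for this illustration, and the inputs are simply the True and Actual W-L Records listed above; Cheap Wins and Tough Losses are recovered as the difference between the two.

```python
def split_record(true_wl, actual_wl):
    """Recover Cheap Wins and Tough Losses from the True and Actual W-L."""
    legit_w, legit_l = true_wl
    wins, losses = actual_wl
    cheap_wins = wins - legit_w        # wins without an AQS
    tough_losses = losses - legit_l    # losses with an AQS
    return cheap_wins, tough_losses


def adjusted_record(true_wl, actual_wl):
    """Adjusted W-L: every Tough Loss becomes a win, every Cheap Win a loss."""
    legit_w, legit_l = true_wl
    cheap_wins, tough_losses = split_record(true_wl, actual_wl)
    return legit_w + tough_losses, legit_l + cheap_wins


# (True W-L, Actual W-L) as listed in this article
pitchers = {
    "Josh Beckett": ((17, 5), (20, 7)),
    "Dan Haren": ((14, 2), (15, 9)),
    "Fausto Carmona": ((18, 3), (19, 8)),
}

for name, (true_wl, actual_wl) in pitchers.items():
    w, l = adjusted_record(true_wl, actual_wl)
    print(f"{name}: {w}-{l}")
```

Run as written, this gives 19-8 for Beckett, 21-3 for Haren, and 23-4 for Carmona, which matches the Adjusted W-L records quoted earlier.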
For fun, I decided to plug some legendary seasons into my system to see what the end results were. Yes, it is impossible to perfectly compare a season from 1966 to one from 1996, but still it is interesting to see how they would rank. To do this, I took the 1968 season of Gibson, the 1995 season of Maddux, and the 2000 season of Martinez. The points results for the three were:

  • Bob Gibson, 1968, +178 pts
  • Pedro Martinez, 2000, +104 pts
  • Greg Maddux, 1995, +97 pts

And there you have it.  By the middle of February I should have a spreadsheet/PDF made up of all NL and AL pitchers plugged into this effectiveness model.  That way we can see who were the absolute worst as I am sure we will find some surprises and unexpected names there.
The biggest surprises to me in both leagues, in a positive turn, were Bronson Arroyo and James Shields.
The most unexpected finishes were Beckett and Chris Young, as I predicted they would be higher.
An interesting thing to look at is how players on the same team ranked next to each other.  In the NL, Zambrano is widely thought of as the #1 of the Cubs, yet Ted Lilly finished much higher.  In the AL, Kazmir is definitely thought of as the Rays ace, yet Shields ranked 9th out of the pitchers used here, and Kazmir finished 20th.
And, since the Yankees have to be stubborn, both Pettitte and Wang tied in effectiveness points. 
This model is not the end-all solution to determining who the best pitchers are in a given year, but it is a darn good predictor and estimator since it equalizes the field of play and makes sure it is known that you do not have to be on a great team to be a great pitcher or have a very effective year.  
This measures a specific season, where some players may be better than others, even if they are nowhere near better in a retrospective look at their careers. 

2007 NL Starting Pitching Analysis

When it comes to analyzing and comparing pitchers, those conducting the comparisons will often find themselves in a tricky situation.  Sure, certain pitchers are better than others, but what are they specifically better at? 

How can we conduct an honest analysis when there are so many variables to consider?  And how can we truly determine which pitchers were better than others when some are on terrible teams with no run support and others are on tremendous teams with tons of run support?
The first step is to determine what we are measuring.  If we want to know who the best strikeout pitcher is, we should look at the raw total for strikeouts and also an average of K/IP, since some guys will make fewer starts than others.  To figure out who walks the least, we measure the number of walks each pitcher gives up and a walk-IP ratio.
These measurements are contingent on one category, though, and cannot tell us who is better or more effective than the rest.  All of the research and ideas presented in this article are designed to measure the “effectiveness” of a pitcher. 
In order to determine this effectiveness, a whole heck of a lot of numbers need to be measured and properly weighted/scaled so that everybody has a fair shot – whether or not they are on a great team.
I took the 1-3 best pitchers from each National League team and entered their statistics into a database, measuring everything from their raw Innings Pitched totals to their Adjusted Quality Start % (you’ll read more on that below).  After entering all of the statistics, and crunching numbers until my brain turned to mush, I came up with my weighted points system.  I assigned the corresponding point totals and added everything up to determine what I feel is a very accurate measurement of pitching effectiveness amongst the NL’s best. 
This was not applied to every single NL Pitcher in 2007 (I will do that another time) but rather amongst these 30 selected #1, #2, or #3 starters.  For instance, a guy like Jeff Suppan may have been more effective than Jason Bergmann but I wanted to have at least one person from each team.
The system is not 100% perfect and does not take into account every single statistic (do you know how many statistics there are??), but it definitely levels the playing field between those on good or bad teams, those injured/called up or just plain bad, and those who got lucky or unlucky with run support.  The points are assigned based on the areas I, as an intense student of the game, feel are most important to determine true effectiveness. 
The basic idea of this system is to measure the true quality of a pitcher over his season, i.e., what would happen if a pitcher were rewarded every time he pitched well and discredited every time he pitched poorly, something that happens perfectly just about 0% of the time. 
We will begin by going over the statistics involved, what their points scale was, and why they are used.  The idea behind these corresponding point totals is to properly weight the areas that most people intuitively associate with success and quality.
The points given to each statistical subset are designed to separate the aces from the workhorses and the workhorses from the seemingly replacement level pitchers.  They may seem arbitrary and could be replaced with different numbers, or fractions/decimals, but the difference between the points in subsets was based on the number of pitchers who fall into certain categories.
In order to be as effective as possible, a pitcher needs to make as many starts as he can.  How can we say that a pitcher with 14 starts is more effective than one with 34-35, even if his numbers in those 14 starts are tremendous and the numbers of the one with 34-35 are a bit worse?  His numbers may be better than those of the pitcher with 35 starts, but the latter pitcher was involved in 21 more games and proved to be durable enough to pitch an entire season, and solid enough to maintain his SP status for 162 games. 
This does not mean that a pitcher with 35 starts is necessarily “better” than one with 14-16, but rather he is more effective because he is involved in more of his team’s season. 
If the pitcher with 14-16 starts posted the same numbers in 32 starts, it would not be a contest.  But, he didn’t – it was only 14-16.  You cannot have as much of an effect on your team (actual play, not motivational or anything) unless you are out there as often as possible.
***What the end result of this effectiveness points system showed is that those with average numbers, over 30+ starts, were as effective as, or slightly better/worse than, those with good numbers over 16-20 starts.***
If somebody makes only 14 starts in a season, it could be because he was injured for half of the season or was called up from the minors during the season, so he should not be penalized with negative points for that – he just should not be rewarded as highly as someone with 30+ starts.

  • if 30+ starts, +5
  • if 25-29 starts, +3
  • if 20-24 starts, +2
  • if under 20 starts, 0

Just like Games Started, IP can only get you positive numbers, because a low raw number of IP can be attributed to injury or a midseason call-up.  Those with more IP get higher point totals, though.  The reason the under-100-innings tier carries no penalty is that you were not necessarily a bad pitcher, but the lack of innings (whether due to injury or a call-up) limits the effectiveness.

  • if 230+, +8
  • if 220-229, +7
  • if 200-219, +5
  • if 150-199, +3
  • if 100-149, +2
  • if under 100, +1
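
The two workload scales above (starts and innings) are just threshold lookups, which a short Python sketch makes concrete.  The function names are hypothetical, and treating exactly 30 starts as the top tier is an assumption based on the “30+ starts” wording used elsewhere in this article.

```python
def starts_points(games_started):
    """Points for Games Started; this scale never goes negative."""
    # Assumption: exactly 30 starts counts toward the top tier ("30+ starts").
    if games_started >= 30:
        return 5
    if games_started >= 25:
        return 3
    if games_started >= 20:
        return 2
    return 0


def innings_points(innings):
    """Points for raw Innings Pitched, given as a plain number of innings."""
    for floor, points in ((230, 8), (220, 7), (200, 5), (150, 3), (100, 2)):
        if innings >= floor:
            return points
    return 1  # under 100 innings: lowest tier, but no penalty


print(starts_points(32), innings_points(208))   # a 32-start, 208-inning season
print(starts_points(14), innings_points(85))    # a 14-start, 85-inning call-up
```

A 32-start, 208-inning season earns +5 on each scale, while the 14-start call-up is not rewarded as highly but is never pushed into negative territory.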

This is where negative numbers can begin.  If you were hurt, or called up from the minors, you are not penalized with negatives for the raw number of innings pitched or games started, but if you posted a high number of starts and a low number of innings, IP/Game will bite you in the rear.  IP/Game separates the hurt or called up from the downright below average or bad.  It also helps reward those with a couple fewer starts than others but with more raw innings pitched.  These types of pitchers were in the same GS range but some went deeper into games than others.  Nobody averaged over 7 IP/gm, so we start lower.

  • if 6.5-7 IP/gm, +7
  • if 6.0-6.49 IP/gm, +5
  • if 5.5-5.99 IP/gm, +3
  • if 5.0-5.49 IP/gm, 0
  • if below 5.0 IP/gm, -5

If you cannot average over 5 innings per game, or exactly 5 innings per game, you should not be a starting pitcher.  Even Adam Eaton averaged over 5 IP/gm in 2007.
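One practical wrinkle when computing IP/Game is that box-score innings are not decimals: 216.1 means 216 and one-third innings, not 216.1.  The sketch below is a hedged illustration of that conversion plus the IP/Game scale; the function names are my own, and the choice to put an average of exactly 6.0 or 5.5 in the higher tier is an assumption.

```python
from fractions import Fraction


def innings_to_fraction(ip_text):
    """Convert box-score innings such as '216.1' (216 and 1/3) to a Fraction."""
    whole, _, outs = ip_text.partition(".")
    outs = int(outs) if outs else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"not box-score notation: {ip_text!r}")
    return Fraction(int(whole)) + Fraction(outs, 3)


def ip_per_game_points(ip_text, games_started):
    """Points for average innings per start; the first scale that can go negative."""
    avg = innings_to_fraction(ip_text) / games_started
    # Assumption: an average sitting exactly on a boundary earns the higher tier.
    if avg >= Fraction(13, 2):   # 6.5 or more
        return 7
    if avg >= 6:
        return 5
    if avg >= Fraction(11, 2):   # 5.5 or more
        return 3
    if avg >= 5:
        return 0
    return -5


avg = innings_to_fraction("216.1") / 34
print(round(float(avg), 2), ip_per_game_points("216.1", 34))
```

For 216.1 innings over 34 starts, the average comes out to about 6.36 IP/gm, which lands in the +5 tier.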
The Quality Start can be an inaccurate statistic because it only takes into account games in which a pitcher goes 6+ innings and gives up no more than 3 earned runs… and nothing else.
If a pitcher goes 8.1 innings and gives up 4 runs, it is arguably the same ratio and an equal game in terms of quality, but does not get counted as a quality start.
With that in mind, I came up with the stat of Adjusted Quality Starts, which takes into account all regular quality starts as well as games in which someone goes 7.2-9 innings and gives up no more than 4 runs.  This measures the true number of games in which a pitcher had a good-great performance.
***If you wonder why it is 7.2 IP, instead of 8, the number was derived from the number of times a pitcher was lifted after 7.2 IP for a specialist, or other sort of reliever, and from the sheer low average of innings pitched/game by a starter this year.  Reaching the 7th inning is now a great feat, let alone coming within one out of finishing the 8th.  Though the previous ratio for a QS was 2:1, due to the data mentioned above, going an extra 1.2 IP to get to 7.2 IP merits being able to give up one more run.***
I used the percentage of AQS to the total number of Games Started to measure effectiveness in this area.  Someone at 75% or above almost always pitches a good-great game, whereas someone under 50% pitches a good game less than half of the time, which is not very effective.

  • if AQS % is 75% or above, +5
  • if AQS % is 67-74%, +3
  • if AQS % is 50-66%, 0
  • if AQS % is below 50%, -3

If you’re keeping score at home, AQS = 6+ IP with ER <= 3, OR 7.2+ IP with ER <= 4, where <= is the blog version of less than or equal to. 
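The AQS rule is easy to get wrong by an out, so here is a small Python sketch of it that counts outs instead of innings.  Everything named here is hypothetical, and two things are assumptions rather than part of the system as stated: that the AQS percentage is rounded to a whole number before it is scored, and that exactly 75% earns the top tier.

```python
def outs_recorded(ip_text):
    """Box-score innings such as '7.2' mean 7 innings plus 2 outs (23 outs)."""
    whole, _, outs = ip_text.partition(".")
    return 3 * int(whole) + (int(outs) if outs else 0)


def is_quality_start(ip_text, earned_runs):
    return outs_recorded(ip_text) >= 18 and earned_runs <= 3   # 6+ IP, ER <= 3


def is_adjusted_quality_start(ip_text, earned_runs):
    long_outing = outs_recorded(ip_text) >= 23 and earned_runs <= 4   # 7.2+ IP
    return is_quality_start(ip_text, earned_runs) or long_outing


def aqs_pct_points(aqs, games_started):
    # Assumption: round to a whole percent first (half rounds up), then score.
    pct = (200 * aqs + games_started) // (2 * games_started)
    if pct >= 75:
        return 5
    if pct >= 67:
        return 3
    if pct >= 50:
        return 0
    return -3


print(is_quality_start("8.1", 4), is_adjusted_quality_start("8.1", 4))
print(aqs_pct_points(20, 30))   # 20 AQS in 30 starts
```

The 8.1-inning, 4-run game from the example above fails the regular Quality Start test but counts as an AQS, and 20 AQS in 30 starts rounds to 67% for +3.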
In addition to AQS, something that needs to be taken into account is how often a pitcher threw a complete game, since they are so rare.  We also need to take into account shutouts, since they occur even less often. 

  • For every CG, +2
  • For every SHO, additional +1

***NOTE: Aaron Harang had two games in 2007, one where he went 9 IP, and one where he went 10 IP, when he did not get a decision.  Even so, I am counting these 2 as a combined 1 CG, since he went 9+ innings.***
W-L Records are the most deceiving statistics because they do not take into account the true quality of the games pitched.  Just because a pitcher goes 14-7 does not mean he was necessarily a great pitcher.  He could have pitched terribly and had great run support in 10 of 14 wins, but brilliantly with terrible run support in the 7 losses.
The whole point of the adjusted W-L records is to reward an AQS, since that means you pitched well, even if your team (offense or bullpen) does not help you. 
After all, Ian Snell cannot control the Pirates’ offense.  It is not his fault that 4 of his 12 losses were “Tough Losses” and all 11 of his No-Decisions were games in which he pitched brilliantly and had an AQS, yet he received little to no offense to help garner him a ‘W’.
With that in mind, I changed W-L to the following 5 stats:

  • Cheap Wins: wins in which one does not get an AQS (-1)
  • Tough Losses: losses in which one does get an AQS (+2)
  • Legit Wins: wins in which one does get an AQS (+2)
  • Legit Losses: losses in which one does not get an AQS (-2)
  • ND-AQS: no-decisions in which one gets an AQS (+1)

I received some questions for how these numbers came to be, and to keep it simple, the statistics that actually have an effect on the W-L record are valued higher (negatively and positively) than the statistics like ND-AQS, which prevent a pitcher from winning but do not hurt him with a loss.
ND-non AQS is not used here for the same reason that a Cheap Win is only worth negative one, which is that not every Cheap Win or ND-non AQS was a terrible start.  A large bulk of them were games in which a pitcher had a good outing but only went 5 or 5.1 innings.   A Cheap Win loses you a point (not two, only one) because you do not get an AQS but it does affect your win-loss record.  ND-non AQS means you do not get an AQS but it does not affect your win-loss record, which is why I decided to just leave it out.
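To make the five decision stats concrete, here is a rough Python sketch that labels a single start and totals the decision points over a set of starts.  The names are mine, the five-start sample is hypothetical, and ND-non AQS is given 0 only to reflect that it is left out of the system.

```python
POINTS = {
    "Cheap Win": -1,
    "Tough Loss": 2,
    "Legit Win": 2,
    "Legit Loss": -2,
    "ND-AQS": 1,
    "ND-non AQS": 0,   # left out of the system, so it scores nothing
}


def classify_start(decision, had_aqs):
    """decision is 'W', 'L', or 'ND'; had_aqs says whether the start was an AQS."""
    if decision == "W":
        return "Legit Win" if had_aqs else "Cheap Win"
    if decision == "L":
        return "Tough Loss" if had_aqs else "Legit Loss"
    if decision == "ND":
        return "ND-AQS" if had_aqs else "ND-non AQS"
    raise ValueError(f"unknown decision: {decision!r}")


def decision_points(starts):
    return sum(POINTS[classify_start(d, aqs)] for d, aqs in starts)


# Hypothetical five-start stretch, one of each scoring type
sample = [("W", True), ("W", False), ("L", True), ("L", False), ("ND", True)]
for decision, had_aqs in sample:
    print(decision, had_aqs, classify_start(decision, had_aqs))
print(decision_points(sample))
```

That sample nets +2: the Legit Win and Legit Loss cancel, the Tough Loss adds two, the Cheap Win costs one, and the ND-AQS adds one.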
Though I am not too fond of this statistic and originally tinkered around with separately evaluating H/IP and BB/IP, using WHIP just seemed to make things easier.  Though it does not tell us which pitchers walk less and give up more hits, or vice versa, or tell us how many “empty innings” a pitcher had (innings where no baserunners got on), it does provide a valid average of baserunners to expect in a given game since it does not equate to a per-9 inning scale.

  • if WHIP 1.00-1.15, +3
  • if WHIP 1.16-1.25, +2
  • if WHIP 1.26-1.30, +1
  • if WHIP 1.31-1.40, 0
  • if WHIP above 1.40, -2

Instead of using K’s, I wanted to use the ratio of strikeouts to walks, since not every pitcher is a strikeout pitcher.  You do not have to be a strikeout pitcher to be an accurate one, and because of this I rewarded those with high K:BB ratios.  Greg Maddux only struck out 104 in 34 starts, but only walked 25, for a K:BB of 4.16.  This meant that Maddux kept runners off base by not walking them, even without a high strikeout total.

  • if K:BB above 4, +7
  • if K:BB above 3, +5
  • if K:BB above 2, +3
  • if K:BB above 1, 0
  • if K:BB 1 or below, -3
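
The WHIP and K:BB scales above can be sketched the same way.  This is only an illustration with made-up function names; treating a WHIP below 1.00 as the top tier is an assumption, since the scale as listed starts at 1.00.

```python
def whip_points(whip):
    """Points for WHIP, compared at two decimal places."""
    whip = round(whip, 2)
    if whip <= 1.15:   # assumption: anything under 1.00 is also top tier
        return 3
    if whip <= 1.25:
        return 2
    if whip <= 1.30:
        return 1
    if whip <= 1.40:
        return 0
    return -2


def kbb_points(ratio):
    """Points for the strikeout-to-walk ratio."""
    if ratio > 4:
        return 7
    if ratio > 3:
        return 5
    if ratio > 2:
        return 3
    if ratio > 1:
        return 0
    return -3


maddux_ratio = 104 / 25          # 104 strikeouts, 25 walks
print(round(maddux_ratio, 2), kbb_points(maddux_ratio))
print(whip_points(1.33))
```

Maddux’s 104 strikeouts against 25 walks work out to 4.16, which clears the top K:BB tier for +7, while a 1.33 WHIP sits in the neutral 0 tier.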

Now that we have the points, let’s test it out and put it to use.  We will use Ian Snell and Carlos Zambrano.
The table below shows Ian Snell’s 2007 numbers and points he receives for each in my points system.

Statistic   Value   Points
Starts      32      +5
Innings     208.0   +5
Cheap W     0       0
Tough L     4       +8
Legit W     9       +18
Legit L     8       -16
ND-AQS      11      +11
AQS %       75%     +5
IP/Game     6.52    +7
WHIP        1.33    0
K:BB        2.60    +3
CG          1       +2
SHO         0       0

When we add up all thirteen of these numbers, we get Snell’s Effectiveness #, which comes to: +48.
Now, let’s look at Carlos Zambrano’s season numbers in the table below and add his point totals up.

Statistic   Value   Points
Starts      34      +5
Innings     216.1   +5
Cheap W     0       0
Tough L     2       +4
Legit W     18      +36
Legit L     11      -22
ND-AQS      0       0
AQS %       53%     0
IP/Game     6.36    +5
WHIP        1.34    0
K:BB        1.75    0
CG          1       +2
SHO         0       0

We look at his numbers and add up the totals to get his Effectiveness #: +35.
Zambrano had more legit wins but also more legit losses, and of Zambrano’s 3 no-decisions, none were ND-AQS, whereas of Snell’s 11 no-decisions, all were ND-AQS. 
That tells us that if each pitcher got a win for every game in which he recorded an AQS and a loss for every decision in which he did not, so that the only no-decisions left were ones in which he pitched poorly or did not go a full 6 IP, their records would look like this:

  • Carlos Zambrano (18-13) would actually be 20-11
  • Ian Snell (9-12) would actually be 24-8
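
Both the Effectiveness totals and the “would actually be” records are plain arithmetic on the two tables, which the following Python sketch reproduces.  The dictionary layout and function names are just one hypothetical way to lay it out; the numbers themselves are copied from the tables above.

```python
# Points per category, copied from the two tables above.
snell_points = {
    "Starts": 5, "Innings": 5, "Cheap W": 0, "Tough L": 8, "Legit W": 18,
    "Legit L": -16, "ND-AQS": 11, "AQS %": 5, "IP/Game": 7, "WHIP": 0,
    "K:BB": 3, "CG": 2, "SHO": 0,
}
zambrano_points = {
    "Starts": 5, "Innings": 5, "Cheap W": 0, "Tough L": 4, "Legit W": 36,
    "Legit L": -22, "ND-AQS": 0, "AQS %": 0, "IP/Game": 5, "WHIP": 0,
    "K:BB": 0, "CG": 2, "SHO": 0,
}

# Raw decision counts from the same tables.
snell_counts = {"Cheap W": 0, "Tough L": 4, "Legit W": 9, "Legit L": 8, "ND-AQS": 11}
zambrano_counts = {"Cheap W": 0, "Tough L": 2, "Legit W": 18, "Legit L": 11, "ND-AQS": 0}


def effectiveness(points):
    """The Effectiveness # is simply the sum of every category's points."""
    return sum(points.values())


def deserved_record(counts):
    """A win for every AQS, a loss for every decision without one."""
    wins = counts["Legit W"] + counts["Tough L"] + counts["ND-AQS"]
    losses = counts["Legit L"] + counts["Cheap W"]
    return wins, losses


for name, points, counts in (
    ("Ian Snell", snell_points, snell_counts),
    ("Carlos Zambrano", zambrano_points, zambrano_counts),
):
    w, l = deserved_record(counts)
    print(f"{name}: {effectiveness(points):+d} points, {w}-{l}")
```

This gives +48 and 24-8 for Snell and +35 and 20-11 for Zambrano, the same figures shown above.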

Snell went further into his games, had a better K:BB ratio, and had that higher AQS %.  It also tells us that of Snell’s 32 starts, 24 of them were of great quality, whereas Zambrano had 18 good-great starts and 16 average-bad starts.
This essentially tells us that while Zambrano’s good-great starts may have been better than Snell’s good-great starts, when Zambrano had his bad starts, Snell was still having good-great ones.
As mentioned before, I used this points system to evaluate 30 National League pitchers.  I compiled a group of spreadsheets, ranking the pitchers in order in different categories to show that certain stats we rely on do a bad job of proving effectiveness.
To view all of my results, click on the links below.  You can use this data in other areas, but please credit my work.

  • To see the list of pitchers and their statistics used to assign points, click here.
  • To see the list of pitchers in order of effectiveness points, click here.

I do not want to post a ridiculously long table on this article, so you will need to look at the linked files to see the results, but I will list the top 16 pitchers and their effectiveness points (16 rather than 15 because three pitchers tied at +45).

  1. Jake Peavy, +74
  2. Aaron Harang, +69
  3. John Smoltz, +69
  4. Brandon Webb, +67
  5. Cole Hamels, +65
  6. Brad Penny, +64
  7. Tim Hudson, +63
  8. Ted Lilly, +60
  9. Matt Cain, +52
  10. Roy Oswalt, +50
  11. Ian Snell, +48
  12. Bronson Arroyo, +47
  13. Derek Lowe, +47
  14. Greg Maddux, +45
  15. Adam Wainwright, +45
  16. Jeff Francis, +45

And, again, these points were assigned to statistics based on how strongly they correlate with effectiveness.  The points system essentially covers the statistics and averages from all angles.
The most shocking part of this was how low Chris Young of the Padres came out.  Young went 9-8, with a 3.12 ERA, in 30 starts.  He should have been more effective, I thought, based on those numbers.  After looking at his game logs, though, I changed my mind and realized it made sense.
Over his 30 starts, he was essentially two different pitchers.  In the 19 starts in which he went for 6+ innings, he was 9-1 with a 1.64 ERA, averaging 6.6 IP/gm, with a 0.85 WHIP and 129 K’s in 126.1 innings.
In the other 11 starts, he was 0-7, with a 7.14 ERA, only going 4.2 IP/gm, with a 1.76 WHIP, and 38 K to his 36 BB, in 46.2 innings.
After analyzing his situation and the points system I realized that my effectiveness model favors consistency and lower standard deviations (roughly, how far someone typically strays from his average).  To me, that truly defines effectiveness.
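The Chris Young split can be checked with a bit of arithmetic, sketched here in Python.  One loud assumption: the article does not give earned-run totals for either group of starts, so they are inferred below from the rounded ERAs and innings; the helper names are also my own.

```python
def innings(ip_text):
    """Box-score innings such as '126.1' mean 126 and 1/3 innings."""
    whole, _, outs = ip_text.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def era(earned_runs, ip):
    return 9 * earned_runs / ip


good_ip, bad_ip = innings("126.1"), innings("46.2")

# Assumption: earned runs are inferred from the rounded ERAs quoted above.
good_er = round(1.64 * good_ip / 9)
bad_er = round(7.14 * bad_ip / 9)

season_ip = good_ip + bad_ip
season_era = era(good_er + bad_er, season_ip)

print(f"{good_ip / 19:.2f} IP/gm in the 19 long starts, {good_er} ER")
print(f"{bad_ip / 11:.2f} IP/gm in the other 11 starts, {bad_er} ER")
print(f"season: {season_ip:.0f} innings, {season_era:.2f} ERA")
```

Under that assumption the two groups come to 23 and 37 earned runs, and combining them gives back the 3.12 season ERA even though neither half of his season looks anything like a 3.12 pitcher.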
I would much rather have a guy who I knew would amass an AQS 67% or more of the time than a guy who might strike out 20 batters and pitch a two-hitter in one game, but give up 5 runs in 6 innings for the next three, before again pitching a brilliant game.
As long as the consistency is of a good nature, consistency in this model proves effectiveness.
I know, we’re finally at the end of the article, right?  I apologize for the length but it took this long to get everything across. 
Looking at Jake Peavy, the most effective NL pitcher at +74, we see that the only counted statistic in which he led was AQS.  Peavy had the most good-great starts of any NL pitcher.  While he may not have led in IP, IP/gm, K:BB ratio, or fewest losses (Brad Penny only had 1 legit loss), he led in consistency and being consistently good-great.
These results also show that Cole Hamels, had he made the 6 starts he missed due to injury, would likely have challenged Peavy for #1 in effectiveness.  However, as my model dictates, the fact that he missed those 6 starts and Peavy did not shows that Peavy was more effective.
Yes, there were more stats we could add to this, and more variables to account for, but I feel this accurately levels the field of play between pitchers in distinctly different playing situations, and levels the difference between 2007 reputation and 2007 actual performance.
I must remind you before I come to a close, though, that this is only a measure of effectiveness, not the end-all solution to determining who the “best” pitchers are.
However, for this Sabermetrician, effectiveness directly correlates with quality and value.