Month By Month

Well, the 2008 season is almost over, and as soon as the Twins and White Sox duke it out for AL Central supremacy, the regular season will officially be in the books.  This year saw utter domination by Cliff Lee and Albert Pujols for the entire duration of the season, Mike Mussina finally notch his first 20-win season, the ageless Jamie Moyer prove to be the most consistent starter on the Phillies, some incredible streaks by the likes of Chipper Jones, Carlos Delgado, and Ryan Howard, and Tiny Tim Lincecum emerge as a true force to be reckoned with. 
Since the season is virtually over, I figured it would be interesting to take a look, month by month, at the top performers, both offensively and on the mound.  The tops for each month were taken using WPA/LI, and the monthly splits found at my home away from home, Fangraphs.  WPA/LI essentially tells us how many wins above average a player was worth.  WPA may actually be better for pitchers, but for consistency’s sake we will use the context-neutral wins.
The top offensive performer in April happened to be a tie between two members of the Philadelphia Phillies: Pat Burrell and Chase Utley both produced a 1.52 WPA/LI.  Burrell hit .326/.452/.674, with 8 home runs and 8 doubles.  Teammate Utley hit .360/.430/.766, with 11 home runs and 10 doubles.  Suffice it to say, their hot starts more than made up for Ryan Howard’s early struggles.  In the pitching department, Cliff Lee had an absolutely incredible April, producing a 1.94 WPA/LI.  In five starts, he posted the following numbers: 37.2 IP, 19 H, 1 HR, 2 BB, 32 K.  Yes, folks, that is a 16.00 K/BB ratio.  Coupled with his lone longball allowed, this resulted in a 1.94 FIP, but a sub-.200 BABIP helped his ERA to finish the month at just 0.96.
As May came to its close, two participants in the 2005 World Series topped the hitting and pitching leaderboards.  Lance Berkman, continually one of the most underrated players, was 2.16 wins above average, hitting .471/.553/.856 with 9 home runs and 11 doubles.  Jose Contreras of the White Sox, who likely will not be remembered for having a good year or anything of the sort, had a tremendous May, when he was good to the tune of a 1.57 WPA/LI.  In 6 starts, he pitched 43 innings, surrendering 28 hits, just one of which left the yard.  He issued 8 walks and punched out 31 hitters.  His 2.09 ERA wasn’t all luck, either, as his FIP was a very low 2.62.
June was all about JD Drew and John Lackey.  Drew put together a 1.99 WPA/LI thanks to his .337/.462/.848 slash line, as well as his 12 home runs, 7 doubles, and 2 triples.  Lackey, who missed time early on, was back in full form as the summer began.  In just five starts he performed 1.44 wins above average, on the heels of a 1.16 ERA/2.59 FIP, a 0.78 WHIP, and a 4.86 K/BB.  In 38.2 innings, he gave up 23 hits (2 HR), walked 7, and fanned 34.
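For readers who want to check the rate stats quoted in these monthly lines, here is a minimal Python sketch. It assumes the usual box-score convention that "38.2" innings means 38 and two-thirds, and the function names are my own, not anything from Fangraphs.

```python
def innings(ip_notation: str) -> float:
    """Convert box-score innings such as '38.2' (38 and 2/3) to a decimal."""
    whole, _, thirds = ip_notation.partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3


def k_per_bb(strikeouts: int, walks: int) -> float:
    """Strikeout-to-walk ratio."""
    return strikeouts / walks


def whip(walks: int, hits: int, ip_notation: str) -> float:
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings(ip_notation)


# Lackey's June line from the text: 38.2 IP, 23 H, 7 BB, 34 K
lackey_kbb = round(k_per_bb(34, 7), 2)
lackey_whip = round(whip(7, 23, "38.2"), 2)
print(lackey_kbb, lackey_whip)
```

Run on Lackey's June counting stats, this lands on the same 4.86 K/BB and 0.78 WHIP quoted above.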
Next up, July, which happened to be the month of… Adam LaRoche?  Yes, Adam LaRoche was July’s best hitter, with a 1.58 WPA/LI.  With 7 home runs, 7 doubles, 2 triples, and a .390/.472/.805 line, he heated up just as the weather did.  CC Sabathia owned the month, as well, despite posting the lowest WPA/LI for a monthly leader at 1.12.  In six starts, he managed 47.2 innings, with just 36 hits given up.  He walked 12 and struck out 39, en route to a 2.27 ERA.
As summer came to its close, newly acquired Mark Teixeira gave the Angels a 1.84 WPA/LI, thanks in large part to his .386/.479/.663 line, with 8 home runs.  Sabathia again proved to be the top pitcher, right around one and a half wins above average.  In six more starts, he totaled 48.1 innings, 40 hits, 8 walks, and 51 strikeouts.  With a 1.12 ERA, 2.06 FIP, 0.99 WHIP, and 6.38 K/BB, nobody pitched better in August than CC.  Coupled with July, Sabathia posted the following numbers: 12 GS, 96 IP, 76 H, 21 BB, 90 K, with a 1.69 ERA.
September was the month of Ryan Howard.  He may have started off slowly, but he picked it up when the Phillies needed it most.  Then again, if he didn’t stink so much in April and May, perhaps they would not have needed such solid performance late in the season to win the division.  Anyways, he hit .352/.422/.852, with 11 home runs, 7 doubles, and he somehow managed to hit 2 triples as well.  All told, his 1.68 WPA/LI topped all hitters.  Roy Oswalt came in at 1.64, just a few ticks under Howard.  Oswalt struggled early on, but was a key member of the Astros’ late surge, and put together a great overall season.  In the final month, he pitched in 44.1 innings, giving up 23 hits, walking 6, and fanning 30, with a 1.42 ERA, 2.98 FIP, 0.68 WHIP, and a 5.00 K/BB.
Overall, I had a blast covering the league this year and am looking forward to the post-season, where the Phillies will hopefully dominate.  While I will sprinkle some more research pieces here and there throughout the off-season, I am going to be posting a year-in-review series, auditing teams so to speak, documenting data with the help of Pitch F/X, and examining splits and tendencies. 


The PanDIBS theory: Pitching and Defense Independent Batting Statistics

DIPS changed everything.  (Thanks Voros!)  It was the first sustained theory that evaluated players not so much by what a player had done over the last year, but by which part of the player’s (in this case, the pitcher’s) performance was within his control and which part was out of his control.  The theory has been refined here and there, but the basic idea remains: there are some things that a pitcher has more control over than others.  It’s a little disconcerting to think that so much of baseball rides on luck, but it’s important to know.
What’s odd is that this line of theories seemed to stop there.  To my knowledge, no one’s really looked at whether there’s any analogous coherent theory out there for batting statistics.  Are there some batting stats that seem to be more statistically reliable (i.e., skill based) and some that are more unreliable (i.e., luck based)?  I’d contend that the answer is yes, and the pattern works in a specific way.  In previous columns, I’ve taken some time to meditate on the statistical reliability of many stats, some more esoteric than others.  (Eric Seidman once called me the master of statistical reliability.)  When I looked through some of the work from a more holistic standpoint, the pattern became pretty clear.
Take a look back at this article on when different statistics stabilize enough to the point where they can be considered reliable.  The stats that stabilize the quickest are the ones over which the batter might be expected to have the most control (whether or not he swings, how often he makes contact), but then are followed in controllability by things over which there is some interaction between the batter and the pitch (type of batted ball), and then by some of the actual results that come from that batted ball (single, home run, out).  Roughly.
Let’s model the outcome of an at-bat in a flowchart.  A plate appearance can basically end in one of four ways.  The batter can walk, strike out, be hit by a pitch, or do something that involves hitting the ball (or he’ll reach on fielder’s interference… once every five years).  The first three events end the plate appearance right there.  If he hits the ball, it will either be a flyball, grounder, liner, or popup.  If it’s a flyball, it might be a HR, or it might be an XBH or a single or be caught (or dropped) by a fielder.  I could do the same basic breakdown for all the other types of batted balls.
As you get further and further down the flowchart, with more steps involved in the process, the underlying rates become more unreliable statistically.  Part of it is the fact that as you split off further and further, a player only has say 150 ground balls, but may have 600 plate appearances.  Any stat built on 600 measurements will be more reliable than one built on 150.  But, in general, when you constrain the data set so that you’re comparing the reliability at 150 PA vs. 150 GB, the stats closer to the base of the flowchart still show up as more reliable.
DIPS proposed two categories for statistical reliability.  Category one was a pitcher’s K rate, BB rate, HBP rate, and (I believe erroneously) HR rate.  Category two was the now famous BABIP.  BABIP was considered to be the product of luck, while K and BB were the product of skill.  Here I propose PanDIBS, with three (perhaps four) tiers of batting statistics to consider.  The most reliable of all stats are the swing diagnostics (and we know that they’re important), although no one ever really wants to project what J.D. Drew’s contact percentage will be.  Let’s call swing diagnostics the zero-th level.  The first level, in terms of reliability, is the DIPS stats: K rate, BB rate, HBP rate.  The second level is the player’s batted ball profile (GB%, LD%, FB%).  The third level is what we really care about, things like HR and doubles.  Sadly, those are the ones most likely to be influenced by luck.
I’d also propose that it’s important to look at each type of batted ball separately.  A little while ago, I looked at Kelly Johnson’s season and asked how much consistency there was from year to year when it came to outhitting one’s expected BABIP.  The answer was “not much consistency.”  (I found a four-year intra-class correlation of around .27.)  What I didn’t know then was that there are different effects for different types of batted balls.  I looked at how well players did in “outhitting” their expected BABIP, chopped up by each different type of batted ball.  For example, 24% of grounders go for hits, while 73% of line drives do.  So, we would expect players to have a .730 BABIP on liners and a .240 BABIP on grounders.  Of course, things vary, but do they vary consistently?  If a player is above average in year one, he should be above average in year two.  That’s the mark of a skill-based stat.
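To make the "outhitting expected BABIP" idea concrete, here is a small Python sketch. The .240 and .730 base rates are the ones quoted above; the hitter's counts are made up purely for illustration, and the function name is hypothetical rather than taken from any existing tool.

```python
# League base rates quoted in the text. Other batted-ball types would
# need their own rates, which are not assumed here.
EXPECTED_BABIP = {"GB": 0.24, "LD": 0.73}


def hits_above_expected(balls: dict, hits: dict) -> dict:
    """Per batted-ball type: actual hits minus (count * league base rate)."""
    return {
        kind: hits[kind] - balls[kind] * EXPECTED_BABIP[kind]
        for kind in balls
    }


# Hypothetical hitter: 200 grounders (54 hits) and 100 liners (70 hits).
sample = hits_above_expected({"GB": 200, "LD": 100}, {"GB": 54, "LD": 70})
print(sample)
```

This made-up hitter is about six hits ahead of expectation on grounders and about three behind on liners. Whether a gap like that repeats from one year to the next is the consistency question being asked here.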
The answer depends on the type of batted ball.  Players were more consistent in “outhitting” their expected BABIP on flyballs (ICCs over 4 years were in the mid-.30s, depending on what PA inclusion criteria were used) than on grounders (ICC in the mid-.20s) or line drives (about .10).  When I split up flyballs into infield flies (which have an expected BABIP of about .025) vs. non-infield flies, getting more hits on popups turned out to be almost entirely luck (.02 ICC over four years).
It makes sense that there would be more of a “skill” in out-hitting expectations on flyballs.  Some players are rather skilled at hitting them off the wall, and some are not.  The skill in out-hitting expectations on ground balls is called “speed.”  Line drives on the other hand are just a matter of luck as to whether someone catches them or not.  A high line drive will likely go off the wall, but that’s about it.  If a popup goes for a hit, either someone missed it, or the batter simply lucked into a Texas Leaguer.  So, when looking at whether a player will continue with getting all those hits, it’s important to know what type of hits he’s getting and what the base rate expectations are for that type of ball.  So, if you see someone who hits a lot of line drives have a dip in his performance (or a breakout year), expect a lot of regression to the mean.  If he’s the kind of guy who hits a lot of flyballs, he’s not going to have to give as much of that back in regression to the mean.
So, yes Virginia, it is possible to sort out which stats are the result of luck and which are the result of skill for batters too in a fairly coherent way.  There is variation in how reliable each stat is, but in general, the farther away the ball gets from the bat, the more luck creeps in to influence the outcome.

Two steps and tangos: a look at pivots

First, an aside.  You are probably expecting an article on run estimation. I was expecting to provide you with (yet another) one. The hang-up was all in getting my linear weights values to reconcile with actual run scoring totals. It’s not that I didn’t – oh, if only that was it. No, I got the linear weights values to reconcile, and without simply forcing them, but naturally. This simply was not accompanied with any ideas as to why the method I used worked. (I discovered it totally by accident.)

Since in fairness to you (and me!) I really don’t want to throw things out there that I can’t comfortably explain, or at least suggest an avenue of study, I’m going to table the run estimation articles until I can sort out some of the peculiarities I’m discovering the more I work with zero-baseline run expectancy tables. The good news is that it doesn’t impact the BaseRun formula I published in the comments, so that’s a small comfort.

In the meantime, though, I didn’t want to leave you all alone without an article on a Friday morning. So hopefully you’ll indulge me in a little discussion of defensive stats – and I mean the really basic stuff, not the super-advanced UZR and PMR or what Brian’s talking about, but the official defensive statistics: the putout, assist, error and double play.

Because if you look at the official fielding stats of all players (except catchers), that’s all you’re ever going to get. And prior to the Retrosheet years (as well as many other years for leagues other than MLB), official fielding stats are all we may ever have. So it probably behooves us to see what we can do with them.

So let’s start off by looking at the official definitions. I’ll go ahead and note that they’re long, much longer than you would expect at first blush. I’ll go ahead and summarize them here, but I do suggest you take the time to go peruse them at some point.

  • A player is awarded a putout whenever he records an out on a batter – either by catching a ball on the fly, tagging the runner or through the forceout. Oh, and a catcher gets a putout every time a batter strikes out, because that makes sense somehow.
  • A player is awarded an assist if he handles the ball on a play when a putout is made by another fielder. I do not want you to place too much importance on the phrase “handles the ball”: if you bat down a ball on the infield by diving for it and another player throws out a runner, you can get an assist.
  • An error is awarded whenever the official scorer fears that someone might notice that they’re actually paying him to watch a baseball game and so wants to look like he’s doing something.
  • A double play is awarded whenever a fielder records an assist or putout on a double play, or whenever he would have if not for the first baseman (or some other player) making an error.


How To Construct a Defensive Metric

If we are given sufficient detail in the play by play, defensive stats would be a compilation of not only putouts, assists and errors, but also of hits, doubles, triples, homeruns, extra bases by runners, etc., which are charged against each fielder. Then, to provide proper context, these observed values are compared to the expected values of the collection of batted balls that were hit to each fielder.
Whether they are soft grounders to short or line drives in the gap, each play is described as to whether it’s a hit or out, where it is hit, how hard, whether it’s a fly or grounder, etc. Plays with the same description are grouped, and then the probability of each grouping being an out, error, single, double, triple or homer is calculated. By counting the number of each type of play each fielder is presented with, and then multiplying those sums by the probability distribution of each type, the expected number of outs, hits, etc. for each fielder are derived. Typically, the difference between the observed and expected values is expressed by subtraction as a plus or minus number of plays, or as a ratio.
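The bucket-and-compare procedure just described can be sketched in a few lines of Python. This is only an illustration for outs: the bucket labels, the tiny data set, and the function names are all made up, and a real metric would track hits, extra bases and errors the same way.

```python
from collections import Counter


def out_probabilities(league_plays):
    """league_plays: iterable of (bucket, was_out). Returns P(out) per bucket."""
    totals, outs = Counter(), Counter()
    for bucket, was_out in league_plays:
        totals[bucket] += 1
        outs[bucket] += int(was_out)
    return {bucket: outs[bucket] / totals[bucket] for bucket in totals}


def plays_above_expected(fielder_plays, p_out):
    """Return (observed - expected outs, observed / expected outs)."""
    observed = sum(int(was_out) for _, was_out in fielder_plays)
    expected = sum(p_out[bucket] for bucket, _ in fielder_plays)
    return observed - expected, observed / expected


# Hypothetical buckets: (batted-ball type, location, how hard it was hit).
soft_gb = ("GB", "shortstop hole", "soft")
hard_ld = ("LD", "left-center gap", "hard")

league = [(soft_gb, True)] * 3 + [(soft_gb, False)] \
       + [(hard_ld, True)] + [(hard_ld, False)] * 3
p_out = out_probabilities(league)  # soft_gb -> 0.75, hard_ld -> 0.25

fielder = [(soft_gb, True), (soft_gb, True), (hard_ld, True), (hard_ld, False)]
plus_minus, ratio = plays_above_expected(fielder, p_out)
print(plus_minus, ratio)
```

In this toy data the fielder turns three of his four chances into outs against an expectation of two. That is +1 play by subtraction, or 1.5 as a ratio.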
This is one of the places where I use a favorite tool, which I call the “Inverse James Function.” The ever brilliant Dan Fox gave a rundown of James’ original log5 method at The Hardball Times:
ExAVG = ((BAVG * PAVG) / LgAVG) / (((BAVG * PAVG) / LgAVG) + (((1-BAVG) * (1-PAVG)) / (1-LgAVG)))
Bill James introduced this in the 1981 Baseball Abstract to answer the question “Given a certain batter and a certain pitcher, in the context of the league mean, what should the result be?”
My inquiring mind twisted this around to ask “What if ExAVG is instead what I observe? I know LgAVG and can calculate PAVG as the expected value. Then if I solve for BAVG I have the true value of B in the context of P and L.” Changing ExAVG to ObsAVG and solving for BAVG, the formula becomes
BAVG = ((ObsAVG * LgAVG) / PAVG) / (((ObsAVG * LgAVG) / PAVG) + (((1-ObsAVG) * (1-LgAVG)) / (1-PAVG)))
I have used this formula as the core for my Park Factors, comparing the observed home values to the expected road, and also for MLEs, comparing the observed minor league value to the expected major league.
Basically, the formula expresses the ratio of the observed to the expected, multiplied by the mean, but it’s constructed so that the result will never be less than 0 or more than 1. If Obs = Exp, then R = Lg. If Obs > Exp, then R > Lg, and if Obs < Exp, then R < Lg.
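As a sanity check on those properties, here is a short Python sketch of log5 and its inverse. The inverse is written by solving the log5 expression algebraically for BAVG, so treat it as my reconstruction of the "Inverse James Function" rather than a quote of it. The sample averages are arbitrary.

```python
def log5(b_avg: float, p_avg: float, lg_avg: float) -> float:
    """Expected result for batter B against pitcher P in a league with mean Lg."""
    num = b_avg * p_avg / lg_avg
    den = num + (1 - b_avg) * (1 - p_avg) / (1 - lg_avg)
    return num / den


def inverse_log5(obs_avg: float, exp_avg: float, lg_avg: float) -> float:
    """Solve log5 for the batter term: observed vs. expected, scaled to the mean."""
    num = obs_avg * lg_avg / exp_avg
    den = num + (1 - obs_avg) * (1 - lg_avg) / (1 - exp_avg)
    return num / den


# Round trip with arbitrary values: a .300 hitter, a .250 context, a .260 league.
expected = log5(0.300, 0.250, 0.260)
recovered = inverse_log5(expected, 0.250, 0.260)
print(round(expected, 3), round(recovered, 3))
```

Feeding the log5 output back through the inverse returns the original .300. Setting Obs equal to Exp returns the league mean, and the result stays between 0 and 1 by construction.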
Count every batted ball?
Everything gained by the offense is allowed by the pitching and defense. Philosophically, we can take a top-down approach that also states that the team stats are the sum of the individuals on the team. DER, as a team statistic, uses all batted balls, but individual defensive metrics like ZR and UZR exclude popups. PMR, SFR and OPA take everything into account, but PMR doesn’t break out the results, such as separate ratings for groundballs and popups.
Account for every base?
DER and its derivative ZR measure the percentage of batted balls that are made into outs, and so are analogous to batting average. They do not consider extra base hits, as slugging average does. It is a defensive skill, a combination of positioning, range and arm, to keep a batter from stretching a single into a double, or a double into a triple. When evaluating the ability to keep baserunners from advancing, we can use a weighted mean of every groundball and flyball by base and out situation.
I think Dan Fox is also clairvoyant, as he seems to have stolen all my best ideas. His Simple Fielding Runs (SFR), as well as Pizza Cutter’s OPA!, are two metrics which are capable of all these things. As both were designed to evaluate Retrosheet play by play data, they have the flexibility to handle most any kind of data input, calculating expected values on the weighted means of whatever play descriptions are available. I suggested to Dan that he could use GameDay data with SFR, and he was able to produce minor league ratings for 2007. This flexibility can also be a weakness, as the generated ratings are only as good as the precision of the play by play data being input. Therefore, not all SFR or OPA! ratings, even with the same sample size, have the same level of certainty, as this is dependent on the source of data, although this fades with large enough sample sizes. Given the same input data as UZR or PMR, SFR and OPA! should give equivalent results, but SFR and OPA! can give results when less than optimal play by play is all that is available.
One other neat thing Pizza Cutter has done with OPA! is add the ability to rate infielders on different skills used to make outs, such as range, hands and arm. Each is expressed in terms of runs, so that they can be added to a grand total as well as listed individually. The components can also be scaled, such as 1-10 or A-E. Then you can say that Derek Jeter has a range of E, but an arm of A, which still adds up to a poor shortstop.
With detailed play by play now available for all levels of professional baseball, the ability to measure defensive performance is light years ahead of just a few years ago. There are still a few tweaks to the scorekeeping that can eliminate most or all of the need for estimations, and there are also the more major upgrades like fielder locations and batted ball trajectories that may take more time to realize. I’ve commented on several of the metrics currently available. They all have a lot to like, but still have some limitations. Let’s not stand still, there’s still a lot of development to be done.

World Famous StatSpeak Roundtable: September 24

Roundtable, roundtable, where are you now?  Why, you’ve found Derek Carty of the fantasy section of The Hardball Times and RotoWorld.  And on the happiest day of the year: the day on which the New York Yankees are eliminated!  This week, Derek joins the usual four-man rotation for a look at the recent end of an era in New York stadia, the Next Generation of Sabermetrics, playoff matchups, and fun things to do with Pitch F/X.
Question #1: While the PITCHf/x system is really starting to take hold of the baseball community, a system like this has nearly unlimited potential and surely hasn’t been put to its full use yet.  Movement and speed have become rather commonplace now, but what is one important area that PITCHf/x could look at that hasn’t been done yet?
Derek Carty: We see movement and speed graphs and figures everywhere now, but there is more to pitching than these two things. The whole mental side of pitching has been largely untouched, though it can play a large role in pitcher success. Location is a facet of pitching that is widely recognized as important and commonly talked about during television broadcasts, yet it has received little attention in the way of PITCHf/x. I penned a recent article about curveballs (and have plans for forthcoming articles), and my colleague John Walsh looked at fastball location at the beginning of the season. Aside from that, though, I’ve yet to really see it looked at much.
Perhaps the most interesting thing that is now possible to examine is the batter/pitcher dynamic and how game theory plays into it. Are there any pitchers or batters who are especially good at it? There are a ton of factors that go into this, likely making it necessary to look at it at the micro level, but it would be incredibly interesting.
Brian Cartwright: I don’t know if there’s any more physical info the system can give us on the pitches, but there’s still lots of ways to analyze the data we might have not thought of yet. For ages teams have charted pitches on paper. Pfx does an electronic version. Both P.C. and Derek have written articles on plate discipline for hitters, and I admit I’m still working to wrap my head around some of the math (and upset that Access doesn’t have the necessary function that Excel does). What I am picturing is looking at plate discipline not just by balls and strikes, or in zone and out of zone, but also by what we can see that each batter can and cannot hit. Is he aggressive on pitches he has shown he can hit well? Can he lay off the slider below the knees? A heat chart at different counts for each batter can graphically show the pitcher the best and worst pitches to throw, which then leads into pitch sequencing…
Colin Wyers: I think the next big step for Pitch F/X research is probably to make some testable predictions using the data. It seems obvious to say that given more data about a pitcher we should be able to improve our ability to project future performance, but there’s quite a lot of labor involved between saying it and doing it. I’m not (only) talking about projections like Marcels and PECOTA – when a pitcher drops his release point, what can we conclude about his performance going forward? If his velocity is off in one start, how likely is it to carry over? There’s a lot of things there that haven’t really been quantified yet, at least not in a systemic way (that I’m aware of).
Eric Seidman: I enjoy using the Pitch F/X data to investigate questions that we have been or may be curious about.  For instance, I wrote a couple of articles at Prospectus about what happens to pitchers when they throw a ton of pitches in the first inning, separating them by average fastball velocity.  I am a huge fan of John Walsh’s article on whether or not an 89 mph fastball is more effective than a 95 mph fastball.  The data is also great at confirming the generally accepted principles or beliefs, such as pitchers with more vertical movement surrender more flyballs, but I feel like we will need 5-10 years before we can really do a whole heck of a lot with the dataset.
Pizza Cutter: Actually, this could be a pretty easy stat to run and maybe someone’s already done it.  We can identify which pitches are begging to be hit for home runs (say, that hanging curveball).  Surely, we could start classifying the “mistake zones” on pitches and ask which pitchers have the highest/lowest mistake percentage.  Perhaps that could even become a way to find out who’s gotten lucky and who hasn’t.  It’s not just location either (although that’ll be important).  A 97 mph fastball down the middle is a better pitch than an 87 mph fastball down the middle.

Cliff, Roy… and the Rest

With only a week or so remaining in the regular season, the performance lines for starting pitchers are largely complete, as each will only take one more trip to the hill.  Barring a Brett Myers vs. the Marlins performance from last Friday, Cliff Lee is going to finish an absolutely remarkable season by adding the AL Cy Young Award to his resume.  His closest competition is Roy Halladay of the Blue Jays, but, while Halladay is putting the finishing touches on the best season in his fantastic career, and while he is definitely the better bet moving forward, Lee is technically still having the better season.  He isn’t blowing Halladay away, by any means, as the gap between the two is pretty slim and they are 1-2 in just about every pertinent category, but Lee is more often than not the 1 to Doc Halladay’s 2.
This post is not about Cliff vs. Roy, however, but rather about the rest of the pack.  Sky Kalkman brought up this point on my Fangraphs post regarding Halladay’s unnoticed season, in that Cliff and Roy are neck and neck, but after that, the gap vastly widens before we see the next best starting pitcher in the junior circuit.  As in, Lee and Halladay have both been so good that they are not only making others in the midst of great seasons appear not as great, but have seriously distanced themselves from the rest of the pack.
Now, WPA/LI might not be the greatest metric for pitchers, as TangoTiger pointed out–the stat treats each plate appearance as one plate appearance, which is great for hitters, but not as solid for pitchers.  Certain PAs should be counted as more important for pitchers.  Still, I am a fan of the metric, and when you look at AL starters, the following will be seen:

  1. Cliff Lee, 5.02
  2. Roy Halladay, 4.99
  3. Ervin Santana, 3.31
  4. John Danks, 2.86
  5. Josh Beckett, 2.47

Now raise your hand if you predicted prior to the season that Cliff Lee, Ervin Santana, and John Danks would comprise arguably three of the top five pitchers in the American League.  Okay, lower your hand, and please put out the flame that just engulfed your pants, because you are a liar.  Halladay is within .03 of Lee, but after that, next closest is Santana, who has had a great season, but has been over 1.5 wins less effective than both Lee and Doc.  Danks, who seemingly came out of nowhere to have this great campaign, is almost two and a quarter wins less effective, and Beckett, who usually gets brushed aside on his own team due to Jon Lester’s season and Dice-K’s gaudy W-L record, is 2.5 wins less effective on the season.
That is pretty drastic, especially when compared to the National League, where numbers one through six in context-neutral wins range from 3.83 to 3.03, a mere 0.8 wins separating these six pitchers.  Let’s take a look at the five AL starters a bit more in-depth.
Cliff Lee: 30 GS, 216.1 IP, 205 H, 11 HR, 31 BB, 162 K, 2.41 ERA, 2.78 FIP, 5.23 K/BB, .302 BABIP, 78.9% LOB, 4.9% HR/FB
Roy Halladay: 32 GS (33 G), 237 IP, 214 H, 18 HR, 38 BB, 201 K, 2.81 ERA, 3.04 FIP, 5.29 K/BB, .295 BABIP, 74.5% LOB, 9.5% HR/FB
Ervin Santana: 30 GS, 205.1 IP, 183 H, 21 HR, 46 BB, 200 K, 3.33 ERA, 3.28 FIP, 4.35 K/BB, .299 BABIP, 75.5% LOB, 9.2% HR/FB
John Danks: 31 GS, 183 IP, 173 H, 13 HR, 53 BB, 150 K, 3.20 ERA, 3.33 FIP, 2.83 K/BB, .304 BABIP, 77.0% LOB, 6.9% HR/FB
Josh Beckett: 26 GS, 168.1 IP, 166 H, 18 HR, 33 BB, 166 K, 3.96 ERA, 3.23 FIP, 5.03 K/BB, .324 BABIP, 71.8% LOB, 10.8% HR/FB
When we look at FIP this year, Lee is at 2.78, Halladay at 3.04, Ervin at 3.28, Danks at 3.33, and Beckett at 3.23.  The first four have ERAs very close to these marks, but Josh Beckett has a 3.96 ERA, three-quarters of a run higher.  Additionally, the first four all have sustainable and “normal” BABIPs, while Beckett is much higher at .324.  Given that his LOB% and HR/FB are right around average, but his BABIP is higher, Beckett seems to be in the midst of an unlucky season, making him look less effective than he is.  Jon Lester may be the hot starter for the Red Sox this year, but make no mistake, Beckett is your #2, not Dice-K.
Lee’s LOB% is rather high and his HR/FB is very, very low, but his BABIP is right around the average, at .302, so it is very possible he can sustain some semblance of this in the years to come.  It won’t be the Lee we witnessed this season, but he could still be a very good pitcher if he manages to keep the ball on the ground like he has this season. 
All of Halladay’s numbers are in order, right around the averages in BABIP, LOB, and HR/FB, so he has not been as “lucky” as Lee, or Danks.  I use quotes because I am not referring to luck in the same sense as a fielder diving and having a look-what-I-found moment, but rather in the sense that Lee and Danks have lines built on some likely unsustainable numbers.  For Lee, the LOB and HR/FB are red flags.  For Danks, the same two.  With just a 6.9% HR/FB and a very high 77% LOB, as well as the highest walks, hits, and HBP per inning pitched of these five pitchers, Danks has had a great season but his skillset will need to improve for this success to continue.
Ervin, like Halladay, seems to be completely “in order” as he has a .299 BABIP, a 9.2% HR/FB, and a 75.5% LOB.  Many analysts kept waiting for him to falter, as well as teammate Joe Saunders, but Santana has been a rock for that Angels squad.
Generally speaking, FIP, Fielding Independent Pitching, is a better indicator of current success than ERA, since it measures the controllable skills, while xFIP is a better indicator of future success than FIP.  xFIP, kept at The Hardball Times, normalizes the home runs component of FIP, since some pitchers may post very unsustainable HR/FB percentages.  The league average is around 11%, so when we see Cliff Lee posting a 4.9% HR/FB, we know that this has been a major part of his success. 
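For anyone who wants to see the home run normalization in action, here is a rough Python sketch. It uses the commonly published FIP weights (13 for home runs, 3 for walks and hit batters, 2 for strikeouts). The 3.20 constant is an assumption, since the real constant is recalculated each season. HBP is left at zero because the lines above do not list it, and the fly ball total is backed out of Lee's HR/FB rate rather than taken from a data source. Expect the output to land near, not exactly on, the published figures.

```python
def fip(hr, bb, k, ip, constant=3.20, hbp=0):
    """Fielding Independent Pitching from counting stats (constant is assumed)."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls, bb, k, ip, lg_hr_per_fb=0.11, constant=3.20, hbp=0):
    """FIP with home runs replaced by fly balls times the league HR/FB rate."""
    return fip(fly_balls * lg_hr_per_fb, bb, k, ip, constant, hbp)


# Cliff Lee's line from the text: 216.1 IP, 11 HR, 31 BB, 162 K, 4.9% HR/FB
lee_ip = 216 + 1 / 3
lee_fly_balls = 11 / 0.049  # estimated from his HR/FB rate
lee_fip = fip(hr=11, bb=31, k=162, ip=lee_ip)
lee_xfip = xfip(fly_balls=lee_fly_balls, bb=31, k=162, ip=lee_ip)
print(round(lee_fip, 2), round(lee_xfip, 2))
```

The point is the direction and rough size of the shift: charging Lee with a league-average share of home runs on his fly balls adds most of a run to his fielding-independent line.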
With xFIP thrown in the mix, we get the following: Halladay (3.22), Beckett (3.28), Lee (3.66), Santana (3.68), Danks (3.92).  This tells us that if everyone gave up the average home run rate, with their current BB/K numbers, their FIPs would be these aforementioned numbers, which is what we would expect.  Adding 5-6% to Lee and Danks would increase both their ERAs and FIPs, while lowering their win-based metrics.  This is purely for moving forward, however, and not meant to knock anything either has done this season.
Moving forward, we would expect someone with numbers similar to Lee’s, but a HR/FB closer to 11%, to be more likely to sustain this performance.  Again, keep in mind that I am not knocking Lee in any way, because there is a difference between current success and success moving forward.  Lee is having a better season than any of these others right now, but next year, or the year after, I would be more inclined to say that Halladay, Santana, or Beckett may be more likely to sustain this year’s performance.
The gap between Cliff and Roy and the rest of the AL pack of starters may be quite impressive, but moving forward it should lessen quite a bit, assuming regression runs its natural course.

My 2008 Cy Young ballot

Last week, we talked about the MVP award and I made clear my preference that MVP voting not include pitchers.  They have their own award.  Here it is.  Not everyone agrees with that, but res gustae non deliberandae sunt.  (Matters of taste need not be argued.)  So, we turn now to the question of who has been the best pitcher in each league.
As with my MVP ballot, I’m going to be using statistics that are a bit more advanced for pitchers.  Right now, in the AL race, people are swooning over Cliff Lee’s 20 wins and K-Rod’s five billion saves.  We’re going to go a bit deeper than that.  So, like my MVP ballot, I’m using four categories: VORP, WPA, WPA/LI, and K/BB ratio as my benchmarks.  In order to be on the short-list you have to be in the top ten in at least two of those stats.
(Note: I realize that this is a little unfair to relievers, because VORP is heavily dependent on playing time.  Starters who pitch 200 innings have an advantage over relievers who pitch 70.  Then, there’s the fact that most good relievers make their living off of high leverage situations, so deflating their WPA contributions by dividing by LI is robbing them of some of their mojo.  However, relievers generally dominate K/BB and can leverage a great deal of WPA out of their short stays in a game.)
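For the programmers in the audience, the short-list rule boils down to counting how many top-ten lists each pitcher shows up on.  Here is a rough sketch in Python; the function name `short_list` and the placeholder leaderboards are my own inventions for illustration and are not the real 2008 lists.

```python
from collections import Counter


def short_list(top_tens, min_lists=2):
    """Return pitchers who appear on at least `min_lists` of the top-ten lists."""
    counts = Counter()
    for names in top_tens.values():
        counts.update(set(names[:10]))   # set() guards against a duplicated name
    return {name: n for name, n in counts.items() if n >= min_lists}


# Placeholder leaderboards (cut to three names each), not real 2008 data.
top_tens = {
    "VORP":   ["Pitcher A", "Pitcher B", "Pitcher C"],
    "WPA":    ["Pitcher A", "Pitcher D", "Pitcher B"],
    "WPA/LI": ["Pitcher B", "Pitcher A", "Pitcher E"],
    "K/BB":   ["Pitcher F", "Pitcher D", "Pitcher A"],
}

nominees = short_list(top_tens)
for name, n in sorted(nominees.items(), key=lambda item: (-item[1], item[0])):
    print(f"{name}: on {n} lists")
```

Sorting by the number of lists also gives a first rough cut at tiers: the four-list guys, then the three-list guys, and so on.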
Let’s see what happens, first in the American League.  And the nominees are:
Cliff Lee, Roy Halladay, Ervin Santana, John Danks, Justin Duch… the guy from the A’s, James Shields, Jeremy Guthrie, Mariano Rivera, Josh Beckett, Joakim Soria.
Let’s start out with the obvious.  Francisco Rodriguez is not on the list.  I know, he had all those saves.  But saves don’t tell you much more than when a pitcher pitched and what sort of team he pitched for.  Note that I’m not anti-reliever.  Mariano Rivera and Joakim Soria make the ballot.  A note on Rivera.  He leads the league in K/BB and it’s not even funny.  He’s striking out more than 10 batters per nine innings and walking fewer than one for a K/BB ratio in excess of 12!  Jonathan Papelbon, who sadly doesn’t get nominated, at least should get a mention for being just behind Rivera in this category with the same basic profile.
It’s pretty well understood that the real AL Cy Young will go to Cliff Lee.  (side note: yay!)  But should it?  Lee leads the league in VORP and raw WPA.  But, he’s behind Roy Halladay in WPA/LI (RH is first, Lee 2nd) and K/BB (RH 4th, Lee 5th).  And Halladay is second in the league in VORP and raw WPA.  In other words, they are neck and neck.  I find it funny that while Lee is considered a runaway winner, Halladay isn’t really getting much mention.  Yes, Lee won 22 games on a team that might make it to .500.  That’s pretty impressive no matter how you slice it.  Halladay’s problem is that he didn’t win 20 games (which makes one a much better pitcher than winning 19) and that because he’s been putting up amazing numbers for the last 4-5 years, he’s not a “story” like Lee.  Halladay deserves better.  Lee’s FIP is slightly lower, and on that tie-break, I’m giving him my first place vote.  But please, this is not Cliff Lee smashing the competition.  This is Lee just eking it out.  Halladay should get more press for his season and, if justice (or at least Terry Pendleton) is served, a few first place votes as well.
Ervin Santana is the only one to make all four charts besides Lee and Halladay, so he gets third place.  Three players make three top-10 lists: James Shields (take that Scott Kazmir!), Mariano Rivera, and Joakim Soria.  Rivera gets the top spot in this tier (4th overall) because of his amazing K/BB work and because he’s Mariano Rivera.  Soria vs. Shields gets into the debate of whether a good starter or a good reliever is more valuable.  (Hi there, Joba!)  Yes, starters face more batters, but relievers face more important batters.  I still err on the side of going for the starter, because I think the “cult of the savior” exists mostly because closers pitch more anxiety-ridden plate appearances.  Anxiety makes everything look bigger than it is.
Among the rest, Danks, Duke, Guthrie (really?), and Beckett… well there are only three slots on the Cy Young ballot (but ten for MVP… someone please explain), so the voting would never get down this far.  What went “wrong” with last year’s this-is-an-outrage-he-should-have-won-it candidate, Josh Beckett, this year?  His K and BB rates are nearly identical to last year, but he’s given up a lot of line drives this year.  Order them however you like.
In the National League, the nominees are: Lincecum, Santana, Sheets, Dempster, Hamels, Haren, Peavy, Webb, Sabathia (the NL version), Kuo.
Tim Lincecum leads in three of the four categories, and the one that he doesn’t lead (K/BB) has more to do with his walk rate than anything.  He’s over 10 K’s per nine innings (but also over 3 walks).  For the curious, C.C. Sabathia leads in K/BB.  More on him in a minute.  Tim Lincecum should win the NL Cy Young, although I could see a stray vote going to Johan Santana.  The sad thing is that Brandon Webb will get several votes based on the fact that he leads the NL in the single most important pitching statistic out there: wins.
Johan Santana gets my second place vote (2nd in VORP, 3rd in WPA and WPA/LI), despite the fact that he had a “disappointing season.”  I mean, the guy’s just no good any more.  He didn’t win 20 games.  He didn’t “dominate” NL hitters, and if I’m not mistaken, he didn’t yet cure cancer.  (There’s still a bit more of the season to go.)
Then, there’s a pack of guys who are running neck and neck with each other.  Unlike the AL, where there weren’t many guys who appeared on three (or four) lists, in the NL it was the same six basic guys who were taking up most of the spots in the top 10 lists (Webb, CC, Dempster, Sheets, Peavy, and Hamels).  Since the Cy Young ballot has three spaces, I need to pick a winner of the third place vote.  Remember how we were only considering C.C. Sabathia’s NL stats here?  What if we allowed him to have the benefit of the days when he was in Cleveland?  Suddenly, he outclasses the rest of that pack in WPA, WPA/LI, and K/BB, but is below all of them in VORP.  Sounds like he wins the third place vote.
Then it becomes a free-for-all.  None of these guys are really involved in the K/BB sweepstakes, so I tried doing a bit of a round-robin type tournament to see who-beats-whom more consistently.  I ended up with a scenario where Dempster is ahead of Sheets in two categories, Sheets ahead of Hamels in two categories, and Hamels ahead of Dempster in two categories.  But all three beat Peavy in two out of three categories, and then everyone (including Peavy) goes 2-out-of-3 on Webb.  To break the three way tie between Dempster, Sheets, and Hamels, let’s go to FIP.  Right now, Sheets leads Dempster leads Hamels.  Seeing that it’s a fourth place vote at stake and there are only three spots… does it matter?
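If you want to play along at home, the round-robin idea is easy to sketch in Python.  The ranks below are invented purely to reproduce the rock-paper-scissors loop described above (they are not the actual leaderboard positions, and the category labels are generic on purpose); only the FIP order, Sheets then Dempster then Hamels, comes from the real numbers.

```python
from itertools import combinations

# Hypothetical ranks (1 = best) in three unnamed categories, chosen to create a cycle.
ranks = {
    "Dempster": {"stat 1": 1, "stat 2": 2, "stat 3": 3},
    "Sheets":   {"stat 1": 2, "stat 2": 3, "stat 3": 1},
    "Hamels":   {"stat 1": 3, "stat 2": 1, "stat 3": 2},
}
# Tiebreak order by FIP, best first, as stated in the text.
fip_order = ["Sheets", "Dempster", "Hamels"]


def head_to_head(a, b, ranks):
    """Return whoever ranks better in more categories, or None on an even split."""
    a_wins = sum(ranks[a][c] < ranks[b][c] for c in ranks[a])
    b_wins = sum(ranks[b][c] < ranks[a][c] for c in ranks[a])
    if a_wins == b_wins:
        return None
    return a if a_wins > b_wins else b


# Play every pairing once and count head-to-head wins.
wins = {name: 0 for name in ranks}
for a, b in combinations(ranks, 2):
    winner = head_to_head(a, b, ranks)
    if winner is not None:
        wins[winner] += 1

# Sort by head-to-head wins, then fall back on the FIP order to break ties.
final = sorted(ranks, key=lambda p: (-wins[p], fip_order.index(p)))
print(wins)
print(final)
```

Everyone finishes with one win apiece, so the head-to-head results alone can't separate them, and the FIP tiebreak ends up doing all the work.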
So to recap the NL ballot, Lincecum, Santana, CC, then Sheets/Dempster/Hamels, Peavy, Webb, Haren, Kuo.
Congratulations to StatSpeak Cy Young winners Cliff Lee and Tim Lincecum, and for what it’s worth, to some very worthy runners-up in Roy Halladay and Johan Santana.  The trophy is in the mail.