Getting Pelfrey-ized

Things have been working out very nicely for the New York Mets over the last three months.  After salvaging a win last night against the Phillies, the Jerry Manuel gang has turned a seven and a half game deficit in early June into a two game lead in the NL East.  Realistically, the only way the Phillies or Mets make the playoffs is by virtue of winning their division, and the Mets are currently in the driver’s seat.  One of the biggest reasons for their resurgence is not the firing of Willie Randolph but the performance of Mike Pelfrey.  The 6’7″ righty started the season poorly but, since May 26, has been arguably one of the best in the bigs.
In 28 starts, Pelfrey has toed the rubber in 176.2 innings, surrendering 180 hits, just 10 of which have left the yard.  The 24-year-old Pelfrey has walked 55 and fanned 101, producing a K/BB of 1.84.  While he only had about a half-season’s worth of starts prior to this season, Pelfrey appeared to be capable of a 5.4-5.7 K/9, but his control was an issue, as he walked plenty of batters.  In fact, his current 1.84 K/BB might be below average, but it is actually a career high, however short that career may be.  The major reason is his reduced walk rate.
With a 5.15 K/9, Pelfrey has been able to control his repertoire to reduce his BB/9 to 2.80.  All told, his 3.62 ERA is supported by a 3.81 FIP, as his controllable skills have been solid.  A 1.84 K/BB could be better, so it’s safe to say the low FIP is due primarily to his very low home run rate.  A HR/FB of just 6% should regress, so Pelfrey might not be this adept at keeping balls in the park, but he is young and could conceivably continue improving in the walks and strikeouts departments to counteract this likely inevitable regression.
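As a rough sketch of that regression scenario, here is the standard FIP calculation applied to Pelfrey's line, then re-run with his 10 homers scaled up to a league-average HR/FB of roughly 11%.  The league constant (about 3.28 here) varies by season and the walk term technically includes HBP, so treat both as assumptions:

```python
def fip(hr, bb, k, ip, c=3.28):
    # FIP = (13*HR + 3*BB - 2*K) / IP + league constant
    # (HBP ignored for simplicity; c is an assumed, season-dependent constant)
    return (13 * hr + 3 * bb - 2 * k) / ip + c

ip = 176 + 2 / 3                         # 176.2 in baseball notation
fip_now = fip(10, 55, 101, ip)           # ~3.81, matching the figure above
hr_regressed = round(10 * (11 / 6))      # HR/FB regressing from 6% to ~11%
fip_regressed = fip(hr_regressed, 55, 101, ip)
print(round(fip_now, 2), round(fip_regressed, 2))
```

Under those assumptions, home run regression alone pushes the FIP up by roughly six-tenths of a run, which is why the strikeout and walk improvements matter so much.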
Other than the HR/FB red flag, however, his performance has been talent-driven.  His BABIP is .300 on the dot and his strand rate is a slightly above average 74%.
Pelfrey is a groundball pitcher.  Despite his relatively few starts prior to this season, rates of balls in play tend to stabilize very quickly, and each year he has pitched has resulted in an LD/GB/FB of about 20%/49%/31%.  According to Jessica Bader of Take the 7 Train, speaking purely from a scouting standpoint, confidence has been the key to Pelfrey’s turnaround.  Prior to perhaps May 31 of this year, he looked timid and unsure on the mound.  After his first nine starts this year, there was talk that Pedro Martinez’s impending return from the DL would result in Pelfrey being demoted.  Even after a solid 7-inning performance against the Marlins on May 31, his future was uncertain.
The Mets opted to stick with him, which proved to be a turning point in his season.  In those first nine starts, he posted a 4.41 BB/9, a 4.22 K/9, and a 5.33 ERA.  Since then, a 2.19 BB/9, a 5.50 K/9, and a 2.96 ERA.  He has also used his fastball more often in recent starts, upping his seasonal usage to right around 80%, which is a very, very high percentage for a starting pitcher.  Coming in at 93 mph with solid movement, however, Pelfrey has made it work, letting his natural sink find the desired spots rather than pressing and aiming in an attempt to hit a spot perfectly.
The various projection systems did not peg Pelfrey for much playing time this season, but Dan Szymborski’s ZiPS system had him at exactly 28 starts, the amount he has currently made.  Here is a comparison of ZiPS vs. Actual Performance:

  • ZiPS: 150.0 IP, 163 H, 16 HR, 60 BB, 90 K, 4.86 ERA, 4.59 FIP
  • ACT: 176.2 IP, 180 H, 10 HR, 55 BB, 101 K, 3.62 ERA, 3.81 FIP
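The comparison above reduces to per-nine rates (remember that 176.2 IP in baseball notation means 176 2/3 innings):

```python
def per9(events, ip):
    # rate stat per nine innings
    return round(9 * events / ip, 2)

ip_zips, ip_act = 150.0, 176 + 2 / 3
print(per9(90, ip_zips), per9(101, ip_act))   # K/9:  5.4  vs 5.15
print(per9(60, ip_zips), per9(55, ip_act))    # BB/9: 3.6  vs 2.8
print(per9(16, ip_zips), per9(10, ip_act))    # HR/9: 0.96 vs 0.51
```

The strikeout and walk rates are a bit better than projected, but the home run rate is where the actual line really runs ahead of ZiPS.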

By all accounts, he has exceeded expectations, but we cannot say with 100% certainty that he is a true 3.60 ERA/3.80 FIP pitcher.  His true talent level has definitely changed, but we did not know much about what to expect entering this season.  It is not very likely he will sustain a 6% HR/FB, with the league average being around 11-12%, which will result in, well, more runs, a higher ERA, and a higher FIP.  The best way to counteract that would be, as previously mentioned, to show improvements in both the walks and strikeouts departments.
If he can continue to throw 50% groundballs with a 93 mph fastball, fan a few more hitters, and reduce his walk rate a bit, any home run regression will not negatively affect his controllable skills or barometers too drastically.  He might not be a perennial 3.50 ERA pitcher, but he seems to be much better than a 4.80-5.20 ERA, back-of-the-rotation hurler.  At the very least, Mets fans should be happy that this isn’t another Alay Soler situation.

Data Dump: 2007 OPA! numbers

For those of you who followed the development of my OPA! system for measuring fielding, and were curious about the numbers, here are the 2007 numbers in their full glory.  I’ve posted them as Google docs, and sorted them by the total number of runs saved above average.  I couldn’t get the pitcher file to fit into a Google Doc (if someone wants it, e-mail me), and OPA! doesn’t look at catchers.  But the other seven positions are up.  Hopefully, on the spreadsheets, the headings make sense.  Players are labeled with their Retrosheet ID.  As always, feel free to use them as you see fit, just maybe tip the cap over this way.
When the 2008 Retrosheet file comes out, I’ll post the 2008 numbers. 

What run estimator would Batman use? (Part II)

If you haven’t already, I suggest you read Part I first, but it’s not strictly necessary, so long as you have a feel for how run estimators work. Part I goes into a lot of the background of how run estimators work, but there’s not a lot of technical detail.

Now, let’s go ahead and strap some run estimators down to the table, cut them open and see how they work.

Linear weights

First of all, when I refer to linear weights, I should clarify that I use the term to refer to any linear run estimator, not just Pete Palmer’s Linear Weights System. Onward, then.

Simply looking at a linear weights formula should be pretty straightforward. We’ll look at the reduced version of Extrapolated Runs, Jim Furtado’s version of a linear weights formula*:

(.50 * 1B) + (.72 * 2B) + (1.04 * 3B) + (1.44 * HR) + (.33 * (HP + TBB)) + (.18 * SB) + (-.32 * CS) + (-.098 * (AB – H))

Essentially, every event is multiplied by its average run value, based on a certain run context. (In the case of XR it’s team seasons from 1995 to 1997, but you could use any context you wanted. You could put together a linear weights formula for, say, Greg Maddux’s career if you wanted to.)
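Put as code, the reduced XR formula above is a one-liner.  The stat line plugged in below is purely hypothetical, just to show the mechanics:

```python
def xr_reduced(singles, doubles, triples, hr, hp_tbb, sb, cs, ab, h):
    # Furtado's reduced Extrapolated Runs: each event times its run value
    return (.50 * singles + .72 * doubles + 1.04 * triples + 1.44 * hr
            + .33 * hp_tbb + .18 * sb - .32 * cs - .098 * (ab - h))

# hypothetical batter: 100 1B, 30 2B, 3 3B, 20 HR, 60 HP+TBB,
# 10 SB, 5 CS, 550 AB, 153 H
print(round(xr_reduced(100, 30, 3, 20, 60, 10, 5, 550, 153), 1))  # 84.6
```

Note that the outs term, -.098 * (AB - H), is what makes the estimate sensitive to how often the batter fails, not just how often he succeeds.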

This raises the question of how to determine the run value of an event. Looking simply at Runs Batted In won’t help – a single with the bases empty earns no RBI, yet it still provides value. So what do we do? Here’s where a concept called run expectancy comes in handy. Every base/out state has a certain run expectancy, which is essentially how many runs, on average, a team scores from that point of the inning onward. I’m using values from this table by Tango, because they’re already in a nice arrangement.


There’s one case not strictly defined on the table; three outs means a run expectancy of zero.

The linear weights value of an event is the average change in run expectancy it produces. Let’s say you have runners on first and second, no outs; that’s an RE of 1.573. A player hits a double, scoring the two runners in front of him:

2 + 1.189 = 3.189

The double scored two runs and leaves the game with an RE of 1.189, for a total RE of 3.189. Subtract 1.573, and you get 1.616, the run contribution of that double. Take the average RE change of every double available in your dataset, and there’s your linear weights value of a double.
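The worked example above can be written directly as code, using the same run expectancy values from Tango's table:

```python
def lw_event_value(runs_scored, re_after, re_before):
    # runs that actually scored, plus the change in run expectancy
    return runs_scored + re_after - re_before

# runners on 1st and 2nd, 0 out (RE 1.573) -> two-run double,
# runner on 2nd, 0 out (RE 1.189)
value = lw_event_value(2, 1.189, 1.573)
print(round(value, 3))  # 1.616
```

Averaging this quantity over every double in a dataset yields the linear weights coefficient for the double.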

(There are other ways to estimate linear weight values when you don’t have sufficient data to do the Run Expectancy analysis; an overview of the subject is available.)


Holds, Saves and Blown Saves

Francisco Rodriguez of the Angels, with 54 saves in 59 opportunities, is on his way to breaking the all-time single season record of 57, set by Bobby Thigpen of the White Sox in 1990. Percentage-wise, the Phillies’ Brad Lidge is perfect, with 33 saves in 33 opportunities. On the opposite end, there are records such as those of Aaron Heilman of the Mets, 3 for 7 this year and 9 for 33 since 2004. It’s obvious Heilman can’t close games, with a record like that. No wonder Willie Randolph got fired. Right? Wrong!
Saves have become a statistic whose leaders are as well known to the casual fan as the home run leaders, and save percentage is one of the simplest computations in baseball statistics, but it has always contained an error that grossly distorts the value of middle relievers to the general public.  It is easy to understand that the setup man isn’t in a position to get many saves, but save percentage has been held up by many, including the media, as evidence that certain pitchers routinely fail when handed a save situation, proof that they can’t handle the closer role.

World Famous StatSpeak Roundtable: September 3

Our humble round table welcomes a new guest knight.  Please welcome to this week’s version of the roundtable, Will Carroll of Baseball Prospectus.  Will has been kind enough to join us here on StatSpeak for a record-setting five-person roundtable.  He joins us in a discussion of the ghosts of trade deadline deals past, injuries and Sabermetrics, C.C.’s sorta no-hitter, instant replay, and who will be looking in from the outside on the AL playoffs in October.
Question #1: When I started doing “Under the Knife” seven years ago, there were no stats and people didn’t think that injuries and sabermetrics went together. I’m still not sure they do, but to me, it’s about information. You guys are stats guys — how would you go about mixing the two?
Will Carroll: I think it comes down to a bit of luck. Is it someone getting hot and carrying the team? Is it an injury that costs them a premier player for a couple weeks or worse? I know that luck is probably the worst thing to say on a site like this, but I think it’s the best way to say that small things make a huge difference, and I’m not sure which ones. I think we get lost in this fog because we’re seeing quantifiable effects but in such small quantities that we don’t notice, things that amount to 0.1 runs or less, but enough of them that they add up.
Brian Cartwright: Well, my day job is in data processing, which includes designing methods of data collection. So one of my current projects is designing a comprehensive database that hopefully will include everything we can get our hands on, from season stats and play-by-play to transactions and injuries, as opposed to narrowly constructed ad hoc databases. I’d like to be able to look at the pre-injury data and see if there are any indicators, such as simple-to-derive stuff like lists of pitchers headed toward a Verducci Effect (and then test how true it is). Post-injury, I’d like to be able to see how well players recover from various types of injuries.
I know Will has done much of this on his own, but I’d like to see the injury data married to the stats and projections to enable more of us to do these kind of studies.
Colin Wyers: That’s sort of the unexplored frontier of sabermetrics – introducing traditional sorts of data into our models. What’s lacking right now is a good record of who got injured, where, and how. I don’t know if we’ll ever get to that point, but people like Tom Ruane of Retrosheet are working on that sort of data – and all of us who research baseball owe the folks of Retrosheet a huge debt.
Eric Seidman: A fusion of injuries and sabermetrics is something I have actually discussed with Will on numerous occasions because now, with Pitch F/X data in full bloom, there are certain avenues we can explore.  For instance, one idea of Will’s (that I wholeheartedly support) is that pitchers that are on the verge of injury will have consistent release points with inconsistent results.  Before, this really could not be studied, but now it can.  We can run analyses to see which pitchers fit the bill.  Or, if someone is experiencing a “dead arm” we can look to their movement.  Stats cannot tell us everything about injuries, but just like all other aspects of analysis, the combo of numbers and scouting will ultimately prove to be key in this combination.
Pizza Cutter: I don’t think that the two are opposed at all.  I do agree that injury analysis isn’t really something that fits nicely into any of the Sabermetric models that we have now, but that’s more of an engineering problem.  To really pursue this line of study, one would have to be familiar with bio-mechanics and statistics, plus have a fairly extensive injury database handy.  (So basically, you, Will.)  Even at that point, there’s going to be a lot of statistical noise.  Suppose that Larry has an elbow problem and goes on the 15 day DL.  Even if we assume that we know exactly when he was hurt (and when it started hurting his performance), we’ll never really know how hurt he was.  How can we tell if it’s not just him having a bad string of luck?  Maybe with a big enough sample, we can detect a signal, but it’s going to be hard to find.  Calculating the complete absence of a player is fairly easy.  Calculating what it means to have a player at 80% is a lot harder.
The other side of the Sabermetric-injury nexus is predicting who’s an injury risk.  My guess is that some team (or several) out there hired an actuary to study just that and they’re keeping it close to the vest.  (Can’t blame them.)  Plus, with many teams already insuring contracts, someone out there in the insurance industry must be running some sort of tables.

Surprise! Kelly Johnson has gotten better this year

Recently, there was a note at the ever-excellent MLB Trade Rumors which said that the Atlanta Braves were likely looking to shop second baseman Kelly Johnson in the off-season.  The post noted that Johnson’s offensive production had declined this year, and the Braves do have fashion designer Martin Prado ready to play second next year.  I don’t mind the thought that the Braves might think Prado the better option.  He strikes out much less than does Johnson, although Prado seems to have a bit less power.  The part that I object to is the thought that Kelly Johnson is actually “losing it” this year.
Certainly, Kelly Johnson’s performance has suffered.  Last year, his slash line of .276/.375/.457 was rather nice for a second sacker.  This year, Johnson has slipped a little with a slash line around .260/.335/.400.  Not bad, but not what Braves fans were hoping for.  So Johnson must be losing his mojo, right?  Not necessarily.  In fact, I’d say that Johnson has actually gotten better this year.  How does a player drop 80-90 points worth of OPS and become better?  Read on.
First, let’s look at Johnson’s swing and plate discipline profile.  What’s important to know is that things involving plate discipline and swinging are the least given to variation over time.  It makes sense, because players are the ones who decide whether or not to swing the bat.  Hitting a home run requires cooperation of the pitcher, ball, and occasionally, wind.  This year, Johnson, a man with a strikeout problem and a rather pedestrian contact percentage (around 80-81%, which is around the league median), actually started swinging more.  And that’s a good thing.  In 2007, on my twin measures of plate discipline, Johnson had a response bias rating of 0.84.  Now, response bias is a measure of how likely a player is to swing.  The ideal number is 1.00, because it minimizes the number of strikes that a player piles up, given whatever abilities he has on the other measure, sensitivity.  A number over 1.00 means that a player is swinging too much.  Under 1.00 means the player is swinging too little.  In 2007, Johnson’s problem was that he was taking too many pitches.  Johnson took a step toward fixing that.
My measure suggested that Johnson would benefit from swinging more, and he has done so.  Last year, Johnson swung at 39.3% of pitches.  This year, he’s been up around 45%.  (Maybe he reads StatSpeak?)  His strikeout rate has dropped (although only about a percentage point) in response.  Swinging more also drove down his walk total, but it meant that he was putting more balls into play.  So, let’s look there.
In general, a batter has pretty good control over what type of batted ball he puts into play.  The rates at which a batter hits grounders, flyballs, line drives, or popups have pretty good reliability, so changes in them are generally not random in nature, but a change in either talent level or approach.  What happens to those batted balls is another matter.  More on that in a minute.  This year, Johnson’s LD/GB/FB profile went from 18.8%/42.7%/38.5% last year to something around 23%/38%/39%.  His flyballs are staying steady, but he’s turning some of his ground balls into line drives.  That’s good, because a line drive (which doesn’t leave the yard) has about a 73% chance of going for a base hit, while a grounder has a 24% chance.  Line drives are good.
The fine folks over at FanGraphs are fond of using xBABIP for hitters.  Given a batter’s batted ball profile, we can get some sort of idea of what we might expect his BABIP to be (hence xBABIP).  The formula that FanGraphs uses is .15 * FB% + .24 * GB% + .73 * LD%.  Last year, Kelly Johnson’s xBABIP was around .290.  His actual BABIP was .330.  Johnson did 40 points better than expected given his batted ball profile.
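The FanGraphs-style formula in code, using Johnson's batted-ball profiles from the text (the small gap from the article's ~.290 figure is just rounding of the input percentages):

```python
def xbabip(fb, gb, ld):
    # expected BABIP from batted-ball mix: .15*FB% + .24*GB% + .73*LD%
    return .15 * fb + .24 * gb + .73 * ld

print(round(xbabip(.385, .427, .188), 3))  # 2007: ~.297
print(round(xbabip(.39, .38, .23), 3))     # 2008: ~.318
```

Swapping grounders for line drives moves the expectation about 20 points, which is exactly the shift described below.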
The next question is whether that ability to “outhit” the expectation is something that is luck or skill.  As is my custom, I took four years worth of data (2004-2007) and calculated the xBABIP and the actual BABIP for all players, and found the difference between the two (whether they over- or under-performed).  It’s possible that some players just hit line drives or ground balls that are harder to catch than others.  If that’s the case, then we should see consistency over those four years in which players over-perform and which ones under-perform.  To test this, I used my favorite device, the intra-class correlation (shot!).  The result was an ICC of .27 or .28, depending on how much I restricted the sample by the minimum number of PA required.
That means that there is a little bit of skill involved in over- or under-performing one’s xBABIP, although there’s a good deal more luck in there than one might expect.  Looking at it from an R-squared perspective, it’s more than 90% luck (or more properly, unexplained).  It’s not quite the level of non-correlation found in BABIP for pitchers, but it’s closer to that area than to the “three true outcome” neighborhood.  Perhaps it’s time for DIBS.
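For the curious, here is a minimal one-way intra-class correlation sketch, assuming an ICC(1,1) on a players-by-seasons matrix of BABIP minus xBABIP; the author's exact variant and sample restrictions may differ:

```python
import numpy as np

def icc1(x):
    # x: players x seasons matrix; one-way random-effects ICC(1,1)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    ms_between = k * ((row_means - grand) ** 2).sum() / (n - 1)
    ms_within = ((x - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# perfectly consistent players (same over/under-performance every season)
print(icc1(np.array([[.01, .01], [.04, .04], [-.02, -.02]])))  # 1.0
```

An ICC near 1 would mean the same players over-perform year after year; the observed .27-.28 says most of the year-to-year variation is within players, i.e., noise.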
Going back to Johnson, it means that it’s likely that most of Johnson’s over-performance in the BABIP area was due to chance.  I haven’t run the numbers, but I’m guessing that expected BABIP is going to be a better predictor of future results than is actual BABIP.  Now, in 2007, Johnson’s expected BABIP was .290.  This year, it’s around .315 (more line drives!), which is what his actual performance has been.  All performance is talent plus luck.  So in reality, Johnson’s numbers from last year, which were fueled mostly by that high BABIP, were mostly a matter of luck.  This year, he hasn’t had good or bad luck, but the underlying talent seems to have improved.  Atlanta’s management might be confusing luck with skill.
The one concerning piece about Johnson’s statline is the drop in HR/FB.  HR/FB is a statistic that is mostly in the batter’s control, and his drop from 10.3% to around 7% is a little concerning.  His flyball percentage hasn’t changed much from last year… they’re just not leaving the park as much, so perhaps there’s a power outage in there somewhere. 
With that said, Johnson isn’t exactly a world-beater.  Now that his luck has stabilized, we’re getting a pretty good idea of what he’s really capable of.  According to VORP, he’s in the bottom half of “regular” second basemen in all of baseball, among such luminaries as Joe Inglett, Clint Barmes, and Mark Grooz, Grudsil, oh, you know who I’m talking about.  He strikes out way too much for a guy who doesn’t put up massive HR numbers.  My OPA! fielding system has him rated as a boring old average second baseman in the field.  So, while I can’t fault the Braves if they think they have a better option, I’d caution them to be a little more careful in how they make that decision.  Kelly Johnson is a symptom of a much bigger problem: the need to understand the separation between talent and performance.  He’s actually gotten better this year, despite what it looks like.

K/BB, K/9-BB/9, and Save Situations

As I watched Brad Lidge shut down the Cubs in the ninth innings on both Saturday and Sunday, it dawned on me why my confidence level soars through the roof every time he enters a game in a save situation.  It isn’t solely because he has yet to blow a save, or because he throws with high velocity, but rather because he does not walk many batters and he strikes out a ton of them.  To me, this results in a very easygoing ninth inning experience, unlike the ones I experienced with Tom Gordon over the last couple of years.
Other fans may desire different attributes amongst their closers, but I just want the least amount of stress possible.  In theory, as long as we assume that balls in play are random, a closer that does not walk many hitters and strikes out plenty would fit this least-stress bill.  No, nobody has a 0.00 BB/9 and a 27.00 K/9, but, within reasonable terms, the low BB/9 and high K/9 is what would interest me.  In evaluating closers during save situations in my database, and after hearing the thoughts of some fans with regards to the K/BB ratio, I wondered if it would make a difference if, instead of dividing strikeouts and walks, we subtracted?
Keep in mind that I am in no way endorsing either as the gold standard but rather wondering if it makes a difference.  One opinion I read on a forum, which was seemingly derived from one of Ron Shandler’s fantasy plans, stipulated that subtracting the BB/9 from K/9 would give a better view since the K/BB ratios can come in all shapes and sizes–a 2.50 K/BB with a 5 K/9 and a 2 BB/9 is different than one with a 9.00 K/9 and a 4.00 BB/9.  Another thought was simply to use the K/BB but set a minimum for the K/9; as in, consider any K/BB above 2.50 to be very solid as long as the K/9 exceeded 6.0.
Normally, I would jump at an opinion like this and call it poppycock because a starter can be successful with under 6 K/9 if his K/BB is great; the 6.0 minimum is essentially artificial.  However, when looking at which closer offers the least stressful outings in terms of controllable skills, might there be some credence to looking at K/BB in a different light?  John Rocker, in 1999, in save situations, struck out 15.70 batters per nine while walking 4.97.  His K/BB was 3.16, good, but not tremendous given his ridiculous K/9.  In 1994, Robb Nen had a 10.43 K/9 and a 1.74 BB/9 during save situations, a 5.99 K/BB. 
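The Rocker/Nen comparison in code, using the per-nine rates quoted above:

```python
rates = {"Rocker 1999": (15.70, 4.97), "Nen 1994": (10.43, 1.74)}
for name, (k9, bb9) in rates.items():
    # ratio view (K/BB) vs. difference view (K/9 - BB/9)
    print(f"{name}: K/BB {k9 / bb9:.2f}, K-BB {k9 - bb9:.2f}")
# Rocker 1999: K/BB 3.16, K-BB 10.73
# Nen 1994: K/BB 5.99, K-BB 8.69
```

The two metrics rank the pair in opposite orders, which is the whole question at issue.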
The K/BB would lead us to believe that Nen was much more effective, but the K/9-BB/9 would suggest that a case could be made for Rocker.  The advantage he has in K/9 is greater than Nen’s BB/9 advantage.  Again, this isn’t something I would support in any instance other than perhaps evaluating closers in save situations.  Subtracting the BB/9 from K/9 would give us an idea of how both metrics are related to one another while also accounting for the fact that striking out batters as a closer is, well, a really good trait.  Looking solely at save situations, here are the top ten seasons in K-BB, from 1980-2007, with a minimum of 15 saves:

  1. Eric Gagne, 2003: 13.89 (15.47-1.58)
  2. Billy Wagner, 1999: 12.60 (15.36-2.76)
  3. Brad Lidge, 2004: 12.22 (14.92-2.70)
  4. Takashi Saito, 2006: 12.00 (14.00-2.00)
  5. Joe Nathan, 2006: 11.39 (13.40-2.01)
  6. Tom Henke, 1989: 11.36 (12.49-1.13)
  7. Troy Percival, 1997: 11.17 (14.59-3.41)
  8. Billy Wagner, 1998: 11.15 (14.78-3.63)
  9. Eric Gagne, 2002: 10.90 (12.54-1.64)
  10. Robb Nen, 2000: 10.84 (14.11-3.27)

There were actually 15 seasons in which a closer met the aforementioned criteria and posted a negative K-BB, meaning that they walked more than they fanned.  Based on this, the worst and most stressful closer season in this span belongs to Doug Sisk in 1984, with his -2.34 mark.  Curious to see whether or not this stat made a difference, I ran correlations between K-BB and both ERA and OPS against, and K/BB with the same two comparative metrics.  The results:

  • K-BB: -0.28 to ERA, -0.40 to OPS against
  • K/BB: -0.29 to ERA, -0.37 to OPS against

Essentially, this tells us that it really does not make a difference if we use K/BB or K/9-BB/9.  Regardless of whether or not the K/BB ratio accounts for the fact that K/9 and BB/9 come in all shapes and sizes, their relationships with ERA and OPS against are virtually the same.  If setting a minimum K/9 for your starter helps in a fantasy league, go ahead, but in evaluating players in a non-fantasy setting, the minimum is artificial and K/9-BB/9 makes no difference.

What run estimator would Batman use? (Part I)

[A note from the author: This study ended up becoming more involved than initially suspected, mostly because the author is bad at estimating such things. As such, this is the first part of the piece, which will eventually be published in two or three parts, depending. This part isn't very technical, and largely concerns itself with the theory behind run estimation. I state this up front so that you don't get 2,000 words into the document only to be disappointed that not a single run estimator has been evaluated at all.]

This isn’t the first study on run estimator accuracy, and I don’t promise it will be the most thorough. But I’ve been skirting around the issue in my previous work here, and so I figured it was time to finally get around to doing it proper, so that I can just have something to conveniently reference every time it comes up in the future.

Most previous studies of accuracy have concerned themselves with accuracy at the team level, using seasonal totals. This makes sense for a lot of reasons – run scoring is a team process, and team level run scoring data is readily available for entire seasons. Here’s the rub, though – estimating runs at the seasonal team level isn’t that hard. Here’s a look at the distribution of team runs per game, 1954-2007:

[Chart: distribution of team runs per game, 1954-2007]

Notice how everything bunches up in the center? That’s because there isn’t a vast difference in run scoring totals between teams over the course of an entire season. That’s how you can explain the sterling accuracy of my latest run predictor, using runs per game:

[Table: Avg. Error of the league-average predictor]

Okay, so it’s not even as good as, say, batting average at predicting team run scoring. But it’s pretty decent, considering I just assumed every team was league average.
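The joke "predictor" here (guess the league mean for every team) can be sketched quickly; the runs-per-game figures below are made up to mimic the bunched-up distribution, while the real numbers would come from the 1954-2007 team seasons:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical team runs-per-game values clustered near 4.5, as in the chart
rpg = rng.normal(4.5, 0.4, size=300)
prediction = rpg.mean()                      # every team predicted at the mean
avg_error = np.abs(rpg - prediction).mean()  # average absolute error
print(round(avg_error, 2))
```

Because the distribution is so tightly bunched, even this zero-information predictor posts a small average error, which is why team-season accuracy tests flatter every run estimator.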




